Server admin log/Archive 37

2019-04-30

  • 23:56 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 05s)
  • 23:56 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
  • 23:49 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 04s)
  • 23:49 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481
  • 23:35 ariel@deploy1001: Finished deploy [dumps/dumps@d715ea0]: determine page ranges of content output files by cumul revision length as well as rev count (duration: 00m 03s)
  • 23:35 ariel@deploy1001: Started deploy [dumps/dumps@d715ea0]: determine page ranges of content output files by cumul revision length as well as rev count
  • 23:18 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 05s)
  • 23:18 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
  • 23:07 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 05s)
  • 23:07 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481
  • 22:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - T215956 (duration: 23m 56s)
  • 21:57 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - T215956
  • 21:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b140f] (dev-cluster): Parsoid: use the new stashing tables for old revisions too (duration: 03m 22s)
  • 21:52 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b140f] (dev-cluster): Parsoid: use the new stashing tables for old revisions too
  • 21:44 sbassett: Deployed patch for T222038 (1.34.0-wmf.1 and 1.34.0-wmf.3)
  • 21:44 sbassett: Deployed patch for T222036 (1.34.0-wmf.1 and 1.34.0-wmf.3)
  • 21:13 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.3
  • 21:10 mutante: netmon1002 - apt-get remove --purge php7.0* ; apt-get install php-common php-pear (pending upgrades) | netmon2001: apt autoremove
  • 21:06 mutante: netmon2001 - apt-get install php-common php-pear (pending upgrades)
  • 21:03 mutante: netmon2001 - apt-get remove --purge php7.0*
  • 21:03 mutante: librenms - switch from PHP 7.0 to PHP 7.2 successful now. reverted manual changes for debugging on netmon1002
  • 20:29 thcipriani@deploy1001: Finished scap: testwiki to 1.34.0-wmf.3 and rebuild l10n cache (duration: 31m 17s)
  • 20:21 mutante: netmon1002 - loading PHP 7.2 module to debug issue for librenms. librenms very short downtime
  • 19:58 thcipriani@deploy1001: Started scap: testwiki to 1.34.0-wmf.3 and rebuild l10n cache
  • 19:56 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.20 (duration: 02m 07s)
  • 19:47 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 (duration: 02m 24s)
  • 19:44 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4360316]: Redeploy GUI for fixes T222133, T222129, T222181, T222182 (duration: 09m 17s)
  • 19:44 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 (duration: 02m 25s)
  • 19:43 mutante: switched netmon1002/netmon2001 from PHP 7.0 to 7.2 but reverted because LibreNMS still had an issue with it
  • 19:40 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 10m 11s)
  • 19:35 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4360316]: Redeploy GUI for fixes T222133, T222129, T222181, T222182
  • 19:27 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:27 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 19:27 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 19:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:26 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 19:26 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 19:25 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:25 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:24 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:24 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:40 cdanis: running puppet on ms-be201[3,5] to bump replication concurrency T221068
  • 18:24 cdanis: running puppet on ms-be2014 to bump replication concurrency T221068
  • 18:09 thcipriani: start branchcut for 1.34.0-wmf.3
  • 17:16 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@1f09e44]: Update mobileapps to 142ba30 (T217837) (duration: 04m 16s)
  • 17:11 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@1f09e44]: Update mobileapps to 142ba30 (T217837)
  • 16:57 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 09s)
  • 16:57 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
  • 16:52 arturo: merging change to `profile::base` and `::raid` https://gerrit.wikimedia.org/r/c/operations/puppet/+/507357 related to T221225
  • 16:36 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207706 (duration: 00m 11s)
  • 16:36 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207706
  • 16:27 XioNoX: upgrade librenms to 1.51
  • 16:26 jbond42: upgrade puppet and facter in eqsin
  • 16:04 ema: pool cp4022 w/ ATS backend T219967
  • 15:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:58 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:58 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:45 elukey: restart hadoop hdfs namenodes on an-master100[1,2] to pick up new logging settings - T220702
  • 15:18 jynus: stop s8 instance on dbstore2001 for cloning to db2100 T220572
  • 15:09 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 1% of anonymous users to PHP7.2 - T219150 (duration: 00m 54s)
  • 14:58 jbond42: enable-puppet "T220987: global kafaka log shipping - staged rollout (jbond)"
  • 14:56 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'bast3002*' 'run-puppet-agent --enable "filippo prometheus"'
  • 14:49 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'labmon1001*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:44 jijiki: Sending 1% of anonymous users to PHP7.2 - T219150
  • 14:43 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'bast5001*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:26 jbond42: disable-puppet "T220987: global kafaka log shipping - staged rollout (jbond)"
  • 14:24 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'prometheus2004*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:17 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'prometheus2003*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:15 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo enable-puppet 'cdanis testing original query.max-samples T222105'
  • 13:29 cdanis: cdanis@prometheus1004.eqiad.wmnet ~ % sudo systemctl restart prometheus@ops.service
  • 13:28 ema: depool cp4022 and reimage as upload_ats T219967
  • 13:20 arturo: reverting sudo puppet module changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/507317
  • 13:16 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo systemctl restart prometheus@ops.service
  • 13:15 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo disable-puppet 'cdanis testing original query.max-samples T222105'
  • 13:08 cdanis: OOMed the eqiad ops prometheus @ prometheus1003
  • 13:02 cdanis: OOMed the eqiad ops prometheus @ prometheus1004
  • 12:47 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo run-puppet-agent --enable "staged rollout T222105 by cdanis"
  • 12:41 arturo: merging a sudo puppet module change
  • 12:39 cdanis: cdanis@prometheus1004.eqiad.wmnet ~ % sudo run-puppet-agent --enable "staged rollout T222105 by cdanis"
  • 12:34 elukey: moved /home to /srv/home (more space in a dedicated partition) on stat1005
  • 12:32 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'R:prometheus::server' 'disable-puppet "staged rollout T222105 by cdanis"'
  • 11:27 Lucas_WMDE: EU SWAT done
  • 11:22 mlitn@deploy1001: Synchronized wmf-config/CommonSettings.php: Allow cross-site requests from mobile domains (duration: 00m 52s)
  • 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Serialize empty lists as objects on Commons (T138104) (gerrit:507032) (duration: 00m 54s)
  • 11:12 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Serialize empty lists as objects on Wikidata (T138104) (gerrit:507031) (duration: 00m 55s)
  • 11:08 gilles@deploy1001: Finished deploy [performance/navtiming@d6756c0]: T221848 Proper fix for partitions_for_topic in python-kafka > 1.4.4 (duration: 00m 05s)
  • 11:08 gilles@deploy1001: Started deploy [performance/navtiming@d6756c0]: T221848 Proper fix for partitions_for_topic in python-kafka > 1.4.4
  • 11:02 ema: cp3038 mbox lag, restarting varnish-be
  • 10:55 kart_: Updated cxserver to 2019-04-30-055331-production (T219412)
  • 10:49 santhosh@deploy1001: scap-helm cxserver finished
  • 10:49 santhosh@deploy1001: scap-helm cxserver cluster codfw completed
  • 10:49 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 10:48 santhosh@deploy1001: scap-helm cxserver finished
  • 10:48 santhosh@deploy1001: scap-helm cxserver cluster eqiad completed
  • 10:48 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 10:45 santhosh@deploy1001: scap-helm cxserver finished
  • 10:45 santhosh@deploy1001: scap-helm cxserver cluster staging completed
  • 10:45 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 10:32 godog: rollout rsyslog upgrade to 8.1901.0-1~bpo9+wmf1 in codfw
  • 10:32 arturo: T222060 reimaged labtestservices2003 as stretch spare system
  • 10:32 arturo: T222057 reimaged labtestvirt2003 as spare system
  • 10:12 godog: rollout rsyslog upgrade to 8.1901.0-1~bpo9+wmf1 in eqsin / ulsfo / esams
  • 10:08 jynus: stop s7 and x1 instances on dbstore2* for cloning T220572
  • 09:31 fsero@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=docker-registry,service=docker-registry
  • 09:26 fsero: creating lvs endpoints for docker registry - T221101
  • 09:02 elukey: roll restart hdfs namenodes on an-master100[1,2] to pick up new settings - T220702
  • 08:22 godog: bounce prometheus on bast4002 after backfill has finished - T187987
  • 08:11 gilles@deploy1001: Finished deploy [performance/navtiming@8f135ac]: T221848 Default to partition 0 when no partition is found (duration: 00m 05s)
  • 08:11 gilles@deploy1001: Started deploy [performance/navtiming@8f135ac]: T221848 Default to partition 0 when no partition is found
  • 08:11 gilles@deploy1001: deploy aborted: T221848 Defalt to partition 0 when no partition is found (duration: 00m 00s)
  • 08:11 gilles@deploy1001: Started deploy [performance/navtiming@8f135ac]: T221848 Defalt to partition 0 when no partition is found
  • 07:53 gilles@deploy1001: Finished deploy [performance/navtiming@e900152]: T221848 add more logging around startup (duration: 00m 05s)
  • 07:53 gilles@deploy1001: Started deploy [performance/navtiming@e900152]: T221848 add more logging around startup
  • 07:29 moritzm: installing systemd updates for jessie
  • 07:24 marostegui: Remove labservices1001 and labservices1002 from tendril T221857
  • 05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1093's status (duration: 00m 51s)
  • 05:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db1093's status (duration: 00m 55s)
  • 04:26 mutante: LDAP - remove user pirroh from group nda (T222085 and cross-validate-accounts demands consistency)
  • 02:23 mutante: analytics1050 - systemctl start mcelog ... it had failed, as it did recently on analytics1052 (T212219 ?)
  • 02:09 tgr@deploy1001: Synchronized wmf-config/db-eqiad.php: SWAT: depool db1093 (gerrit:507237) (duration: 00m 54s)
  • 01:30 mutante: contint2001..then contint1001 - deleting /etc/zuul/wikimedia and letting puppet re-clone it (gerrit:507070) (T218844)

2019-04-29

  • 23:59 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (5/5) (duration: 00m 52s)
  • 23:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (4/5) (duration: 00m 52s)
  • 23:56 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (3/5) (duration: 00m 50s)
  • 23:55 ebernhardson@deploy1001: Synchronized wmf-config/LabsServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (2/5) (duration: 00m 52s)
  • 23:54 ebernhardson@deploy1001: Synchronized tests/: T220625 Add cloudelastic servers to wgCirrusSearchClusters (1/5) (duration: 00m 53s)
  • 23:34 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix (duration: 31m 04s)
  • 23:33 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221154: Add static.inaturalist.org to $wgCopyUploadDomains for Commons (duration: 00m 54s)
  • 23:03 smalyshev@deploy1001: Started deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix
  • 21:13 mutante: restarting gerrit
  • 21:10 mutante: cobalt (gerrit) upgrading openjdk 8 minor version
  • 20:40 arlolra: Updated Parsoid to c9dab9d (T106578, T113194, T205338, T219072, T219938, T221384, T219943)
  • 20:37 XioNoX: add BGP session to AS4922 in eqiad
  • 20:37 RoanKattouw: Deployed patch for T222014
  • 20:26 arlolra@deploy1001: Finished deploy [parsoid/deploy@7859b58]: Updating Parsoid to c9dab9d (duration: 06m 36s)
  • 20:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw127[5-9].eqiad.wmnet
  • 20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@7859b58]: Updating Parsoid to c9dab9d
  • 20:18 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw127[5-9].eqiad.wmnet
  • 20:18 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw127[0-4].eqiad.wmnet
  • 20:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw127[0-4].eqiad.wmnet
  • 20:08 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw126[5-9].eqiad.wmnet
  • 19:59 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw126[5-9].eqiad.wmnet
  • 19:52 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw126[1-4].eqiad.wmnet
  • 19:44 thcipriani: gerrit back
  • 19:44 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw126[1-4].eqiad.wmnet
  • 19:44 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw125[4-8].eqiad.wmnet
  • 19:43 thcipriani: gerrit restart for https://gerrit.wikimedia.org/r/327763 T221026
  • 19:39 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw125[4-8].eqiad.wmnet
  • 19:39 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw125[0-3].eqiad.wmnet
  • 19:36 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw125[0-3].eqiad.wmnet
  • 19:35 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[5-9].eqiad.wmnet
  • 19:32 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw124[5-9].eqiad.wmnet
  • 19:31 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet
  • 19:26 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw124[0-4].eqiad.wmnet
  • 19:26 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet
  • 19:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw123[8-9].eqiad.wmnet
  • 19:21 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw123[8-9].eqiad.wmnet
  • 19:20 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw123[0-5].eqiad.wmnet
  • 19:17 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw123[0-5].eqiad.wmnet
  • 19:07 otto@deploy1001: sync-file aborted: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080 (duration: 00m 02s)
  • 19:05 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080 (duration: 00m 53s)
  • 19:01 ottomata: deploying config change to enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080
  • 18:59 RoanKattouw: Deployed patch for T221739
  • 18:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:45 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 18:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 18:44 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:44 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 18:44 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:42 catrope@deploy1001: Synchronized static/images/project-logos/: Change wikimaniawiki logo to Wikimania 2019 version (T221829) (duration: 00m 54s)
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:41 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw122[8-9].eqiad.wmnet
  • 18:37 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw122[8-9].eqiad.wmnet
  • 18:37 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Commons (T138104) (duration: 00m 54s)
  • 18:34 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw122[1-6].eqiad.wmnet
  • 18:33 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:33 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:33 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:30 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Wikidata (T138104) (duration: 00m 53s)
  • 18:29 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw122[1-6].eqiad.wmnet
  • 18:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:22 Jeff_Green: authdns-update for T221475
  • 18:21 catrope@deploy1001: Synchronized docroot/noc: Publish throttle-analyze at noc (T187894) (duration: 00m 53s)
  • 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www4.bibl.ulaval.ca to wgCopyUploadsDomains (T220704) (duration: 00m 53s)
  • 17:35 Jeff_Green: authdns-update to deploy T214525
  • 17:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates (duration: 06m 58s)
  • 17:08 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates
  • 16:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Drop wmgMediaInfoEnableUploadWizardDepicts from IS (duration: 00m 53s)
  • 16:34 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 53s)
  • 16:33 jforrester@deploy1001: sync-file aborted: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 01s)
  • 16:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Add wmgMediaInfoEnableUploadWizardDepicts to IS (duration: 00m 53s)
  • 16:28 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable feature flag for depicts in UW on Test Commons (duration: 00m 53s)
  • 15:40 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update WikimediaEditorTasks counter config (T221951) (duration: 00m 58s)
  • 14:49 herron: added uid=sukhe,ou=people,dc=wikimedia,dc=org to nda ldap group T221990
  • 13:56 jbond42: rolling security updates for imagemagick
  • 13:45 fsero: DNS: creating docker-registry.svc.(eqiad|codfw).wmnet RRs
  • 13:17 jbond42: rolling security updates for libpng
  • 12:46 godog: resume rollout rsyslog 8.1901.0-1 to jessie hosts - T219764
  • 12:07 jynus: stop dbstore2002:s3 and dbstore2001:s5 for cloning to db2098/99 T220572
  • 11:56 kart_: EU-Midday SWAT done. Thanks.
  • 11:56 kartik@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/ContentTranslation: SWAT: 506971|Change the way we calculate total unmodified MT (T221930) (duration: 00m 56s)
  • 11:30 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 505765|Add namespace "Aldono" at eo.wiktionary (T221525) (duration: 00m 54s)
  • 11:21 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 506939| (T222018) (duration: 00m 53s)
  • 11:14 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 506860|Allow admins to add or remove patroller group at enwikivoyage (T222008) (duration: 00m 55s)
  • 09:27 joal@deploy1001: Finished deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) - bis (duration: 28m 19s)
  • 09:13 jynus: stop dbstore2002:s4 for cloning to db2099 T220572
  • 08:59 joal@deploy1001: Started deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) - bis
  • 08:39 godog: begin migration of bast4002 to prometheus v2 - T187987
  • 08:38 joal@deploy1001: Finished deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) (duration: 15m 38s)
  • 08:33 elukey: restart keyholder on deploy1001 + rearm keys
  • 08:28 elukey: restart keyholder-proxy on deploy1001 (attempt to see if new analytics scap settings got applied)
  • 08:25 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable unicode overrides table for php 7.2 T219279 (duration: 00m 53s)
  • 08:25 jynus: stop dbstore2001:s2 for cloning to db2098 T220572
  • 08:23 oblivian@deploy1001: Synchronized wmf-config/Php72ToUpper.php: Adding unicode overrides table for php 7.2 T219279 (duration: 00m 54s)
  • 08:23 joal@deploy1001: Started deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy)
  • 07:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2045 from s8 to x1 T219493 (duration: 00m 55s)
  • 07:47 marostegui: Stop mysql on db2034 (lag will happen on x1 codfw) - T219493
  • 07:44 marostegui: Stop replication on db2034 (x1 master) for maintenance - T219493
  • 07:13 moritzm: updated stretch netboot image for 9.9 point release

2019-04-28

  • 17:46 jiji@cumin1001: conftool action : set/pooled=no; selector: name=cp3037.esams.wmnet
  • 17:46 jijiki: Depooling cp3037 - server and mgmt are unreachable
  • 14:55 James_F: Updated trwiki's MediaWiki:Common.css to not over-ride the logo.
  • 14:53 James_F: Manually purged the trwiki logos from Varnish as part of updating them for 2 year anniversary.
  • 14:47 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki.png: trwiki: Update logo for 2 year anniversary, part III (duration: 00m 53s)
  • 14:45 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki-1.5x.png: trwiki: Update logo for 2 year anniversary, part II (duration: 00m 53s)
  • 14:44 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki-2x.png: trwiki: Update logo for 2 year anniversary, part I (duration: 00m 55s)

2019-04-27

  • 17:44 elukey: restart pdfrender on scb1002 (alert flapping)
  • 12:37 jynus: correcting last log, stopping dbstore2002:s1 to clone it to db2097 T220572
  • 12:37 jynus: stopping dbstore2002:s6 to clone it to db2097 T220572
  • 00:11 foks: reset passwords for FritzSolms@global and Seanhood@global

2019-04-26

  • 20:15 foks: changing email and password for "Lemon martini@global"
  • 19:38 foks: changing password for JDiPierro@global
  • 19:21 bblack: varnish-backend-restart on cp4026, evidence of artificial 503s from mbox lag behavior, probably related to the semi-abuse client doing odd 404 traffic to ulsfo that's triggering bugs in swift's rewrite.py ....
  • 19:04 foks: changing password for Subinsebastien
  • 17:50 mutante: analytics1052 - reported broken systemd state in Icinga - service mcelog was in state failed - systemctl start mcelog - (T212219 ?)
  • 16:18 jynus: stop s6 mariadb instance on dbstore2001 T220572
  • 15:34 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on canary hosts: thumbor1001 ms-fe1005 ms-be1013 scb1001 restbase1007
  • 15:05 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on canary hosts: ores1001.yaml wtp1025.yaml rdb1006.yaml
  • 14:18 marostegui: Set pc1004-1006 and pc2004-2006 as unracked on netbox - T209858 T210969
  • 13:17 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on canary hosts: mw1311.yaml, mx2001 & dubnium
  • 12:52 ema: cp4025: restart varnish-be due to mbox lag
  • 12:50 jijiki: Restarting hhvm on mw1288
  • 12:48 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on mc1019, maps1001 and logstash1007
  • 12:45 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_upload,name=cp4021.ulsfo.wmnet,dc=ulsfo
  • 12:44 ema: pool cp4021 w/ ATS backend T219967
  • 12:20 ema: repool cp3030 after directors.frontend.vcl testing T219967
  • 12:09 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on canary hosts: elastic1017, ganeti2001, analytics1042
  • 11:26 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on lvs4007, dns2001 and multatuli
  • 11:16 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on bast4002, aqs1004 and conf2001
  • 10:28 moritzm: restarting Parsoid on wtp1025 for glibc update
  • 10:19 ema: depool cp3030 for testing T219967
  • 09:48 marostegui: Remove labtestservices2001 from tendril - T218022
  • 09:11 moritzm: restarting AQS on aqs1004 for glibc update
  • 08:42 elukey: restart pdfrender on scb1003 (alert flapping)
  • 08:21 moritzm: uploaded php-xdebug 2.7.0+wmf1 for component/php72 (T221923)
  • 07:20 moritzm: installing glibc updates on a number of analytics hosts
  • 04:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 T221782 (duration: 00m 56s)
  • 00:31 eileen: civicrm revision changed from 88736c7c11 to 34027da7df, config revision is 2119df9495

2019-04-25

2019-04-24

  • 22:46 mutante: icinga-downtime -h ms-be2034 -r swift-rebalancing -d 86400
  • 22:19 mutante: deploying varnish/trafficserver change to cover www.wikiba.se (not prod yet)
  • 22:19 mutante: icinga-downtime -h ms-be2039 -r swift-rebalancing -d 86400
  • 21:31 mutante: icinga-downtime -h ms-be2038 -r swift-rebalancing -d 86400
  • 20:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@8a6b6fc]: Parsoid storage simplification step 1: switch Parsoid stashing to simple key/value - T215956 (duration: 20m 39s)
  • 20:21 mobrovac@deploy1001: Started deploy [restbase/deploy@8a6b6fc]: Parsoid storage simplification step 1: switch Parsoid stashing to simple key/value - T215956
  • 20:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@8a6b6fc] (dev-cluster): Switch Parsoid stashing to simple key/value (duration: 04m 18s)
  • 19:57 mobrovac@deploy1001: Started deploy [restbase/deploy@8a6b6fc] (dev-cluster): Switch Parsoid stashing to simple key/value
  • 18:47 mutante: pooled mw1297 as a new API server (T192457)
  • 18:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet,cluster=api_appserver
  • 18:45 mutante: mw1297 - scap pull
  • 18:17 mutante: sudo icinga-downtime -h ms-be2031 -r swift-rebalancing -d 86400
  • 17:52 mutante: contint1001 - for logfile in $(find /var/log/zuul/ ! -name "*.gz"); do gzip $logfile; done to get more disk space (T207707)
  • 17:33 mutante: contint1001 - apt-get clean for 1% more disk space
  • 17:23 mutante: proton1001 - restarting proton service - low RAM caused facter/puppet failures (https://tickets.puppetlabs.com/browse/PUP-8048); the restart freed memory and fixed the puppet run (cc: T219456 T214975)
  • 16:33 catrope@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/: Fix exceptions in Homepage logging (duration: 00m 56s)
  • 15:52 herron: performing rolling restart of pybal on low-traffic eqiad/codfw lvs hosts
  • 15:32 jijiki: Restarting php7.2-fpm on mw2* in codfw for 505383 and T211488
  • 15:00 herron: switching kibana lvs to source hash scheduler
  • 14:41 jijiki: restart pdfrender on scb1002
  • 14:28 godog: begin rollout rsyslog 8.1901.0-1 to jessie hosts - T219764
  • 13:38 marostegui: Poweroff db2080 for onsite maintenance - T216240
  • 13:01 jijiki: Restarting php7.2-fpm on mw13* for 505383 and T211488
  • 12:36 jijiki: restarting pdfrender on scb1004
  • 12:23 moritzm: rolling restart of Cassandra on restbase/eqiad to pick up Java security update
  • 11:59 jijiki: Restarting php7.2-fpm on mw12* for 505383 and T211488
  • 11:45 gehel: restarting relforge for JVM upgrade
  • 11:33 jbond42: security update ghostscript on scb jessie servers
  • 11:25 jijiki: Restarting php7.2-fpm on mw-canary for 505383 and T211488
  • 11:23 ladsgroup@deploy1001: Finished deploy [ores/deploy@060fc37]: (no justification provided) (duration: 16m 18s)
  • 11:07 ladsgroup@deploy1001: Started deploy [ores/deploy@060fc37]: (no justification provided)
  • 10:28 akosiaris@deploy1001: scap-helm cxserver finished
  • 10:28 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 10:28 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 10:23 jijiki: Restarting php-fpm on mw1238 for 505383 and T211488
  • 09:58 moritzm: installing rsync security updates on jessie
  • 08:44 moritzm: rolling restart of Cassandra on restbase/codfw to pick up Java security update
  • 08:29 godog: swift eqiad-prod: start decom for ms-be101[45] - T220590
  • 08:17 godog: bounce prometheus on bast5001 after migration and backfill
  • 08:04 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 08:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 08:02 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 08:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 06:41 marostegui: Optimize tables on pc1010
  • 06:38 elukey: restart pdfrender on scb1003
  • 06:37 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2082 (duration: 00m 52s)
  • 06:22 marostegui: Upgrade db2082
  • 06:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2079, depool db2082 (duration: 00m 55s)
  • 06:18 marostegui: Upgrade db2081
  • 06:10 marostegui: Upgrade db2079
  • 06:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2086, depool db2079 (duration: 00m 53s)
  • 05:55 marostegui: Upgrade db2086
  • 05:55 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2083 and depool db2086 (duration: 00m 52s)
  • 05:38 marostegui: Upgrade db2080 and db2083
  • 05:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2080 and db2083 (duration: 00m 54s)
  • 03:45 SMalyshev: repooled wdqs1003, it's good now
  • 01:26 eileen: jobs restarted process-control config revision is ef6d4761e5
  • 01:06 eileen: civicrm revision changed from 31982324b8 to 468f85e524, config revision is 13b9eefe7b
  • 01:02 eileen: process-control config revision is 13b9eefe7b
  • 00:29 mutante: mw1297 - rebooting for nutcracker issue
  • 00:28 mutante: mw1297 - scap pull
  • 00:08 mutante: DNS - add initiatives.wikimedia.org (and initiatives.m) for campaign wiki requested at T167375
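
The 17:52 entry above compresses every not-yet-gzipped file under /var/log/zuul/ with a shell loop over `find` output. As a hedged illustration only (not what was run on contint1001), the same idea can be sketched in Python so that file names containing spaces and subdirectories are handled explicitly. The function name `compress_uncompressed_logs` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def compress_uncompressed_logs(log_dir: str) -> List[str]:
    """Gzip every regular file under log_dir that is not already a .gz file.

    Returns the names of the newly created archives, in sorted path order.
    """
    compressed = []
    # sorted() materialises the listing first, so archives created in the
    # loop are never revisited.
    for path in sorted(Path(log_dir).rglob("*")):
        # Skip directories and files that are already compressed.
        if not path.is_file() or path.suffix == ".gz":
            continue
        target = path.with_name(path.name + ".gz")
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # like gzip(1), drop the original once compressed
        compressed.append(target.name)
    return compressed
```

A call such as `compress_uncompressed_logs("/var/log/zuul")` would mirror the logged loop; the sketch does not check whether a file is still being written to.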

2019-04-23

  • 23:51 mutante: mw1297 - initial puppet run - will show up in Icinga in a little while but not pooled yet.. all the things are being installed right now
  • 23:48 ejegg: updated payments-wiki (inactive cluster) from 7a312e371a to aa8dad50e7
  • 23:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Logger.js: SWAT GrowthExperiments: Fix validation errors due to state= (duration: 00m 53s)
  • 23:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/includes/EventLogging/SpecialHomepageLogger.php: SWAT GrowthExperiments: Fix EventLogging errors (duration: 00m 53s)
  • 23:25 mutante: generating mcrouter certs for appservers, added mw1297.eqiad.wmnet (T192457)
  • 23:23 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/languages/Language.php: SWAT T219728 Add support for new Japanese era name 'Reiwa' (duration: 00m 52s)
  • 23:20 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: SWAT T221668 VisualEditor: Restore external paste sanitization of DOM elements (duration: 00m 55s)
  • 23:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T221521 Add autoreviewer to wgRestrictionLevels on ptwikinews (duration: 00m 54s)
  • 22:35 XioNoX: push firewall rule to pfw3-eqiad - T221475
  • 22:33 XioNoX: push firewall rule to pfw3-codfw - T221475
  • 21:54 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/ORES/includes/Specials/SpecialORESModels.php: T221696 (duration: 00m 55s)
  • 21:43 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:43 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:33 thcipriani: restarting gerrit to pickup config changes
  • 20:55 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@51b4728]: Deploy new Updater fix for constraints (T221407) (duration: 13m 03s)
  • 20:43 andrewbogott: updating designate pools on cloudservices1003 and 1004 using eqiad1_pool_config.yml template from the puppet repo
  • 20:42 smalyshev@deploy1001: Started deploy [wdqs/wdqs@51b4728]: Deploy new Updater fix for constraints (T221407)
  • 20:26 urandom: dropping disused restbase keyspaces -- T221530
  • 19:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:57 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:32 mutante: webperf* - running puppet to git pull docroot
  • 19:11 thcipriani: gerrit restart
  • 18:59 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/MassMessage: c640195 (duration: 00m 56s)
  • 18:09 SMalyshev: depool wdqs1003 to let it catch up
  • 18:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:03 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:02 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:43 jijiki: Restarting memcached on mc1029 - T208844
  • 17:26 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@78985fb]: Update mobileapps to 6d3a422 (T201382 T217837) (duration: 04m 06s)
  • 17:22 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@78985fb]: Update mobileapps to 6d3a422 (T201382 T217837)
  • 16:55 jijiki: Depool thumbor2004 for 505759 and pool back - T187765
  • 16:54 gehel: restart wdqs for jvm upgrade
  • 16:49 jijiki: Depool thumbor1004 for 505759 and pool back - T187765
  • 16:43 jijiki: Depool thumbor2003 for 505759 and pool back - T187765
  • 16:40 jijiki: Depool thumbor1003 for 505759 and pool back - T187765
  • 16:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable api-request logging to eventgate-analytics for all wikis - T214080 (duration: 00m 53s)
  • 16:33 ottomata: proceeding to enable api-request eventgate-analytics logging for all wikis
  • 16:31 herron: added jfishback to wmf ldap group T221660
  • 16:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 16:12 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:07 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: set wglocaltimezone for sqwikiquote T221627 (duration: 00m 54s)
  • 15:28 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Depicts functionality on Commons (duration: 00m 54s)
  • 14:27 jijiki: Depool thumbor2002 for 505759 and pool back - T187765
  • 14:21 jijiki: Depool thumbor1002 for 505759 and pool back - T187765
  • 14:16 jijiki: Depool thumbor2001 for 505759 and pool back - T187765
  • 14:14 jijiki: Depool thumbor1001 for 505759 and pool back - T187765
  • 14:07 jijiki: Disable puppet on thumbor* to merge 505759
  • 13:54 ema: depool cp4021 and reimage as upload_ats T219967
  • 13:17 jijiki: Restart nagios-nrpe-server on prometheus1003
  • 12:15 godog: swift eqiad-prod: fully decom ms-be1013 - T220590
  • 11:59 moritzm: installing clamav security updates on fermium
  • 11:56 kart_: EU-Midday SWAT is done.
  • 11:54 kart_: 'SWAT: gerrit:505059 deployment-prep: Use new poolcounter instance, gerrit:505060 deployment-prep: Use new ms-fe host.'
  • 11:53 kartik@deploy1001: Synchronized wmf-config/LabsServices.php: SWAT: gerrit:505643 (duration: 00m 53s)
  • 11:45 jijiki: Stop xenon-log, excimer-log and apache on mwlog*
  • 11:43 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:505643 Turn off logging for CitationUsage and CitationUsagePageLoad (T213969) (duration: 00m 53s)
  • 11:29 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix undefined variable from last SWAT (duration: 00m 54s)
  • 11:27 moritzm: installing clamav security updates on mendelevium (OTRS host)
  • 11:18 kartik@deploy1001: Synchronized wmf-config: SWAT: gerrit:505220 Use higher unmodified MT threshold for Indonesian Wikipedia (T221353) (duration: 00m 57s)
  • 10:44 moritzm: uploaded ferm 2.4-1+wmf2+deb10u1 to buster-wikimedia (T153468)
  • 09:23 godog: upgrade prometheus to v2 on bast5001, previous metrics will not be available until migration and backfill are complete - T187987
  • 09:19 elukey: dumping Kafka consumer offsets' history on logstash1012 for T221202
  • 09:00 fdans@deploy1001: Finished deploy [analytics/refinery@0d63671]: deploying changes to pageview definition brought in refinery source 0.0.87 (duration: 14m 09s)
  • 08:54 fsero: synchronizing old docker_registry content into new one - T221101
  • 08:46 fdans@deploy1001: Started deploy [analytics/refinery@0d63671]: deploying changes to pageview definition brought in refinery source 0.0.87
  • 08:14 moritzm: removing debmonitor entries for labvirt* hosts
  • 08:06 moritzm: installing wget security updates on jessie
  • 07:27 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Set wgPriorityHintsRatio (duration: 00m 52s)
  • 06:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T136427 (duration: 00m 57s)
  • 05:52 elukey: powercycle wtp2019 - no ssh, mgmt console stuck
  • 05:16 marostegui: Deploy schema change on x1 master - lag will appear on x1 slaves - T136427
  • 05:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T136427 (duration: 00m 54s)

2019-04-22

  • 18:46 gilles@deploy1001: Synchronized php-1.34.0-wmf.1/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 53s)
  • 18:22 XioNoX: Add k8s BGP neighbors on cr1/2-eqiad - T220822
  • 18:15 XioNoX: Add k8s BGP neighbors on cr1/2-codfw - T220822
  • 08:47 marostegui: finished maintenance window on dbstore1003 and dbstore1005
  • 08:37 marostegui: Upgrade dbstore1005
  • 07:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 (duration: 00m 54s)
  • 07:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
  • 06:40 marostegui: Upgrade dbstore1003
  • 06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
  • 05:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
  • 05:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1099 (duration: 00m 54s)
  • 05:26 marostegui: Stop MySQL and reboot db1099 to see if memory errors clear up T221502
  • 05:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 T221502 (duration: 01m 15s)
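
The 05:25 to 07:53 entries above bring db1099 back in stages: one "Slowly repool" sync, four "More traffic" syncs and a final "Fully repool". The log does not record the load weights used at each step, so the following is only a minimal sketch of a linear ramp. The function name `repool_weights` and the full weight of 300 in the usage note are assumptions made for illustration.

```python
from typing import List


def repool_weights(full_weight: int, steps: int) -> List[int]:
    """Return `steps` non-decreasing weights, the last one equal to full_weight.

    Hypothetical helper: the real weights used for db1099 are not in the log.
    """
    if full_weight < 1 or steps < 1:
        raise ValueError("full_weight and steps must both be at least 1")
    # Integer linear ramp; never emit 0, since weight 0 would mean "depooled".
    return [max(1, full_weight * i // steps) for i in range(1, steps + 1)]
```

With an assumed full weight of 300 and the six syncs seen in the log, `repool_weights(300, 6)` gives `[50, 100, 150, 200, 250, 300]`.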

2019-04-21

  • 05:19 marostegui: Clean up some space on webperf2001 - T221508

2019-04-20

  • 08:12 _joe_: depooling mw1261, mw1312 - wikidata (at least) not working
  • 07:58 jijiki: Pool thumbor1001
  • 07:52 jijiki: depool thumbor1001, switch back to nginx - T187765
  • 07:50 _joe_: restarting php-fpm on mw1312, mw1261 to test the new settings over the weekend

2019-04-19

  • 23:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2245.codfw.wmnet,cluster=api_appserver
  • 23:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2244.codfw.wmnet,cluster=api_appserver
  • 23:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2150.codfw.wmnet,service=nginx,cluster=jobrunner
  • 22:55 mutante: mw2244,mw2245,mw2150 - scap pull
  • 22:53 mutante: mw2244,mw2245,mw2150 - rebooting for known nutcracker issue after first install
  • 22:47 mutante: furud - remounted /mnt/hdfs for T221483
  • 21:42 mutante: mw2150,mw2244,mw2245: initial puppet run, added to mw roles
  • 19:38 otto@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: No-op - enabling cirrussearch-request logging in beta (duration: 00m 52s)
  • 19:37 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: No-op - enabling cirrussearch-request logging in beta (duration: 00m 53s)
  • 19:36 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: No-op - prep for enabling cirrussearch-request logging in beta (duration: 00m 53s)
  • 16:20 bblack: wikipedia.org CNAME TTLs increase to 4H - https://gerrit.wikimedia.org/r/c/operations/dns/+/505249 - T208263
  • 16:18 ejegg: rolled back payments-wiki from eb3d0f35de to aa8dad50e7
  • 15:55 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/includes/logging/LogFormatter.php: T220767 (duration: 00m 53s)
  • 15:54 bblack: restart pybal on lvs1016 (eqiad primary) for eventschemas service add
  • 15:54 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/includes/Linker.php: T220767 (duration: 00m 55s)
  • 15:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=schema.*
  • 15:42 bblack: restart pybal on lvs2003 (codfw primary) for eventschemas service add
  • 15:39 bblack: restart pybal on lvs2006 (codfw backup) for eventschemas service add
  • 15:32 bblack: restarting pybal on lvs1006 (eqiad backup) for eventschema service add
  • 14:59 volans: uploaded spicerack_0.0.23-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 12:59 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 T216598 Enable Priority Hints and Element Timing on eswiki (duration: 00m 56s)
  • 08:45 akosiaris: restart gerrit to pick up https://gerrit.wikimedia.org/r/504981
  • 06:39 elukey: roll restart of druid daemons on druid100[1-3] to pick up new jvm settings

2019-04-18

  • 23:16 mobrovac: evening SWAT completed
  • 23:10 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes: (no justification provided) (duration: 00m 54s)
  • 23:10 ejegg: updated payments-wiki from aa8dad50e7 to eb3d0f35de
  • 23:07 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wikimania years namespaces to wgNamespacesWithSubpages - T220950 (duration: 00m 53s)
  • 23:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 23:00 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 22:40 ejegg: updated payments-wiki from aa8dad50e7 to 2f7cd8f195
  • 22:14 mutante: LDAP - adding 'ldoan' and 'schang' to 'wmf' (T221118)
  • 22:01 XioNoX: remove asw2-a-eqiad license keys for troubleshooting
  • 21:58 ejegg: rolled back payments-wiki to aa8dad50e7
  • 21:55 mutante: LDAP - adding rosalie-wmde to group 'wmde' (T220691)
  • 21:52 ejegg: updated payments-wiki from aa8dad50e7 to 2f7cd8f195
  • 21:28 mutante: puppetmaster1001 - mcrouter_generate_certs --generate
  • 21:18 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (cobalt) (duration: 00m 10s)
  • 21:18 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (cobalt)
  • 21:17 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (gerrit2001) (duration: 00m 11s)
  • 21:17 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (gerrit2001)
  • 21:14 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 21:14 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:56 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.1 refs T220726
  • 20:52 cdanis: root@icinga1001.wikimedia.org /var/lib/icinga # for DOWNTIME in $(fgrep -B12 'comment=mobrovac: temp stop JQ for T221368 - cdanis@cumin1001' retention.dat | grep -A13 servicedowntime | grep downtime_id | cut -d= -f2); do printf "[%lu] DEL_SVC_DOWNTIME;%u\n" $(date +%s) $DOWNTIME ; done > rw/icinga.cmd
  • 20:40 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Translate/utils/MessageUpdateJob.php: Translate jobs: Remove problematic Job::$params assignments, dir 2/2 - T221368 (duration: 01m 00s)
  • 20:39 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Translate/tag: Translate jobs: Remove problematic Job::$params assignments, dir 1/2 - T221368 (duration: 01m 01s)
  • 20:32 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'scb*' 'enable-puppet "mobrovac: temp stop JQ for T221368"'
  • 20:31 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@71941b1]: Ignore Kafka disconnect errors (duration: 00m 51s)
  • 20:30 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@71941b1]: Ignore Kafka disconnect errors
  • 19:36 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cookbook sre.hosts.downtime -r "mobrovac: temp stop JQ for T221368" 'scb*'
  • 19:36 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:36 cdanis@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:29 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'scb*' 'disable-puppet "mobrovac: temp stop JQ for T221368" && systemctl stop cpjobqueue'
  • 19:17 mobrovac@deploy1001: Started restart [cpjobqueue/deploy@922cbc0]: Bounce CP4JQ, lots of transport broken failures - T221368
  • 19:11 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/EventFactory.php: Remove the use of page titles in JobExecutor, file 2/2 - T221368 (duration: 00m 59s)
  • 19:10 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/JobExecutor.php: Remove the use of page titles in JobExecutor, file 1/2 - T221368 (duration: 01m 01s)
  • 18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:47 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:47 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:41 mutante: mw2150 - reimaging, not in confctl
  • 18:02 dzahn@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2151.codfw.wmnet,cluster=jobrunner,service=nginx
  • 17:49 mutante: mw2151 - scap pull
  • 17:46 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/JobExecutor.php: Default to a dummy title for invalid titles - T221368 (duration: 01m 01s)
  • 17:20 twentyafterfour@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/AbuseFilter/includes/: sync https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/504863 (duration: 01m 00s)
  • 16:20 bblack: Experimental DNS-level changes deploying for wikipedia.org domain - if wikipedia.org DNS problems appear, revert https://gerrit.wikimedia.org/r/c/operations/dns/+/504588 - T208263
  • 16:17 XioNoX: remove peering to 63199 in eqsin (down for 1 month, no reply to emails)
  • 16:13 XioNoX: rollback dhcp option 82 test from asw2-b-eqiad
  • 14:55 fsero: synchronizing docker_registry_codfw swift container from docker_registry
  • 14:40 XioNoX: push firewall change to pfw3-eqiad - T221278
  • 13:30 jbond42: rolling updates of ruby2.1 on jessie
  • 13:08 elukey: roll restart of cassandra on aqs* to pick up new openjdk upgrades
  • 13:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:58 reedy@deploy1001: rebuilt and synchronized wikiversions files: group1 back to .25
  • 12:36 anomie: Ran `php7adm /opcache-free` on mw1274 to test a theory related to T221347. The log entries related to that task stopped immediately.
  • 12:30 gehel: restarting blazegraph + updater on wdqs* for jvm upgrade
  • 12:22 moritzm: installing Java security updates on restbase-dev hosts (along with Cassandra restarts)
  • 12:21 gehel: restarting blazegraph + updater on wdqs1009 / wdqs1010 for jvm upgrade
  • 12:19 moritzm: installing Java security updates on WDQS autodeploy/test hosts
  • 10:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:35 moritzm: installing rails security updates on jessie hosts
  • 10:21 moritzm: installing jasper updates on jessie hosts
  • 09:44 akosiaris: update grafana service/ dashboard to have user, system, throttled CPU metrics under the CPU saturation row
  • 09:41 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216597 Run CPU benchmark for all samples on eswiki/ruwiki (duration: 01m 06s)
  • 09:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:53 elukey: reboot kafka10[12-23] (old Analytics cluster) for kernel + openjdk upgrades
  • 08:23 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 08:14 moritzm: installing libssh2 security updates on jessie
  • 08:01 moritzm: restarting mw1261-mw1265 to pick up new libssh2
  • 07:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:53 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2004.codfw.wmnet
  • 07:28 moritzm: installing libssh2 security updates
  • 07:19 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 06:58 moritzm: restarting icinga on icinga1001 (T196336)
  • 06:37 moritzm: rolling reboots of Swift backends in eqiad for combined kernel/glibc/OpenSSL update
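
The 20:52 entry above pulls downtime IDs out of retention.dat and writes one `[timestamp] DEL_SVC_DOWNTIME;<id>` line per ID to the Icinga command file. A minimal Python sketch of just the formatting step is shown here. The function name `del_svc_downtime_commands` and the `now` parameter are hypothetical, and extracting the IDs from retention.dat is deliberately left out.

```python
import time
from typing import Iterable, List, Optional


def del_svc_downtime_commands(
    downtime_ids: Iterable[int], now: Optional[int] = None
) -> List[str]:
    """Format one DEL_SVC_DOWNTIME external command line per downtime ID.

    Mirrors the printf format in the log entry: "[%lu] DEL_SVC_DOWNTIME;%u".
    `now` is a Unix timestamp; it defaults to the current time.
    """
    timestamp = int(time.time()) if now is None else int(now)
    return [f"[{timestamp}] DEL_SVC_DOWNTIME;{int(d)}" for d in downtime_ids]
```

Joining the returned lines with newlines and writing them to the command file (rw/icinga.cmd in the logged command) would correspond to the shell redirect in the original one-liner.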

2019-04-17

  • 22:46 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/includes/: I3a50508178159 (duration: 01m 21s)
  • 22:40 XioNoX: push firewall change to pfw3-codfw - T221278
  • 22:28 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Score/: Id58156cfca805 / T219342 (duration: 01m 03s)
  • 21:30 XioNoX: enable option-82 on asw2-b:cloud-hosts1-b-eqiad vlan
  • 21:10 thcipriani: gerrit back
  • 21:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4dcb851]: Gerrit update (cobalt -- restart incoming) (duration: 00m 10s)
  • 21:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4dcb851]: Gerrit update (cobalt -- restart incoming)
  • 21:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4dcb851]: Gerrit update (gerrit2001 only) (duration: 00m 11s)
  • 21:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4dcb851]: Gerrit update (gerrit2001 only)
  • 19:14 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.1 refs T220726 (duration: 01m 49s)
  • 19:13 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.1 refs T220726
  • 18:04 thcipriani: gerrit back
  • 18:01 thcipriani: gerrit restart for https://gerrit.wikimedia.org/r/504611/
  • 17:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Wikidata federation on Commons again T214075 (duration: 01m 00s)
  • 17:20 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventGate api-request logging on group1 wikis (duration: 01m 00s)
  • 17:18 mutante: LDAP - added 'brennen' to group 'gerritadmin' (T218858)
  • 17:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/OATHAuth/: UBN T221257 train un-blocker (duration: 01m 02s)
  • 17:09 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Echo/includes/formatters/: Notifications: Revert 7121b9c4 per I8f9a6a19ba (duration: 01m 01s)
  • 16:49 tzatziki: deleting three files for legal compliance
  • 16:47 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/WikibaseMediaInfo/: SDC: Various fixes T218922 T221071 T221110 T221123 (duration: 01m 02s)
  • 16:41 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/autoload.php: Update to point to new maintenance scripts (duration: 01m 00s)
  • 16:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/maintenance/language/generateUpperCharTable.php: Maintenance script for _joe_ (duration: 00m 59s)
  • 16:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/maintenance/language/generateUcfirstOverrides.php: Maintenance script for _joe_ (duration: 01m 00s)
  • 16:21 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/languages/Language.php: T219279 Ability to set wgOverrideUcfirstCharacters part 1 try two (duration: 01m 00s)
  • 16:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/includes/DefaultSettings.php: T219279 Ability to set wgOverrideUcfirstCharacters part 1b (duration: 01m 03s)
  • 16:13 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 16:11 XioNoX: set fasw-c-eqiad:ge-[0-1]/0/17 in admin vlan - T221232
  • 16:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T220434 Deploy Partial blocks to Chinese Wikipedia (duration: 01m 02s)
  • 14:37 ariel@deploy1001: Finished deploy [dumps/dumps@dcf04a0]: fix up paths for 1.34_wmf.1 for AbstractFilter (duration: 00m 04s)
  • 14:36 ariel@deploy1001: Started deploy [dumps/dumps@dcf04a0]: fix up paths for 1.34_wmf.1 for AbstractFilter
  • 14:35 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:35 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:35 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 14:34 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:34 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:34 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 14:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics finished
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 13:52 elukey: upgrading hadoop cdh distribution to 5.16.1 on all the Hadoop-related nodes - T218343
  • 13:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 godog: reimage prometheus2004 - T187987
  • 12:57 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1004.eqiad.wmnet
  • 12:44 godog: bounce prometheus instances on prometheus[12]003 after https://gerrit.wikimedia.org/r/c/operations/puppet/+/499742
  • 12:33 moritzm: running some ferm tests on graphite2002
  • 12:10 godog: briefly stop all prometheus on prometheus1003 to finish metrics rsync - T187987
  • 11:39 Lucas_WMDE: EU SWAT done
  • 11:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:504380 Enable suggestion constraint status on testwikidata (T221108, T204439) (duration: 01m 01s)
  • 10:58 volans@deploy1001: Finished deploy [debmonitor/deploy@f049b3b]: Deploy Debmonitor v0.1.9 (duration: 01m 00s)
  • 10:57 volans@deploy1001: Started deploy [debmonitor/deploy@f049b3b]: Deploy Debmonitor v0.1.9
  • 10:40 moritzm: installing Java security updates on kafka/analytics cluster
  • 09:17 godog: swift eqiad-prod continue ms-be1013 decom - T220590
  • 09:09 elukey: restart eventlogging on eventlog1002 due to errors in processors and consumer lag accumulated after the last Kafka Jumbo roll restart
  • 08:47 godog: reimage prometheus1004 - T187987
  • 08:38 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 fully (duration: 01m 00s)
  • 08:29 moritzm: installing ghostscript security updates
  • 07:51 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/NavigationTiming: T216597 Event timing support (duration: 01m 01s)
  • 07:45 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216597 Enable Event Timing origin trial on ruwiki and eswiki (duration: 01m 04s)
  • 07:21 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 with low load (duration: 01m 18s)
  • 07:07 moritzm: rolling reboots of Swift backends in codfw for combined kernel/glibc/OpenSSL update

2019-04-16

  • 23:42 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Return CirrusSearch to standard execution against eqiad cluster (duration: 01m 00s)
  • 23:37 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/CirrusSearch/includes/: Fix fatals on malformed search queries against overridden clusters (duration: 01m 06s)
  • 22:42 thcipriani: gerrit back
  • 22:39 thcipriani: restarting gerrit for configuration update https://gerrit.wikimedia.org/r/504448
  • 22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T165795 Give bureaucrats the usermerge right (duration: 00m 59s)
  • 22:20 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/NewUserMessage/includes/NewUserMessage.php: Disable onLocalUserCreated for known bot accounts (duration: 01m 01s)
  • 22:17 mobrovac@deploy1001: Finished deploy [restbase/deploy@f1c767d]: mobile-sections simplification: use the key/value bucket only - T215960 (duration: 20m 02s)
  • 22:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T165795 Enable the UserMerge extension for clean-up on wikitech (duration: 01m 00s)
  • 21:57 mobrovac@deploy1001: Started deploy [restbase/deploy@f1c767d]: mobile-sections simplification: use the key/value bucket only - T215960
  • 21:56 eileen: civicrm revision changed from 1bc1570967 to 31982324b8, config revision is e5a7908330
  • 21:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@f1c767d] (dev-cluster): mobile-sections simplification: use the key/value bucket only (duration: 05m 24s)
  • 21:50 mobrovac@deploy1001: Started deploy [restbase/deploy@f1c767d] (dev-cluster): mobile-sections simplification: use the key/value bucket only
  • 21:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.1 refs T220726
  • 21:24 andrewbogott: deleting 'eqiad' endpoint in keystone
  • 21:21 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.1 refs T220726 (duration: 36m 47s)
  • 21:09 XioNoX: add wpao to wmf/ops in LDAP - T221142
  • 21:02 cdanis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
  • 20:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:55 andrewbogott: removing keystone endpoints for the 'eqiad' region
  • 20:45 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.1 refs T220726
  • 20:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@dfca9e6]: Use the simplified key/value bucket - T215960 (duration: 19m 52s)
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:42 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:42 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:23 mobrovac@deploy1001: Started deploy [restbase/deploy@dfca9e6]: Use the simplified key/value bucket - T215960
  • 20:19 ariel@deploy1001: Finished deploy [dumps/dumps@796ccb5]: use safe_load yaml and getReplicaServer.php, cleanup symlinks once per job only (duration: 00m 04s)
  • 20:19 ariel@deploy1001: Started deploy [dumps/dumps@796ccb5]: use safe_load yaml and getReplicaServer.php, cleanup symlinks once per job only
  • 20:11 mobrovac@deploy1001: Finished deploy [restbase/deploy@dfca9e6] (dev-cluster): Use the simplified key/value bucket (duration: 05m 24s)
  • 20:05 mobrovac@deploy1001: Started deploy [restbase/deploy@dfca9e6] (dev-cluster): Use the simplified key/value bucket
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:59 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:59 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:56 gehel: restarting cassandra on maps* for config change - T221055
  • 19:49 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:49 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:49 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:48 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:48 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:48 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set main_app.debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:11 twentyafterfour: twentyafterfour@deploy1001:/srv/mediawiki-staging$ scap prep 1.34.0-wmf.1
  • 19:07 bblack: restarting varnish backend on cp1083
  • 19:04 bblack: restarting varnish backend on cp1085
  • 18:55 cdanis: cdanis@cp1085.eqiad.wmnet ~ % sudo -i depool
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set main_app.profiling_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:46 twentyafterfour: branching 1.34.0-wmf.1 refs T220726
  • 18:25 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:25 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set wmfdebug_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:14 cmjohnson1: powering off mw1280 to replace DIMM
  • 18:08 mutante: restbase2007, restbase2008 - re-enabled puppet which was disabled with reason 'decom'ed' but actually needed to run to decom after they had moved to role::spare::system (T208087)
  • 17:56 reedy@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/WikimediaIncubator/: T220623 (duration: 00m 53s)
  • 17:47 herron: beginning rolling ELK upgrade to 5.6.15
  • 17:46 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: no-op preparatory change (T221107) (gerrit:504386) (duration: 00m 52s)
  • 17:36 arturo: toolforge k8s reallocation (from nova-network to neutron) is causing troubles with IRC bots, expect missing entries in the SAL
  • 17:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:28 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:27 andrewbogott: restarting rabbitmq on cloudcontrol1003
  • 17:26 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1280.eqiad.wmnet,cluster=api_appserver
  • 17:25 arturo: rebooted cloudnet1003
  • 17:24 gehel: force initialization of unassigned shards on elasticsearch eqiad
  • 17:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op preparatory change (T221108) (gerrit:504374) (duration: 00m 52s)
  • 16:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikibaseQualityConstraints/maintenance/ImportConstraintEntities.php --wiki=testwikidatawiki --config-format=wgConf | tee T221108.php
  • 16:53 mutante: bast2001 - shutdown -h now - decom'ed (T219492)
  • 16:48 mutante: puppet node clean bast2001.wikimedia.org ; puppet node deactivate bast2001.wikimedia.org ; it showed up in Icinga again despite running decom cookbook (T219492)
  • 16:47 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:47 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:47 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set wmfdebug_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:44 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:44 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:44 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:43 jynus: upgrading and shutting down db1078 T219115
  • 16:41 jynus: disabling notifications on db1078 T219115
  • 16:37 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 (duration: 00m 52s)
  • 15:36 arturo: reimaging cloudnet2002-dev because role name change
  • 15:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:20 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.28 -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:19 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:19 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:19 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:18 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:18 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:18 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:16 elukey: roll restart kafka on kafka-jumbo100[1-6] to pick up openjdk upgrades
  • 14:58 gehel: manual data transfer from wdqs1008 to wdqs1009 - T220830
  • 14:56 ema: swift-fe-eqiad: nginx reload for new TLS certificate T204245
  • 14:53 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 14:52 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:51 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1005.eqiad.wmnet
  • 14:45 ema: test https://gerrit.wikimedia.org/r/504340 on ms-fe1005 T204245
  • 14:30 ema: swift-fe-codfw: nginx reload for new TLS certificate T204245
  • 14:22 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 14:21 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:20 elukey: roll restart of all the druid daemons on druid100[1-6] to pick up new openjdk updates
  • 14:17 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2005.codfw.wmnet
  • 14:07 jijiki: Pooling thumbor1001
  • 14:04 ema: test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/504331/ on ms-fe2005 T204245
  • 14:01 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe2005.codfw.wmnet
  • 14:01 jijiki: Depooling thumbor1001
  • 13:58 jijiki: Disable puppet on thumbor1001 for ~24h to serve traffic via haproxy - T187765
  • 13:54 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 13:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:52 jijiki: Enable puppet on thumbor*
  • 13:42 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 13:41 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:39 gehel: resetting cookbooks repo on cumin1001 (local changes)
  • 13:34 jijiki: Disabling puppet on thumbor* to merge 504284
  • 13:13 ema: cp-ats: upgrade fifo-log-demux to 0.2 and restart services
  • 13:10 ema: fifo-log-demux 0.2 uploaded to stretch-wikimedia
  • 13:03 arturo: T220095 renaming/reimaging labtestcontrol2003 as cloudcontrol2003-dev
  • 12:58 moritzm: installing ghostscript update on thumbor1001
  • 12:54 gehel: cleanup redundant prometheus-elasticsearch units on elasticsearch servers
  • 12:52 godog: swift eqiad-prod continue ms-be1013 decom - T220590
  • 12:17 moritzm: installing OpenSSL 1.0.2 updates on cp* Varnish hosts
  • 12:07 arturo: rebooting cloudvirt200[123]-dev because deep changes in config
  • 11:18 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgWikibaseMusicalNotationLineWidthInches to config (T218191) (duration: 00m 52s)
  • 11:10 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "WikibaseClient: Conditionally enable mapframe support" (T218051) (duration: 00m 51s)
  • 11:08 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable signatures in 2019: NS (ID 128) for wikimaniawiki (T221062) (duration: 00m 52s)
  • 10:49 gilles: T221065 eswiki purge finished
  • 10:45 moritzm: installing libjs-bootstrap updates from Stretch point release
  • 10:21 gilles: T221065 mwscript purgeList.php eswiki --all --verbose on mwmaint1002
  • 10:21 moritzm: installing xapian-core update from stretch point release
  • 10:18 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221065 Set up origin trials on Spanish Wikipedia mobile site (duration: 00m 52s)
  • 09:59 jijiki: Enabling puppet again on dbproxy* and thumbor*
  • 09:51 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Reduce db1078 load (duration: 00m 53s)
  • 09:37 jijiki: Disabling puppet on dbproxy* and thumbor* to merge 502972
  • 09:26 fsero: [late logging] swift container-to-container synchronization enabled between docker_registry_eqiad and docker_registry_codfw swift containers at 08:15:00 UTC
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 09:05 ema: cp1076: repool varnish-fe pointing to Varnish T213263
  • 08:57 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 08:57 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 08:57 ema: cp1076: depool varnish-fe in preparation of traffic switchback to Varnish T213263
  • 08:40 hoo: Updated the Wikidata property suggester with data from the 2019-04-08 JSON dump and applied the T132839 workarounds
  • 08:33 moritzm: rebooting ms-be1020 for combined kernel/glibc/OpenSSL update
  • 08:01 moritzm: rebooting Swift frontends in codfw for combined kernel/glibc/OpenSSL security updates
  • 07:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 07:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 07:50 ema: cp2002: repool varnish-fe pointing to Varnish T213263
  • 07:47 moritzm: rebooting Swift frontends in eqiad for combined kernel/glibc/OpenSSL security updates
  • 07:45 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 07:45 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 07:45 ema: cp2002: depool varnish-fe in preparation of traffic switchback to Varnish T213263
  • 07:36 marostegui: Upgrade db2093
  • 07:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 07:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
  • 07:32 ema: cp2005: repool varnish-fe pointing to Varnish T213263
  • 07:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 07:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
  • 07:25 ema: cp2005: depool varnish-fe in preparation of traffic switchback to Varnish T213263
  • 07:11 moritzm: upgrading Java on Hadoop/Kafka/Jumbo/Druid clusters
  • 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 31s)
  • 01:46 aaron@deploy1001: Synchronized php-1.33.0-wmf.25/includes/parser/Parser.php: 73529ae6c5ffb6 (duration: 00m 53s)
  • 00:34 onimisionipe: pooled maps2003 - postgres init complete!
  • 00:33 krinkle@deploy1001: Synchronized wmf-config/profiler.php: I7589aa153 (duration: 00m 52s)
  • 00:33 urandom: creating new restbase schema -- T221031

2019-04-15

  • 23:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 23:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 23:20 cdanis: cdanis@icinga1001.wikimedia.org ~ % sudo systemctl restart tcpircbot-logmsgbot.service
  • 23:17 bd808: scap: SWAT: wikitech: Use cn:caseExactMatch: as account search filter (gerrit:497423) (T165795)
  • 20:59 thcipriani: gerrit back
  • 20:57 gehel: shutting down blazegraph and updater on wdqs1010, waiting for data reimport
  • 20:55 thcipriani: gerrit restart to pick up gc log changes incoming
  • 20:37 arlolra: Updated Parsoid to 83c17fc9
  • 20:23 Amir1: the ores deployment is over
  • 19:49 XioNoX: export BGP communities (prepend x3 outside asia) to AS3491 in eqsin
  • 19:46 mutante: bromine/vega: rm /etc/rsyncd.conf ; systemctl stop rsync (clean up old rsync config gerrit:503961)
  • 19:45 XioNoX: update (and add) AS3491 BGP communities in eqsin
  • 18:58 XioNoX: update mr1-* security policies - T219384
  • 18:41 onimisionipe: depooling maps2003 for postgres init
  • 18:40 onimisionipe: pooling maps2002 - postgres init complete
  • 18:39 Amir1: Morning SWAT is done
  • 18:35 shdubsh: logstash1009: disabling puppet and testing logstash config
  • 18:09 mutante: LDAP - adding legoktm and qchris to gerritadmin group (T219086)
  • 17:45 anomie: Backporting fix for T220991
  • 17:41 akosiaris: force puppet agent run on maps* after moving config-vars.yaml file for kartotherian, tilerator, tileratorui T220982
  • 17:33 mutante: LDAP - re-adding 'pbj' to 'nda' group, extended access until May 6th, transparency report contractor
  • 17:23 mutante: wikibugs - qdel'ed jobs and restarted another time, make it rejoin
  • 17:17 onimisionipe: wdqs deployment is complete! For some reason I don't know, scap did not log here
  • 17:17 herron: restarted logstash on logstash1007
  • 17:15 mutante: restarted wikibugs because it stopped talking
  • 16:08 onimisionipe: pooling maps2001 - postgres reinit is complete
  • 15:55 Reedy: changed /srv/mediawiki/docroot/wikimedia.org to a symlink to standard-docroot
  • 15:53 XioNoX: add cloud-in4 firewall filter to codfw - T211921
  • 15:31 onimisionipe: restarting prometheus-wmf-elasticsearch-exporter-9* on all elastic nodes
  • 15:30 onimisionipe: restarting prometheus-wmf-elasticsearch-exporter-9200 on all elastic nodes
  • 15:28 _joe_: systemctl reset-failed on ms-be1027, debmonitor session
  • 15:24 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T219871)
  • 14:55 gehel: deploying tilerator to maps1001 to validate deployment is working - T220982
  • 14:55 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T219871)
  • 14:43 _joe_: running apply-config-tilerator on maps1001
  • 14:40 _joe_: running apply-config-karthoterian on maps1001
  • 14:22 cdanis: T220982 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'
  • 14:21 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' "disable-puppet 'bad permissions - T220982 - cdanis'"
  • 14:18 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'
  • 14:18 gehel: resetting permissions on maps servers for /srv/deployment/kartotherian and /srv/deployment/tilerator
  • 14:04 moritzm: rebooting ms-fe1005 for combined kernel/glibc/OpenSSL update
  • 13:57 jbond42: upgrading puppet 4 -> 5 and facter 2 -> 3 on mediawiki::canary_appserver, mediawiki::appserver::canary_api and cache::cache roles
  • 13:56 gehel: restart tilerator / kartotherian on all maps servers for openssl update
  • 13:55 godog: start ms-be1013 decom - T220590
  • 13:42 godog: reboot ms-be1013
  • 13:09 moritzm: installing wget security updates on trusty hosts
  • 12:59 moritzm: restarting archiva on archiva1001 for OpenJDK security update
  • 12:50 moritzm: restarting Apache on matomo1001 to pick up OpenSSL update
  • 12:14 moritzm: rolling restart of HHVM/Apache on deployment servers to pick up OpenSSL update
  • 11:59 fsero: pointing boron docker builds to the new registry temporarily (docker builds on boron might fail)
  • 11:35 Amir1: EU swat is done
  • 11:26 moritzm: rolling restart of HHVM/Apache on labweb* to pick up OpenSSL update
  • 09:58 moritzm: installing openssl1.0 security updates
  • 09:18 gehel: unbanning elastic1029 from cluster
  • 08:58 moritzm: updating mediawiki servers in eqiad to version 1.8.1 of the PHP extension for wikidiff
  • 08:29 onimisionipe: increase wal_keep_segments on codfw maps master
  • 08:19 moritzm: updating mediawiki servers in codfw to version 1.8.1 of the PHP extension for wikidiff
  • 07:50 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/initSiteStats.php --wiki=hywwiki --active (T220936)
  • 05:31 marostegui: Upgrade db1100
  • 05:07 marostegui: powercycle mw1280 (crashed)

2019-04-14

  • 06:10 ebernhardson: unban elastic1027 from eqiad-psi
  • 05:36 ebernhardson: unbanning elastic1027 after about half the shards left and load dropped
  • 05:31 ebernhardson: ban elastic1027 from elasticsearch-psi in eqiad
  • 04:59 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1027 due to 100% cpu for last 30+ minutes

2019-04-13

  • 18:46 godog: 3h downtime for cloudvirt1015
  • 15:58 ebernhardson: restart elasticsearch on elastic1027
  • 15:34 shdubsh: restart recommendation_api on scb1001
  • 15:33 shdubsh: restart recommendation_api on scb2001
  • 10:46 onimisionipe: depooling maps2001 for postgres init
  • 08:05 gehel: repooling wdqs1008 - data transfer completed - T220830
  • 00:32 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/includes/: Idc19cc29764a / T220854 - hot fix (duration: 05m 37s)

2019-04-12

  • 21:16 Krinkle: scap was unable to sync to 1 apache (connect to host cloudweb2001-dev.wikimedia.org port 22: Connection timed out)
  • 21:10 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/ImageMap/includes/ImageMap.php: I0ee84f059da / T217087 (duration: 05m 12s)
  • 19:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 19:27 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 19:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:17 onimisionipe: depooling maps2002 for postgres init
  • 17:16 onimisionipe: repooling maps2001 - postgres init is complete
  • 16:14 elukey: install ifstat on all the mc1* hosts for network bandwidth investigation
  • 15:56 gehel: starting data transfer from wdqs1008 to wdqs1009 - T220830
  • 15:32 thcipriani: gerrit back
  • 15:29 thcipriani: gerrit restart incoming
  • 14:29 onimisionipe: depool maps2001 for postgres initialization
  • 13:24 akosiaris: re-enable puppet across the fleet. Patch merged, recovery storm coming
  • 13:18 akosiaris: disable puppet across the fleet to avoid incoming puppet alert storm
  • 12:57 marostegui: Purge old rows and optimize tables on spare host pc1010 T210725
  • 12:53 urandom: decommissioning cassandra-c, restbase2008 -- T208087
  • 12:49 gehel: rolling restart of cassandra on maps* for jvm upgrade
  • 12:22 arturo: T220095 disable icinga checks for labtestcontrol2003
  • 12:16 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220807 Reduce cawiki survey sampling rate (duration: 05m 11s)
  • 11:56 moritzm: upgrading app server canaries to version 1.8.1 of the PHP wikidiff extension (HHVM already deployed) T203069
  • 11:46 moritzm: upgrading acmechief hosts to latest buster state
  • 11:44 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220807 Oversample navtiming on cawiki and commonswiki (duration: 05m 14s)
  • 11:37 Trey314159: reindexing Greek, Turkish, and Irish wikis on elastic@eqiad and elastic@codfw complete (T217806)
  • 11:19 moritzm: installed Java security updates on relforge* hosts
  • 11:10 moritzm: installing Java security updates on remaining maps hosts
  • 10:32 arturo: T219626 reimaging cloudcontrol2001-dev
  • 10:13 elukey: matomo updated to 3.9.1 on matomo1001 + deb upload to wikimedia-stretch - T218037
  • 09:53 moritzm: updated mwdebug1001 to php-wikidiff 1.8.1
  • 09:37 moritzm: updated mwdebug1002 to php-wikidiff 1.8.1
  • 09:30 volans: reset mgmt card on labtestcontrol2003 - T220783
  • 09:07 moritzm: added the wikimedia repository key to the stretch build chroot on boron, fixes builds using the PHP72/SPICERACK hooks
  • 09:05 arturo: T218021 disable icinga checks for labtestcontrol2001
  • 08:35 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/NavigationTiming/modules/ext.navigationTiming.js: T220788 Fix veaction === null case (duration: 00m 54s)
  • 08:02 moritzm: updated ssacli in thirdparty/hwraid component for stretch to 3.30-13.0 T220787
  • 07:12 marostegui: Manually install ssacli on db2[097|098|099|100|101|102] T220787 T220572
  • 07:04 moritzm: synced ssacli to thirdparty/hwraid components for jessie/stretch T220787
  • 01:00 mutante: puppet cert clean, puppet node clean, puppet node deactivate on cloudnet2001-dev.codfw.wmnet (T218025)
  • 00:25 tstarling@deploy1001: Synchronized wmf-config/profiler.php: increase excimer max depth (duration: 00m 53s)
  • 00:02 ejegg: updated fundraising CiviCRM from 24b968b1f9 to 1bc1570967

2019-04-11

  • 23:57 urandom: decommissioning cassandra-b, restbase2008 -- T208087
  • 22:15 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/WikibaseMediaInfo/resources/: Hot-deploy fix for WBMI variable cache miss T220665 (duration: 00m 55s)
  • 20:46 mutante: deleting job of wikibugs-phab-listener in an attempt to restart it
  • 19:47 cdanis: cdanis@mwdebug1001.eqiad.wmnet ~ % sudo systemctl stop hhvm && sudo rm /var/cache/hhvm/fcgi.hhbc.sq3 && sudo systemctl start hhvm
  • 19:39 twentyafterfour: mediawiki error rate seems to be back to normal after deploying 1.33.0-wmf.25, the new branch looks stable refs T206679
  • 18:55 mutante: disabling puppet on hosts using class 'confd' to safely deploy gerrit:456317
  • 18:55 Trey314159: reindexing Greek, Turkish, and Irish wikis on elastic@eqiad and elastic@codfw (T217806)
  • 18:01 onimisionipe: increase replication factor on maps codfw cluster
  • 17:45 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@5394b59] (stretch): Insert maps2001 into stretch environment (duration: 00m 22s)
  • 17:45 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@5394b59] (stretch): Insert maps2001 into stretch environment
  • 17:22 mbsantos@deploy1001: Finished deploy [proton/deploy@5cb8bbe]: Update chromium-renderer to 8988283 (T213362, T216191, T212322) (duration: 01m 33s)
  • 17:21 mbsantos@deploy1001: Started deploy [proton/deploy@5cb8bbe]: Update chromium-renderer to 8988283 (T213362, T216191, T212322)
  • 16:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:48 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:47 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:47 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:42 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:42 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:42 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:36 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@13d9ebb] (stretch): Update stretch instance with latest code (duration: 00m 22s)
  • 15:35 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@13d9ebb] (stretch): Update stretch instance with latest code
  • 15:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op comment update (gerrit:503008) (duration: 01m 00s)
  • 15:06 cdanis@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 14:53 paravoid: rebooting labnet1002
  • 14:49 vgutierrez: uploaded acme-chief 0.16 to apt.wikimedia.org (buster) - T207461
  • 14:47 urandom: decommissioning cassandra-a, restbase2008 -- T208087
  • 14:46 akosiaris: cxserver: add garbage collection graphs under saturation. T205911
  • 14:18 Amir1: Deployment of Url shortener is done now
  • 14:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy UrlShortener to metawiki, let's get the party started (T108557, T44085) (duration: 01m 00s)
  • 12:49 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=maps,name=maps2001.codfw.wmnet
  • 12:20 kartik@deploy1001: scap-helm cxserver finished
  • 12:19 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 12:19 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 12:16 kartik@deploy1001: scap-helm cxserver finished
  • 12:16 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 12:15 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 12:12 kartik@deploy1001: scap-helm cxserver finished
  • 12:12 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 12:12 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 11:40 zeljkof: EU SWAT finished
  • 11:39 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Increase musical notation datatype string length limit (T218767) (gerrit:500692) (duration: 01m 02s)
  • 11:37 akosiaris@deploy1001: scap-helm cxserver finished
  • 11:36 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 11:36 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 11:30 onimisionipe: removing maps2002 from cassandra cluster due to dead node error
  • 10:46 moritzm: upgrading remaining app servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 10:39 hashar: Upgrading CI Jenkins
  • 10:21 volans: forcing puppet run on A:cp-upload_codfw
  • 10:15 gehel: remove maps2001 from new cassandra cluster -T198622
  • 10:10 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 09:57 elukey: roll restart druid-coordinator/overlord on druid100[4-6] to pick up new jvm settings
  • 09:01 moritzm: upgrading deployment servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:20 moritzm: upgrading remaining job runners to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:19 elukey: roll restart of druid-broker/historical on druid100[4-6] to pick up new settings
  • 06:33 moritzm: uploaded jenkins 2.164.2 to apt.wikimedia.org (stretch-wikimedia / thirdparty/ci)
  • 06:32 moritzm: uploaded jenkins 2.164.2 to apt.wikimedia.org (jessie-wikimedia / thirdparty)
  • 06:24 moritzm: upgrading remaining API Servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s3 read-only T219115 (duration: 00m 36s)
  • 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s3 master eqiad from db1078 to db1075 T219115 (duration: 00m 36s)
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s3 on read-only T219115 (duration: 00m 37s)
  • 05:00 marostegui: Starting s3 failover from db1078 to db1075 - T219115
  • 04:32 marostegui: Disable puppet on db1078 and db1075 T219115
  • 04:18 marostegui: Start topology changes to move s3 slaves under db1075 T219115
  • 04:14 marostegui: Disable GTID on s3 hosts - https://phabricator.wikimedia.org/T219115
  • 00:45 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/PageTriage/: UBN Fix for pageTriage and ORES T220649 (duration: 01m 04s)
  • 00:12 twentyafterfour: deploying phabricator upgrade

2019-04-10

  • 20:43 urandom: decommissioning cassandra-c, restbase2007 -- T208087
  • 20:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert - Enabling api-request logging via eventgate-analytics for group1 wikis - T214080 (duration: 01m 00s)
  • 19:48 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging via eventgate-analytics for group1 wikis - T214080 (duration: 00m 59s)
  • 19:42 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.25 refs T206679 (duration: 01m 48s)
  • 19:40 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.25 refs T206679
  • 19:28 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.25 refs T206679
  • 19:26 XioNoX: enable sampling on cr2-eqiad external links, outbound
  • 19:17 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.20 [keeping static files] (duration: 02m 18s)
  • 19:14 ejegg: updated fundraising CiviCRM from d0e44a9e51 to 24b968b1f9
  • 19:08 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 [keeping static files] (duration: 02m 22s)
  • 17:44 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 [keeping static files] (duration: 02m 22s)
  • 16:58 chaomodus: restarted nagios-nrpe-server on proton1001 (it died due to OOM)
  • 16:51 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1004.eqiad.wmnet
  • 16:01 elukey: restart brokers on druid100[3-6] - locking after segments get deleted
  • 15:46 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/includes/parser/DateFormatter.php: Ib2b3fb / T220563 (duration: 01m 00s)
  • 15:28 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 59s)
  • 15:26 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 everywhere (duration: 00m 21s)
  • 15:26 oblivian@deploy1001: Started deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 everywhere
  • 15:24 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/Score/: UBN Revert Score changes that broke VE T220465 (duration: 01m 01s)
  • 15:19 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 (duration: 00m 13s)
  • 15:19 oblivian@deploy1001: Started deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0
  • 15:01 fsero: pooled back mwdebug200[1,2] T219989
  • 15:00 fsero: repooling mwdebug2002
  • 15:00 jijiki: Enable puppet on thumbor1001, switch back to nginx, pool thumbor1004 - T187765
  • 14:57 fsero: repooling mwdebug2001
  • 14:20 hashar: CI processing was a bit slower than usual over the past couple hours or so. It should be slightly faster now T220606
  • 14:13 joal@deploy1001: Finished deploy [analytics/aqs/deploy@fc1d232]: Deploying per-page limits for druid-endpoints (duration: 14m 41s)
  • 13:58 joal@deploy1001: Started deploy [analytics/aqs/deploy@fc1d232]: Deploying per-page limits for druid-endpoints
  • 13:47 fsero: resizing disk on mwdebug2002 T219989
  • 13:42 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on group0 (T188327) (duration: 01m 00s)
  • 13:19 marostegui: Deploy schema change on aawiki aawikibooks aawiktionary abwiki abwiktionary acewiki advisorswiki advisorywiki adywiki afwiki on x1 - T136427
  • 12:41 urandom: decommissioning cassandra-b, restbase2007 -- T208087
  • 12:40 hashar: contint2001: stopped puppet and zuul-merger for debugging
  • 12:17 jbond42: rolling security update of systemd on stretch systems
  • 12:07 Amir1: EU swat is done
  • 12:07 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Prep work for deploying UrlShortener extension (T108557), part II (duration: 01m 00s)
  • 12:05 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Prep work for deploying UrlShortener extension (T108557), part I (duration: 01m 00s)
  • 11:46 dcausse: elasticsearch search cluster: reindexing zh-min-nan wikis (T219533)
  • 10:55 moritzm: upgrading nodejs on analytics-tool1002 to latest node 10 version from component/node10
  • 10:46 gilles: T220265 setZoneAccess on all wikis finished
  • 10:40 akosiaris: upgrade kubernetes-node on kubestage1002 (staging cluster) to 1.12.7-1 T220405
  • 10:33 moritzm: upgrading nodejs on aqs* to latest node 10 version from component/node10
  • 10:25 fsero: resizing disk on mwdebug2001 T219989
  • 10:17 akosiaris: upload kubernetes_1.12.7-1 to apt.wikimedia.org/stretch-wikimedia component main T220405
  • 10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 T217453 (duration: 00m 59s)
  • 10:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 T217453 (duration: 01m 03s)
  • 09:59 moritzm: upgrading labweb hosts (wikitech) to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 09:51 akosiaris: upgrade kubernetes-node on kubestage1001 (staging cluster) to 1.12.7-1 T220405
  • 09:50 moritzm: upgrading snapshot hosts to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 09:40 akosiaris: upgrade kubernetes-master on neon (staging cluster) to 1.12.7-1 T220405
  • 09:05 moritzm: upgrading job runners mw1299-mw1311 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:56 elukey: restart druid-broker on druid100[4-6] - stuck after attempted datasource delete action
  • 08:46 godog: roll-restart swift frontends - T214289
  • 08:36 elukey: update thirdparty/cloudera packages to cdh 5.16.1 for jessie/stretch-wikimedia - T218343
  • 08:26 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@f7518bb] (stretch): Insert maps2003 into stretch environment (duration: 00m 22s)
  • 08:26 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@f7518bb] (stretch): Insert maps2003 into stretch environment
  • 08:12 gilles: T220265 foreachwiki extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --backend local-multiwrite
  • 07:22 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@efd5bd5]: Revert "Bifurcate imageinfo queries to improve performance" (T220574) (duration: 04m 05s)
  • 07:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@efd5bd5]: Revert "Bifurcate imageinfo queries to improve performance" (T220574)
  • 07:12 onimisionipe: depooling maps200[34] to increase cassandra replication factor - T198622
  • 07:09 jijiki: Rolling restart thumbor service
  • 07:08 jijiki: Upgrading python-thumbor-wikimedia on Thumbor servers to 2.4-1+deb9u1
  • 06:59 marostegui: Deploy schema change on x1 master, with replication, lag will happen on x1 T217453
  • 06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool x1 slaves T217453 (duration: 01m 13s)
  • 05:52 _joe_: setting both mwdebug200{1,2} to pooled = inactive to remove them from scap dsh list and allow deployments, T219989
  • 05:12 _joe_: same on mwdebug2001
  • 05:08 _joe_: removing hhvm cache on mwdebug2002
  • 00:37 Krinkle: last scap sync-file failed to mwdebug2002.codfw and mwdebug2001.codfw due to insufficient disk space
  • 00:20 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/resources/src/startup/: I3b9f1a13379a / Ie9db60e417cca (duration: 01m 01s)

2019-04-09

  • 23:14 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 [keeping static files] (duration: 06m 03s)
  • 22:31 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.33.0-wmf.25 refs T206679 (duration: 39m 59s)
  • 22:19 chaomodus: uploaded python-pynetbox to apt.wikimedia.org/stretch-wikimedia (T217072)
  • 22:13 mobrovac@deploy1001: Finished deploy [restbase/deploy@c0a2977]: Bring RB on restbase20(19|20) up to date - T208087 (duration: 02m 32s)
  • 22:11 mobrovac@deploy1001: Started deploy [restbase/deploy@c0a2977]: Bring RB on restbase20(19|20) up to date - T208087
  • 21:57 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.33.0-wmf.25 refs T206679
  • 21:48 urandom: decommissioning cassandra-a, restbase2007 -- T208087
  • 19:46 herron: added myself to ldap group cn=archiva-deployers,ou=groups,dc=wikimedia,dc=org
  • 19:10 twentyafterfour: branching 1.33.0-wmf.25
  • 18:53 crusnov@deploy1001: Finished deploy [netbox/deploy@018d83e]: Minor fix to Netbox-Ganeti sync script (duration: 00m 52s)
  • 18:52 crusnov@deploy1001: Started deploy [netbox/deploy@018d83e]: Minor fix to Netbox-Ganeti sync script
  • 18:50 thcipriani: gerrit back
  • 18:48 thcipriani: gerrit restart
  • 18:48 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@43d2d2e]: Gerrit update (cobalt) -- restart incoming (duration: 00m 10s)
  • 18:47 thcipriani@deploy1001: Started deploy [gerrit/gerrit@43d2d2e]: Gerrit update (cobalt) -- restart incoming
  • 18:46 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@43d2d2e]: Gerrit update (gerrit2001 only) (duration: 00m 10s)
  • 18:46 thcipriani@deploy1001: Started deploy [gerrit/gerrit@43d2d2e]: Gerrit update (gerrit2001 only)
  • 18:42 volans: restart icinga on icinga1001 - T196336
  • 18:38 cdanis: T196336 cdanis@icinga1001$ sudo systemctl restart nsca
  • 18:27 crusnov@deploy1001: Finished deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229 (duration: 00m 57s)
  • 18:26 crusnov@deploy1001: Started deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229
  • 18:11 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 03s)
  • 18:11 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 18:07 urandom: bootstrapping cassandra-c, restbase2020 -- T208087
  • 17:58 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 02s)
  • 17:58 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 17:56 elukey: restart keyholder-agent on deploy1001 to pick up new settings for analytics (+ arm all the keys)
  • 17:42 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 04s)
  • 17:42 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 17:42 elukey: restart keyholder-proxy.service on deploy1001 as attempt to reload perms for the analytics_deploy key
  • 17:37 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 10s)
  • 17:37 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 17:19 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@b04c397]: Update mobileapps to 3edfcad (T220045 T219411 T219667) (duration: 03m 50s)
  • 17:15 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@b04c397]: Update mobileapps to 3edfcad (T220045 T219411 T219667)
  • 17:14 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.24/includes/export/WikiExporter.php: deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502537/1 (duration: 00m 51s)
  • 17:09 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.24/includes/export/XmlDumpWriter.php: deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502538/1 (duration: 00m 52s)
  • 17:04 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/includes/specials/SpecialUploadStash.php: T220265 Add support for X-Swift-Secret to upload stash (duration: 00m 53s)
  • 17:03 twentyafterfour: deploying https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502538/1 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502537/1
  • 17:01 arturo: T220426 reimaging+renaming labtestnet2002 to cloudweb2001-dev
  • 16:49 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:49 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 16:49 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 16:46 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:46 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 16:46 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 16:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:45 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:45 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:41 herron: performing rolling restart of kafka main brokers and eventbus instances in eqiad to pick up security updates
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:28 jijiki: Restarting thumbor service on thumbor1001
  • 16:26 jijiki: Upgrading thumbor1001 to python-thumbor-wikimedia_2.4-1+deb9u1
  • 16:18 jijiki: Uploading python-thumbor-wikimedia_2.4-1+deb9u1 to component/thumbor in stretch-wikimedia
  • 15:05 moritzm: uploaded jenkins 2.164.1 for stretch-wikimedia/thirdparty/ci
  • 15:04 moritzm: uploaded jenkins 2.164.1 for jessie-wikimedia/thirdparty
  • 14:42 ejegg: updated payments-wiki from 15bcb3d1a6 to aa8dad50e7
  • 14:10 ema: reboot lvs2010 with systemd 232 T209707
  • 14:09 godog: bootstrapping cassandra-b, restbase2020 -- T208087
  • 13:19 godog: bounce rsyslog on wezen
  • 13:11 fsero: building envoy docker image
  • 13:07 jbond42: rolling security updates of systemd on canary systems
  • 12:35 godog: bounce rsyslog on lithium
  • 12:13 elukey: powercycle logstash1012 - no ssh, no mgmt console available, seems completely stuck
  • 12:10 jbond42: remove facter2.4 from wikimedia-buster
  • 11:27 moritzm: upgrading API servers mw1276-mw1290 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 11:07 akosiaris: pool both DCs for newly created swift.recovery.wmnet RR
  • 11:07 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=.*,dnsdisc=swift
  • 11:00 ema: rebooting lvs2010 with systemd 241-1~bpo9+1 T209707
  • 10:57 moritzm: updated buster installer to daily build from 9th of April
  • 10:09 godog: bootstrapping cassandra-a, restbase2020 -- T208087
  • 10:07 moritzm: rebooting stat1005 for some tests again
  • 09:49 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/NavigationTiming: T220476 Add originCountry to paintTiming context (duration: 00m 54s)
  • 09:46 moritzm: rebooting stat1005 for some tests
  • 08:47 akosiaris: switch swift to be accessed from varnish+ats active/active rw
  • 08:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove old comment from db1089 (duration: 00m 51s)
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2069 (duration: 00m 50s)
  • 08:10 marostegui: Upgrade db2069
  • 08:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2069 (duration: 00m 51s)
  • 07:52 moritzm: upgrading app servers mw1319-mw1333 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Deploy parsercache key change everywhere T210725 (duration: 00m 53s)
  • 07:37 moritzm: installing samba security updates
  • 07:21 marostegui: Change parsercache keys on mw[1230-1235,1238-1239] - T210725
  • 07:10 jijiki: Depool thumbor1004 for testing - T187765
  • 07:09 marostegui: Change parsercache keys on mw[1221-1229] - T210725
  • 07:03 marostegui: Change parsercache keys on mw[1280-1289] - T210725
  • 06:51 dcausse: elasticsearch search cluster: reindex all spaceless languages in eqiad and codfw (T219533)
  • 06:47 moritzm: installing libav security updates
  • 06:39 marostegui: Change parsercache keys on mw[1260-1269] - T210725
  • 06:30 marostegui: Change parsercache keys on mw[1270-1279] - T210725
  • 06:01 marostegui: Deploy parsercache key change on canaries only - T210725
  • 03:23 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/ExternalGuidance/extension.json: Id04a3a / T219841 (duration: 00m 52s)
  • 03:16 onimisionipe: depooled maps2003 - T219849
  • 02:47 onimisionipe: restarting tilerator on maps2003 - T219849
  • 02:40 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/ExternalGuidance/extension.json: I8614f6 / T219841 (duration: 00m 53s)
  • 01:27 eileen: civicrm revision changed from dfe89516b3 to d0e44a9e51, config revision is 2bcbf44521
  • 00:45 urandom: bootstrapping cassandra-c, restbase2019 -- T208087
  • 00:07 ebernhardson@deploy1001: Synchronized wmf-config/: T218716: Migrate configs to WikibaseCirrusSearch (duration: 00m 51s)

2019-04-08

  • 23:57 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218954: Enable WBCS search on commons too (duration: 00m 50s)
  • 23:45 ebernhardson@deploy1001: Synchronized wmf-config: T218954: Disable wbcs dispatching query builder on commons (3/3) (duration: 00m 52s)
  • 23:41 ebernhardson@deploy1001: Synchronized wmf-config: T218954: Disable wbcs dispatching query builder on commons (3/3) (duration: 00m 51s)
  • 23:33 ebernhardson@deploy1001: Synchronized wmf-config/Wikibase.php: T218954: Disable wbcs dispatching query builder on commons (2/3) (duration: 00m 52s)
  • 23:10 ebernhardson@deploy1001: Synchronized wmf-config/: T218954: Disable wbcs dispatching query builder on commons (1/3) (duration: 00m 52s)
  • 22:45 XioNoX: rollback enable sampling on cr2-eqiad external links
  • 22:29 XioNoX: enable sampling on cr2-eqiad external links
  • 22:18 XioNoX: enable sampling on eqiad Telia transit link
  • 22:04 jforrester@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: WBMI T220277 (duration: 00m 57s)
  • 22:01 XioNoX: pfw firewall rules update - T217355
  • 20:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667) (duration: 07m 55s)
  • 20:41 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667)
  • 20:24 urandom: bootstrapping cassandra-b, restbase2019 -- T208087
  • 20:08 bearND: mobileapps deploy failed on canary (Check 'endpoints' failed). Rolled back canary.
  • 20:08 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667) (duration: 02m 10s)
  • 20:05 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667)
  • 19:59 marxarelli: promotion of 1.33.0-wmf.24 to all wikis completed. error rates nominal aside from usual timeouts. cc: T206678, T220037
  • 19:51 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.24
  • 19:48 marxarelli: promoting 1.33.0-wmf.24 to all wikis. cc: T220037, T206678
  • 19:41 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 46s)
  • 19:41 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:35 marxarelli: starting promotion of 1.33.0-wmf.24 to group1
  • 18:45 Lucas_WMDE: Morning SWAT done
  • 18:31 bblack: deploying wiktionary CNAME experiment - https://phabricator.wikimedia.org/T208263#5094712
  • 18:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@9cf5364]: Lower AQS rate limits and fix recommendation-api spec - T219910 T220221 (duration: 21m 14s)
  • 18:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable eventgate-analytics api-request logging for group0 wikis - T214080 (duration: 00m 56s)
  • 18:24 mobrovac: restart pdfrender on scb2001 - T174916
  • 18:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:13 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:10 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:09 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:09 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:09 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:06 mobrovac@deploy1001: Started deploy [restbase/deploy@9cf5364]: Lower AQS rate limits and fix recommendation-api spec - T219910 T220221
  • 17:50 arturo: T220129 renaming labtestmetal2001.codfw.wmnet to clouddb2001-dev.codfw.wmnet
  • 17:42 XioNoX: add swift term to cr1/2-eqiad - T220081
  • 17:14 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@c30a540]: GUI updates, Updater with redirect fix and Blazegraph with XSS fix (duration: 11m 17s)
  • 17:03 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@c30a540]: GUI updates, Updater with redirect fix and Blazegraph with XSS fix
  • 16:59 mobrovac@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: Force-deploy to scb1001 to test the config perms (duration: 00m 16s)
  • 16:59 mobrovac@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: Force-deploy to scb1001 to test the config perms
  • 16:55 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Replace needed WikimediaEditorTasks Beta Cluster config (T220153) (duration: 00m 58s)
  • 16:31 urandom: bootstrapping cassandra-a, restbase2019 -- T208087
  • 15:35 herron: aborting ores to logstash kafka logging pipeline switchover for now. puppet applied only to ores2009, reverting now
  • 15:19 herron: switching ores to logstash kafka logging pipeline (via temporary puppet disable and rolling puppet agent runs)
  • 15:09 jijiki: Pool mw2206 - T215415
  • 14:55 papaul: powering down mw2206 for DIMM replacement
  • 14:49 otto@deploy1001: Finished deploy [analytics/refinery@7fa6fb7]: deploying oozie article recommender for baho (duration: 18m 35s)
  • 14:45 papaul: powering down elastic2048 for disk replacement
  • 14:30 otto@deploy1001: Started deploy [analytics/refinery@7fa6fb7]: deploying oozie article recommender for baho
  • 14:17 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on test wikis and mediawikiwiki (T188327) (duration: 00m 59s)
  • 14:06 jijiki: Temporarily serve thumbor traffic on thumbor1001 via haproxy - T187765
  • 13:41 moritzm: upgrading job runners in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 12:31 hashar: contint2001: upgraded python-pbr 0.8.2-1 -> 1.10.0-1 # T218559
  • 12:25 moritzm: upgrading API servers in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 12:06 arturo: reboot cloudvirt1009 to clean some ACPI errors in dmesg
  • 12:03 arturo: T219776 puppet node deactivate labtestnet2003.codfw.wmnet
  • 12:00 hashar: contint1001 upgraded zuul to 2.5.1-wmf6 # T208426
  • 11:53 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: WikibaseClient: Conditionally enable mapframe support (T218051) (duration: 00m 58s)
  • 11:48 hashar: contint2001: stopping zuul-server , it is not meant to be running there
  • 11:41 hoo@deploy1001: Synchronized wmf-config/abusefilter.php: Enable blocking feature of AbuseFilter in zh.wikipedia (T210364) (duration: 00m 58s)
  • 11:25 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create uploader user group for thwiki (T216615) (duration: 00m 58s)
  • 11:12 jijiki: Restarted thumbor services after librsvg upgrade
  • 11:11 fsero: upgrading envoy to 1.9.1 T215810
  • 10:42 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546)|gerrit:502190 Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546)|gerrit:502190 Bumping portals to master (T128546) (duration: 00m 59s)
  • 10:34 moritzm: upgrading app servers in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 10:23 jijiki: Running debdeploy to upgrade librsvg
  • 09:43 gehel: force allocation of 3 unassigned shards on elasticsearch / cirrus / eqiad
  • 09:30 arturo: T219776 puppet node clean labtestnet2003.codfw.wmnet
  • 09:20 volans: restarting icinga on icinga1001 - T196336
  • 08:45 moritzm: upgrading API servers mw1221-mw1235 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:34 akosiaris@deploy1001: scap-helm zotero finished
  • 08:34 akosiaris@deploy1001: scap-helm zotero cluster staging completed
  • 08:34 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml --reset-values staging stable/zotero [namespace: zotero, clusters: staging]
  • 08:32 akosiaris@deploy1001: scap-helm zotero finished
  • 08:32 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 08:32 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
  • 08:32 akosiaris: lower CPU, memory limits for zotero pods. Set 1 cpu, 700Mi. This should help the pods to recover faster in some cases. The old memory leak issues we used to have seem to be no longer present
  • 08:31 akosiaris@deploy1001: scap-helm zotero finished
  • 08:31 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 08:31 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
  • 08:17 godog: delete fundraising folder from public grafana - T219825
  • 08:01 godog: bounce grafana after https://gerrit.wikimedia.org/r/c/operations/puppet/+/501519
  • 07:59 moritzm: upgrading mw1266-mw1275 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T217453 (duration: 00m 58s)
  • 07:19 marostegui: Deploy schema change on the first 10 wikis - T217453
  • 07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T217453 (duration: 00m 59s)
  • 07:02 moritzm: installing wget security updates
  • 07:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T143763 (duration: 00m 58s)
  • 06:34 _joe_: restarted netbox, SIGSEGV on HUP-induced reload
  • 05:20 marostegui: Deploy schema change on x1 master with replication, there will be lag on x1 slaves T143763
  • 05:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T219777 T143763 (duration: 01m 30s)

2019-04-07

  • off: restarted icinga on icinga2001
  • 06:34 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=codfw
  • 06:23 _joe_: deleting zotero pods with high memory watermark in codfw
  • 06:03 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=zotero,name=codfw

2019-04-06

  • 10:09 gilles: Purging ruwiki namespaces > 0

2019-04-05

  • 23:10 thcipriani: revert some recent problematic gerrit acl changes
  • 22:46 chaomodus: restarted pdfrender on scb1002 T174916
  • 21:45 hashar: thcipriani restarted Gerrit. CI works again # T220243
  • 21:37 thcipriani: restarting gerrit
  • 21:30 hashar: CI / Zuul is no longer processing events / T220243
  • 17:29 thcipriani: gerrit back on 2.15.11
  • 17:27 thcipriani: restart gerrit
  • 17:26 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 on cobalt (restart incoming) (duration: 00m 11s)
  • 17:26 thcipriani@deploy1001: Started deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 on cobalt (restart incoming)
  • 17:25 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 (on gerrit2001 only) (duration: 00m 10s)
  • 17:25 thcipriani@deploy1001: Started deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 (on gerrit2001 only)
  • 17:19 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/diff/TextSlotDiffRenderer.php: Ia326c6 / T220217 (duration: 01m 02s)
  • 17:12 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/includes/diff/TextSlotDiffRenderer.php: Ia326c6 / T220217 (duration: 01m 00s)
  • 16:02 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/includes/jobqueue/jobs/RefreshLinksJob.php: Ib1ac31365f9c / T220037 (duration: 00m 59s)
  • 15:58 ejegg: re-enabled recurring donations queue consumer
  • 15:57 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/NavigationTiming/: I6b23be / T220156 (duration: 01m 00s)
  • 15:51 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/GlobalBlocking/includes/specials/: I5843cd181ca7d (duration: 01m 02s)
  • 15:08 ejegg: upgraded fundraising CiviCRM from 3c55850631 to 83478013a8
  • 15:01 ejegg: disabled recurring donation queue consumer
  • 14:55 papaul: powering down restbase2019 and 2020 for relocation
  • 13:53 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 13:45 akosiaris: repool eqiad for all kubernetes services T217426
  • 13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=citoid
  • 13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=cxserver
  • 13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
  • 13:44 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=blubberoid
  • 13:44 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
  • 13:41 arturo: T220203 reimage labtestnet2002 as spare in stretch
  • 13:36 arturo: T220101 disable active icinga checks for cloudcontrol2002-dev
  • 13:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:50 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=99)
  • 12:49 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:48 jijiki: Restarting pybal on lvs1016 and lvs2003 for 496382
  • 12:43 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:43 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:43 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:43 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:33 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 12:32 akosiaris: depool eqiad for all kubernetes services T217426
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=zotero
  • 12:31 akosiaris: repool codfw for all kubernetes services T217426
  • 12:30 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:30 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:29 akosiaris: repool codfw for all kubernetes services
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=citoid
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=cxserver
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mathoid
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=blubberoid
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=zotero
  • 12:18 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:18 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 12:15 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:15 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:12 bblack: repool esams
  • 12:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 11:53 bblack: esams depooled in DNS
  • 11:37 jijiki: Restarting pybal on lvs1006 and lvs2006 for 496382
  • 11:27 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 10:57 arturo: updating puppet catalog compiler facts
  • 10:42 elukey: restart druid broker on druid100[5,6] - exceptions in the logs after old datasource removal
  • 10:41 elukey: restart druid broker on druid1004 - exceptions in the logs after old datasource removal
  • 10:10 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 10:10 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 09:27 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 09:27 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 09:26 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 09:26 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 08:57 akosiaris: depool codfw kubernetes apps from discovery in preparation for upgrade
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=citoid
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=cxserver
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mathoid
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=blubberoid
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=zotero
  • 08:55 arturo: T220101 reimaging+renaming labtestservices2002 to cloudservices2002-dev
  • 08:43 akosiaris: upgrade kubernetes staging cluster to 1.11.9
  • 08:32 elukey: roll restart of aqs on aqs100* to pick up new druid settings
  • 08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1075 (duration: 00m 59s)
  • 08:06 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 07:51 elukey: restart gerrit on cobalt (timeouts and general slowdown)
  • 07:34 jijiki: Repooling thumbor1004 until we replace its memory - T215411
  • 07:18 moritzm: upgrading mw1262-mw1265 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 (duration: 00m 57s)
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 (duration: 01m 00s)
  • 05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 with low weight (duration: 00m 58s)
  • 05:15 marostegui: Fully upgrade and reboot db1075
  • 05:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 59s)
  • 04:49 gilles: T216594 Start purge of namespace 0 on ruwiki
  • 02:27 eileen: update civicrm revision changed from 7560af93df to 3c55850631, config revision is 9ad5ef3e15
  • 00:09 bd808@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: wikitech: Lock LDAP accounts when users are blocked (gerrit:497866), Disable Phabricator accounts when blocked on wikitech (gerrit:501123) (T168692) 2/2 (duration: 00m 57s)
  • 00:07 bd808@deploy1001: Synchronized wmf-config/wikitech.php: SWAT: wikitech: Lock LDAP accounts when users are blocked (gerrit:497866), Disable Phabricator accounts when blocked on wikitech (gerrit:501123) (T168692) (duration: 00m 59s)

2019-04-04

  • 23:52 bd808@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/LdapAuthentication: SWAT: Also set an LDAP password policy on Block (gerrit:501412) (T168692) (duration: 01m 01s)
  • 23:38 bd808@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add smn and sms to wmgExtraLanguageNames (gerrit:501393) (T220118) (duration: 01m 02s)
  • 21:22 XioNoX: renumber AS58587 to AS10075 in eqsin
  • 21:17 bblack: DNS deploying https://gerrit.wikimedia.org/r/c/operations/dns/+/500731 which can affect resolution of our CNAME records. If dns-related issues, can revert at will!
  • 21:09 herron: restarting eqiad ELK stack for security updates
  • 20:45 marxarelli: promotion of 1.33.0-wmf.24 rolled back to group0 and holding. cc: T206678, T220037
  • 20:41 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2/group1 wikis to 1.33.0-wmf.24"
  • 20:36 marxarelli: rolling back again following still high rates of DBTransactionError (avg ~ 800/min)
  • 20:16 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.24
  • 20:11 marxarelli: promoting 1.33.0-wmf.24 to all wikis
  • 20:11 marxarelli: error rates look good after proper syncs and re-deploy. cc: T220037
  • 20:06 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/Citoid/modules/ve.ui.Citoid.init.js: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Citoid/+/501114 (duration: 00m 58s)
  • 20:04 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationPlugin.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 57s)
  • 20:03 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationHooks.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 58s)
  • 20:02 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthentication.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 58s)
  • 19:58 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus/includes/JobExecutor.php: syncing JobExecutor changes (duration: 00m 58s)
  • 19:55 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 47s)
  • 19:53 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:51 marxarelli: re-deploying to group1 after proper syncs
  • 19:47 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/Citoid/modules/ve.ui.Citoid.init.js: (no justification provided) (duration: 00m 59s)
  • 19:46 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus/includes/JobExecutor.php: (no justification provided) (duration: 00m 58s)
  • 19:45 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationPlugin.php: (no justification provided) (duration: 00m 58s)
  • 19:44 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationHooks.php: (no justification provided) (duration: 00m 59s)
  • 19:43 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthentication.php: (no justification provided) (duration: 00m 59s)
  • 19:19 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.33.0-wmf.24"
  • 19:13 marxarelli: large spike in DBTransactionError errors. rolling back. cc: T220037
  • 19:12 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 46s)
  • 19:10 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:06 marxarelli: fetch/rebase looks good, incorporates fixes for T220037, T219510. deploying
  • 19:03 marxarelli: preparing to promote 1.33.0-wmf.24 to group1
  • 18:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable partial blocks on frwiki, plwiki (T219327, T219218) (duration: 00m 58s)
  • 18:23 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES RCFilters on eswikiquote (T219160) (duration: 01m 02s)
  • 18:13 moritzm: restarted apache on people.wikimedia.org to pick up OpenSSL update
  • 17:59 bstorm_: stopped postgresql on labsdb1006.eqiad.wmnet and moved the database master functionality (and all rsyncs) to clouddb1003.clouddb-services.eqiad.wmflabs
  • 17:59 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@922cbc0]: Switch to new logging infrastructure T211125 (duration: 04m 03s)
  • 17:55 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@922cbc0]: Switch to new logging infrastructure T211125
  • 17:47 ppchelko@deploy1001: Finished deploy [changeprop/deploy@f69dc9c]: Switch to new logging infrastructure T211125 (duration: 01m 44s)
  • 17:45 ppchelko@deploy1001: Started deploy [changeprop/deploy@f69dc9c]: Switch to new logging infrastructure T211125
  • 17:33 jynus: stopping replication on dbstore2001:s8 for backup testing T206203
  • 17:29 jynus: killing ongoing backup at dbprov2002, stuck
  • 17:28 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 17:10 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 16:31 herron: beginning rolling kafka restarts on kafka200[123] for security updates
  • 16:01 herron: repooling kafka2003 eventbus
  • 15:59 mutante: wikivoyage-old.org domain has been retired and deactivated (T219867, T81727)
  • 15:56 herron: depooling kafka2003 for eventbus security updates
  • 15:55 herron: repooling kafka2002 eventbus
  • 15:52 herron: depooling kafka2002 for eventbus security updates
  • 15:52 herron: pooling kafka2001 eventbus
  • 15:42 herron: depooling kafka2001 for eventbus security updates
  • 15:38 moritzm: rolling restart of proton to pick up openssl security update
  • 15:03 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 14:59 moritzm: installing libdatetime-timezone-perl updates
  • 14:24 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=cxserver,cluster=scb,name=scb.*
  • 14:24 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=cxserver,cluster=scb,name=scb.*
  • 14:23 jijiki: Depooling scb* from service cxserver traffic
  • 13:46 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 13:46 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 37s)
  • 13:29 jbond42: restart of gerrit apache service will occur at 13:40
  • 13:28 volans: upgraded spicerack to 0.0.22 on cumin[12]001
  • 13:27 volans: uploaded spicerack_0.0.22-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 13:23 moritzm: upgrading mw1261 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 / wikidiff 1.8.1
  • 13:20 jijiki: Stopped all citoid services from scb* - 494215
  • 13:15 jbond42: restart of phabricator apache service will occur at 14:25
  • 12:46 moritzm: uploaded HHVM 3.18.5+dfsg-1+wmf8+deb9u2 to apt.wikimedia.org/stretch-wikimedia
  • 12:10 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 11:43 moritzm: upgrading HHVM on mwdebug servers in eqiad along with update to hhvm-wikidiff 1.8.1
  • 11:35 moritzm: uploaded nodejs 10.15.2~dfsg-1+wmf1 to the component/node10 component of apt.wikimedia.org/stretch-wikimedia (updated to latest 10.x release and a change to ensure zlib binary compat with NodeSource) (T215562)
  • 11:34 Amir1: EU SWAT is done
  • 11:32 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Add mediawiki.org to the URL shortener whitelist (gerrit:500976) (duration: 00m 58s)
  • 11:28 jbond42: rolling security updates for apache on jessie
  • 11:25 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ReferencePreviews beta feature on de- and ar-wiki (T218766) (gerrit:498371) (duration: 01m 00s)
  • 11:21 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 11:08 arturo: drop python-psutil from jessie-wikimedia/openstack-mitaka-jessie, related to T219626
  • 10:56 moritzm: uploaded hhvm-wikidiff 1.8.1 to apt.wikimedia.org/stretch-wikimedia (source package is named php-wikdiff2 for legacy reasons) (T203069)
  • 10:21 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 10:01 moritzm: installing openssl1.0 security updates on stretch-based DB hosts
  • 08:36 moritzm: rolling restart of parsoid to pick up OpenSSL security update
  • 08:06 moritzm: uploaded Apache 2.4.10-10+deb8u14+wmf1 to apt.wikimedia.org/jessie-wikimedia (latest jessie security update rebased with our local patches)
  • 05:39 marostegui: Stop MySQL on db2033 for decommission - T219493
  • 05:32 marostegui: Remove db2033 from tendril and zarcillo - T219493
  • 05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2033 for decommission T219493 (duration: 00m 59s)
  • 05:18 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2033 for decommission T219493 (duration: 00m 59s)
  • 04:58 marostegui: Deploy schema change on labswiki for the job table - T219887
  • 00:40 chaomodus: restart pdfrender on scb1003 - T174916

2019-04-03

  • 23:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on zhwikisource (T219588) (duration: 00m 58s)
  • 23:50 catrope@deploy1001: Synchronized dblists/flow.dblist: Enable Flow on zhwikisource (T219588) (duration: 00m 57s)
  • 23:38 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments homepage EventLogging on testwiki (duration: 00m 59s)
  • 23:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage tutorial pages on cswiki, kowiki, viwiki (dark deploy) (duration: 00m 59s)
  • 23:18 catrope@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage on testwiki (duration: 01m 01s)
  • 21:32 elukey: start hadoop-hdfs-namenode on an-master1002 after outage due to big job hitting HDFS
  • 20:40 gehel: excluding elastic2048 from cluster and depooling - T220038
  • 20:29 arlolra: Updated Parsoid to 0b3bb10 (T219337)
  • 20:20 arlolra@deploy1001: Finished deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10 (duration: 05m 44s)
  • 20:14 arlolra@deploy1001: Started deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10
  • 20:09 marxarelli: 1.33.0-wmf.24 is holding at group0 following rollback. filed T220037. cc: T206678
  • 19:56 marxarelli: log correction group1 reverted to 1.33.0-wmf.23
  • 19:56 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 to 1.33.0-wmf.24
  • 19:55 marxarelli: 111,185 and counting DBTransactionError for jobrunner.discovery.wmnet
  • 19:53 marxarelli: rolling back group1
  • 19:53 marxarelli: massive spike in DBTransactionError ([{exception_id}] {exception_url} Wikimedia\Rdbms\DBTransactionError from line 246 of /srv/mediawiki/php-1.33.0-wmf.24/includes/libs/rdbms/lbfactory/LBFactory.php: RefreshLinksJob::runForTitle: transaction round 'RefreshLinksJob::run' already started.)
  • 19:51 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 49s)
  • 19:50 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:34 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update strategy (duration: 10m 54s)
  • 19:23 smalyshev@deploy1001: Started deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update strategy
  • 18:14 thcipriani: gerrit back on 2.15.12
  • 18:12 thcipriani: restarting gerrit for 2.15.12 update
  • 18:11 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow) (duration: 00m 11s)
  • 18:11 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow)
  • 18:09 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only (duration: 00m 11s)
  • 18:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only
  • 17:57 elukey: restart hadoop-hdfs-namenode on an-master1001 as precautionary measure after the outage (currently standby)
  • 17:44 herron: shortly postponing restarts of eventbus and kafka services for security updates due to unrelated firefighting - repooling kafka1001
  • 17:19 elukey: restart hadoop-hdfs-namenode on an-master1002 after forced shutdown due to errors
  • 17:14 herron: depooling kafka1001 to restart eventbus and kafka services for security updates
  • 17:04 Lucas_WMDE: EU SWAT done
  • 17:04 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=srwiki --fix # T214428 – 0 pages to fix, 0 links to fix, Looks good!
  • 17:03 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule (T220001) (gerrit:500987) (duration: 00m 58s)
  • 17:00 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus: SWAT: Incorrect order of calls in createPageDeleteEvent. (gerrit:500959) (duration: 00m 59s)
  • 16:51 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 16:44 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 16:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=idwiktionary --fix # T218796 – 41 links to fix, 41 were resolvable, Looks good!
  • 16:36 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add namespace "Lampiran" at id.wiktionary (T218796) (gerrit:499530) (duration: 00m 59s)
  • 16:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Draft namespace on srwiki (T214428) (gerrit:500761) (duration: 01m 00s)
  • 16:22 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add three domains at wgCopyUploadDomains (T216886, T219075) (gerrit:500154) (duration: 01m 00s)
  • 16:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: Remove namespace 104 from FlaggedRevs configuration for arwiki (T217507) (gerrit:500153) (duration: 01m 00s)
  • 15:18 volans: shutdown ms-be2026 for firmware upgrade - T219854
  • 15:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:16 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on wikitech for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 8 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 7 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 6 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 5 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 4 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on remaining section 3 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 2 wikis for T215525
  • 14:59 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 1 wikis for T215525
  • 14:56 anomie@deploy1001: Synchronized php-1.33.0-wmf.24/maintenance/includes/MigrateActors.php: Backporting fix from gerrit:500754 (duration: 01m 01s)
  • 14:55 anomie@deploy1001: Synchronized php-1.33.0-wmf.23/maintenance/includes/MigrateActors.php: Backporting fix from gerrit:500754 (duration: 01m 01s)
  • 14:18 marostegui: Stop replication on pc2007 for testing - T210725
  • 14:03 andrewbogott: restarting rabbitmq on cloudcontrol1003
  • 13:59 andrewbogott: restarting neutron-l3-agent on cloudnet1003 and cloudnet1004
  • 13:46 andrewbogott: restarting neutron-metadata-agent on cloudnet1003
  • 13:44 gilles@deploy1001: Synchronized php-1.33.0-wmf.23/includes/media/MediaTransformOutput.php: T216499 Identify images that should have had high importance (duration: 00m 59s)
  • 13:34 moritzm: reverting dbmonitor2001 to deb8u12+wmf1 build
  • 13:02 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 13:01 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:49 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 12:45 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:42 arturo: T219626 reimaging cloudcontrol2001-dev
  • 12:31 mutante: restarting gerrit service to apply change 498431
  • 11:25 Amir1: EU SWAT is done
  • 11:16 jbond42: rolling security updates for apache
  • 10:29 mutante: planet1001/2001 - apt autoremove un-required packages
  • 10:27 mutante: planet1001/2001 - upgrade apache2, openssh, locales, rsyslog ..
  • 10:25 arturo: updating puppet compiler facts
  • 10:19 volans: upgraded spicerack to 0.0.21 on cumin[12]001
  • 10:17 volans: uploaded spicerack_0.0.21-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 09:56 marostegui: Alter empty job table on s6 primary master - T219887
  • 09:55 moritzm: upgrading beta to hhvm wikidiff 1.8.1 (T203069)
  • 09:54 mutante: running mysql select queries on m3-slave to get data from phabricator conpherence as requested by andre
  • 09:45 moritzm: removed labtestnet2003.codfw.wmnet from debmonitor (T219776)
  • 09:29 ema: cp-ats-codfw: test ATS rolling restart T213263
  • 09:27 marostegui: Drop wikishared.wikimedia_editor_tasks_entity_description_exists table from x1 T219963
  • 09:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool s8 sanitarium master (duration: 00m 56s)
  • 09:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool s8 sanitarium master (duration: 01m 00s)
  • 08:35 jynus: merging change on network constants (firewall operation)
  • 08:23 marostegui: Restart mysql on sanitarium hosts db1124 db1125 db2094 db2095 - T218302
  • 08:18 marostegui: Stop replication on db2082 and db1087 (s8 sanitarium masters) T218302
  • 08:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool s8 sanitarium master (duration: 00m 57s)
  • 08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool s8 sanitarium master (duration: 00m 58s)
  • 08:09 moritzm: installing new apache packages on mw1261
  • 07:53 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 58s)
  • 07:51 moritzm: installing new apache packages on mwdebug
  • 07:42 marostegui: Reboot db1115 - tendril and dbtree will be down
  • 07:40 marostegui: Disable event scheduler on db1115 before restarting - tendril is stuck
  • 07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 T219493 (duration: 00m 57s)
  • 07:25 marostegui: Deploy schema change on db1073, labtestwiki - T219887
  • 07:09 marostegui: Stop replication in sync on db1120 and db2034 (x1 codfw master) - T219493
  • 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 T219493 (duration: 01m 13s)
  • 06:04 _joe_: restart varnish backend on cp1085, causing unavailability
  • 05:57 marostegui: Fix data drifts on bnwikisource on x1 - T219493
  • 05:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 59s)
  • 05:23 marostegui: Upgrade pc1007
  • 05:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 for upgrade (duration: 01m 00s)

2019-04-02

  • 23:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT enwiki: Restrict move-categorypages to +extendedmover/+sysop/+bot T219261 (duration: 00m 58s)
  • 23:30 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Add new WMCS IP range to wgRateLimitsExcludedIps T167432 (duration: 00m 57s)
  • 23:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable SandboxLink for rowiki T219855 (duration: 00m 56s)
  • 23:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Add 'depicts' statements to search index on testcommons (duration: 00m 59s)
  • 21:27 andrewbogott: rebooting labservices1001
  • 21:16 andrewbogott: rebooting labservices1002
  • 20:54 andrewbogott: restarting pdns and pdns-recursor on labservices1001 and 1002 in hopes of getting those machines to act a bit less sluggish
  • 20:23 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/skins/Vector/includes/: I6e04b512d / T219864 (duration: 00m 59s)
  • 20:20 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/skins/Vector/includes/: I6e04b512d / T219864 (duration: 01m 00s)
  • 20:16 marxarelli: 1.33.0-wmf.24 successfully deployed to group0. error rates look normal (T206678)
  • 20:07 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.33.0-wmf.24
  • 19:57 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.24 and rebuild l10n cache (duration: 44m 20s)
  • 19:12 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.24 and rebuild l10n cache
  • 18:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125 (duration: 20m 49s)
  • 18:22 marxarelli: cutting mediawiki branch 1.33.0-wmf.24 (T206678)
  • 18:22 marxarelli: cutting mediawiki branch 1.33.0-wmf.24
  • 18:20 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125
  • 18:20 ppchelko@deploy1001: deploy aborted: Kafka logging pipeline, full deploy T211125 (duration: 00m 03s)
  • 18:20 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125
  • 18:09 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, canary on restbase2010 T211125 (duration: 02m 33s)
  • 18:06 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, canary on restbase2010 T211125
  • 17:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7] (dev-cluster): Kafka logging pipeline, dev cluster only T211125 (duration: 03m 25s)
  • 17:56 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7] (dev-cluster): Kafka logging pipeline, dev cluster only T211125
  • 17:51 ppchelko@deploy1001: Finished deploy [restbase/deploy@3dcf328]: Upgrade swagger to v3, attempt 2, T218218 (duration: 20m 47s)
  • 17:37 ejegg: updated payments-wiki-staging from 793bce1a5f to 15bcb3d1a6
  • 17:30 ppchelko@deploy1001: Started deploy [restbase/deploy@3dcf328]: Upgrade swagger to v3, attempt 2, T218218
  • 17:30 ppchelko@deploy1001: Finished deploy [restbase/deploy@3dcf328] (dev-cluster): Upgrade swagger to v3, attempt 2, T218218 (duration: 03m 02s)
  • 17:27 ppchelko@deploy1001: Started deploy [restbase/deploy@3dcf328] (dev-cluster): Upgrade swagger to v3, attempt 2, T218218
  • 16:47 XioNoX: - replacing accepted-prefix-limit with prefix-limit in eqsin - T211730
  • 16:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@6026ad1]: Switch to swagger 3 T218218 (duration: 04m 52s)
  • 16:39 ppchelko@deploy1001: Started deploy [restbase/deploy@6026ad1]: Switch to swagger 3 T218218
  • 16:36 XioNoX: - replacing accepted-prefix-limit with prefix-limit on esams - T211730
  • 16:12 XioNoX: - replacing accepted-prefix-limit with prefix-limit on cr2-eqiad - T211730
  • 16:02 mutante: T194174 - bump. started alerting again 2 days ago
  • 16:00 mutante: icinga - schedule (30d) downtime for kubernetes operational latencies alerts (T219696) on kubernetes1004
  • 15:57 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 15:55 mutante: scandium - systemctl start parsoid-vd was failed (T201366)
  • 15:55 herron: beginning rolling upgrade of codfw ELK cluster to 5.6.15 T219571
  • 15:52 mutante: icinga - re-enabling notifications for scandium. setup task is resolved yet systemd is alerting, should not have been turned off anymore (T201366)
  • 15:39 XioNoX: repool eqsin - T219847
  • 15:32 jbond42: add cpp-hocon 0.1.6 to jessie-wikimedia/backports
  • 15:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: VE: Enable mobile section editing A/B test on all remaining wikis T219564 (duration: 00m 51s)
  • 15:07 moritzm: stopped/disabled ipmievd on cumin2001
  • 14:54 jbond42: add leatherman 1.4 to jessie-wikimedia/backports
  • 13:44 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on test wikis and mediawikiwiki for T215525
  • 13:24 volans: reboot ms-be2026 to see if that fixes the controller - T219854
  • 13:23 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:20 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:20 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:20 jynus: updating puppet compiler facts
  • 12:11 arturo: icinga downtime toolschecker for 1 month T219243
  • 12:07 hashar: contint1001: compressing some MediaWiki debugging logs under /srv/jenkins/builds # T219850
  • 11:42 moritzm: restarting parsoid on wtp1025 to pick up openssl update
  • 11:33 hashar: contint1001: cleaning Docker containers # T219850
  • 11:23 Amir1: EU SWAT is done
  • 11:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add the urlshortener-manage-url right and enable it for stewards (T133109), Part I (gerrit:499777) (duration: 00m 51s)
  • 11:21 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Add the urlshortener-manage-url right and enable it for stewards (T133109), Part I (gerrit:499777) (duration: 00m 53s)
  • 11:14 akosiaris: T217715 Update mathoid, citoid, cxserver, eventgate grafana dashboards to use the new recording rules for the quantiles
  • 11:14 jbond42: add cmake 3.6.2 to jessie-wikimedia/backports
  • 11:02 jbond42: add rapidjson 1.1.0 to jessie-wikimedia/backports
  • 10:47 jbond42: add catch 1.10 to jessie-wikimedia/backports
  • 10:42 jbond42: add strip-nondeterminism 0.034 to jessie-wikimedia/backports
  • 10:39 jbond42: add dh-autoreconf 12 to jessie-wikimedia/backports
  • 10:30 jbond42: add debhelper 10.2.5 and dh-systemd 10.2.5 to jessie-wikimedia/backports
  • 10:08 elukey: manually purge varnishkafka graphite alert's URL as an attempt to avoid a flapping alert - T219842
  • 09:14 arturo: T219776 finally reimaging cloudnet2003-dev.codfw.wmnet (was labtestnet2003)
  • 09:03 _joe_: uploaded patched version of bootstrap-vz to account for jessie-updates vanishing (T219683)
  • 08:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T219777 T143763 (duration: 00m 53s)
  • 08:50 marostegui: Execute schema change on db1069 x1 master with replication enabled on the following small wikis: aawiki aawikibooks aawiktionary abwiki abwiktionary acewiki advisorswiki advisorywiki adywiki afwiki T143763
  • 08:20 marostegui: Compress wikishared.urlshortcodes table on x1, directly on the master with replication (table has 1 row) - T219777
  • 08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T219777 T143763 (duration: 00m 53s)
  • 08:13 moritzm: installing debdeploy updates on remaining hosts in eqiad/codfw
  • 08:05 moritzm: installing openssl1.0 security updates
  • 07:52 moritzm: removed labvirt1008 from debmonitor (T216661)
  • 06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 (duration: 00m 50s)
  • 06:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 (duration: 00m 52s)
  • 06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 (duration: 00m 52s)
  • 06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 (duration: 00m 54s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 (duration: 00m 53s)
  • 05:58 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@2a090ef]: New version for T219778 (duration: 00m 19s)
  • 05:58 oblivian@deploy1001: Started deploy [docker-pkg/deploy@2a090ef]: New version for T219778
  • 05:55 marostegui: Upgrade pc1008
  • 05:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1008 (duration: 00m 56s)
  • 04:14 onimisionipe: restarted tilerator on maps200[1-3] - connection refused
  • 01:18 XioNoX: replacing accepted-prefix-limit with prefix-limit on cr1-eqiad - T211730
  • 01:14 XioNoX: replacing accepted-prefix-limit with prefix-limit in eqord - T211730
  • 00:52 XioNoX: depool eqsin due to Telia eqsin-codfw link outage
  • 00:40 XioNoX: replacing accepted-prefix-limit with prefix-limit in [co|eq]dfw - T211730
  • 00:25 XioNoX: replacing accepted-prefix-limit with prefix-limit on all ulsfo peers - T211730
  • 00:19 XioNoX: replacing accepted-prefix-limit with prefix-limit on one ulsfo peer - T211730
  • 00:06 XioNoX: jnt push to msw switches

2019-04-01

  • 23:54 shdubsh: restarting kafka on kafka-jumbo1004
  • 23:47 shdubsh: restarting kafka on kafka-jumbo1003
  • 23:36 shdubsh: restart kafka on kafka-jumbo1002
  • 23:28 shdubsh: restart kafka on kafka-jumbo1001
  • 23:16 XioNoX: jnt push to csw2-esams
  • 22:52 XioNoX: restart pdfrender on scb1003 - T174916
  • 21:44 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Remove kowiki spam mitigations T212679 (duration: 00m 54s)
  • 21:28 XioNoX: Push AS specific policy-statements to cr1/2-eqsin v4 peers - T211930
  • 21:11 dcausse: elasticsearch search cluster: reindex spaceless languages (T219533)
  • 19:48 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Renew Priority Hints origin trial token (duration: 00m 54s)
  • 19:48 bblack: authdns2001 (ns1) upgrade gdnsd -> 3.1.0
  • 18:58 XioNoX: re-set ulsfo-codfw ospf cost to previous default - T219591
  • 18:52 shdubsh: restart mjolnir-kafka-msearch on relforge1002 to adopt new logging config
  • 18:44 dcausse: Morning SWAT done
  • 18:42 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T219268: [cirrus] Use bm25 similarity for all wikis (duration: 00m 51s)
  • 18:33 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T210381: [cirrus] Cleanup transitional states (duration: 00m 53s)
  • 18:22 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: ExternalGuidance: Allow google translate hosts as known services (T218948) (gerrit:498913) (duration: 00m 53s)
  • 18:18 bblack: multatuli (ns2) upgrade gdnsd -> 3.1.0
  • 18:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add tmpSerializeEmptyListsAsObjects Wikibase repo config (T138104) (gerrit:499999) (duration: 00m 54s)
  • 17:55 XioNoX: remove asw2-c-eqiad:et-3/1/2 from disabled interfaces - T218059
  • 17:31 bblack: authdns1001 (ns0) upgrade gdnsd -> 3.1.0
  • 17:22 bblack: upgrade gdnsd -> 3.1.0 (wmf2) on cp1099 (authdns test)
  • 17:21 bblack: uploading gdnsd-3.1.0-1~wmf2 to stretch-wikimedia
  • 17:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@115a6bf]: Added more endpoints, GUI updates and new bot pattern (duration: 12m 10s)
  • 17:07 arturo: restart dhcp server in install2002 to release old lease for labtestnet2003
  • 17:03 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@115a6bf]: Added more endpoints, GUI updates and new bot pattern
  • 16:32 vgutierrez: slowly reenabling puppet in cache text cluster - T213705
  • 16:28 bblack: upgrade gdnsd -> 3.1.0 on cp1099 (authdns test)
  • 16:25 bblack: uploading gdnsd-3.1.0-1~wmf1 to stretch-wikimedia
  • 16:15 arturo: T219776 reimaging + renaming labtestnet2003 into cloudnet2003-dev
  • 16:13 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet
  • 16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet
  • 16:05 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2023.codfw.wmnet
  • 15:57 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2023.codfw.wmnet
  • 15:56 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3042.esams.wmnet
  • 15:49 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3042.esams.wmnet
  • 15:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet
  • 15:43 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4032.ulsfo.wmnet
  • 15:42 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5007.eqsin.wmnet
  • 15:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5007.eqsin.wmnet
  • 15:24 vgutierrez: disable puppet in the cache text cluster - T213705
  • 15:09 Amir1: mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki=hywwiki --baseName hywwiki --cluster (eqiad|codfw)
  • 14:59 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Cleanup: Remove obsolete WikimediaEditorTasks beta cluster prefs (duration: 00m 50s)
  • 14:44 moritzm: rolling out debdeploy 0.0.99.10 for jessie, buster, stretch systems
  • 14:42 moritzm: restarting superset on analytics-tool1004 to pick up latest Python
  • 14:41 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=hywwiki --force --sysop Ladsgroup
  • 14:37 ladsgroup@deploy1001: Synchronized langlist: (no justification provided) (duration: 00m 50s)
  • 14:35 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 50s)
  • 14:33 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T212597 (duration: 00m 51s)
  • 14:32 Amir1: wikiadmin@10.64.32.136(hywwiki)> update text set old_text = 'DB://cluster25/1';
  • 14:18 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 14:11 moritzm: uploaded debdeploy 0.0.99.10 to apt.wikimedia.org (jessie, stretch, buster)
  • 14:07 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 52s)
  • 14:04 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5007.eqsin.wmnet
  • 13:57 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5007.eqsin.wmnet
  • 13:56 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5001.eqsin.wmnet
  • 13:50 hashar: Reverted CI Jenkins jobs to Quibble 0.0.28 # T219647
  • 13:47 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5001.eqsin.wmnet
  • 13:26 mvolz@deploy1001: scap-helm citoid finished
  • 13:26 mvolz@deploy1001: scap-helm citoid cluster codfw completed
  • 13:26 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-codfw-values.yaml stable/citoid [namespace: citoid, clusters: codfw]
  • 13:23 mvolz@deploy1001: scap-helm citoid finished
  • 13:23 mvolz@deploy1001: scap-helm citoid cluster eqiad completed
  • 13:23 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-eqiad-values.yaml stable/citoid [namespace: citoid, clusters: eqiad]
  • 13:12 mvolz@deploy1001: scap-helm citoid finished
  • 13:12 mvolz@deploy1001: scap-helm citoid cluster staging completed
  • 13:12 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 13:11 hashar: Upgraded CI Jenkins jobs to Quibble 0.0.30 # T219647
  • 13:09 jbond42: rolling security update of tshark
  • 12:24 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@46ba982]: Rollback - third time is the charm (duration: 00m 43s)
  • 12:23 oblivian@deploy1001: Started deploy [docker-pkg/deploy@46ba982]: Rollback - third time is the charm
  • 12:08 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@0c32dc1]: Rollback to 1.0.0, T219778 (duration: 00m 18s)
  • 12:08 oblivian@deploy1001: Started deploy [docker-pkg/deploy@0c32dc1]: Rollback to 1.0.0, T219778
  • 12:02 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@UNKNOWN]: Rollback to 1.0.0, T219778 (duration: 00m 34s)
  • 12:02 oblivian@deploy1001: Started deploy [docker-pkg/deploy@UNKNOWN]: Rollback to 1.0.0, T219778
  • 11:58 Lucas_WMDE: EU SWAT done
  • 11:57 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikibaseLexeme: SWAT: Fix GrammaticalFeatureListWidget (T219134, T219734) (gerrit:500237) (duration: 01m 00s)
  • 11:53 moritzm: uploaded logstash/kibana/elasticsearch 5.6.15 to component thirdparty/elastic56
  • 11:52 moritzm: uploaded logstash/kibana/elasticsearch to component thirdparty/elastic56
  • 11:51 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add unwatchedpages permission to rollbacker and patroller at zhwiki (T219285) (gerrit:500393) (duration: 00m 52s)
  • 11:41 zfilipin@deploy1001: Synchronized static/images/project-logos/: SWAT: Correct logos for the Gujarati Wikipedia (T219373) (gerrit:499210) (duration: 00m 52s)
  • 11:34 zfilipin@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: Enable logging of private filters on commonswiki (T218527) (gerrit:497236) (duration: 00m 50s)
  • 11:25 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Revert "Remove $wgAbuseFilterRuntimeProfile"" (T191039) (gerrit:498818) (duration: 00m 51s)
  • 11:17 zfilipin@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: Revert "Revert "Remove $wgAbuseFilterProfile"" (T191039) (gerrit:498817) (duration: 00m 52s)
  • 11:16 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@0c32dc1]: Upgrade to 1.1.2 (duration: 01m 08s)
  • 11:15 oblivian@deploy1001: Started deploy [docker-pkg/deploy@0c32dc1]: Upgrade to 1.1.2
  • 11:00 jbond42: halt rolling updates of tshark until after SWAT
  • 10:48 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:500410) (duration: 00m 50s)
  • 10:47 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:500410) (duration: 00m 52s)
  • 10:42 jbond42: rolling security update of tshark
  • 10:32 _joe_: pruning old images on boron
  • 10:31 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7ef5ca3]: Upgrade to 1.1.2 (duration: 00m 26s)
  • 10:31 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7ef5ca3]: Upgrade to 1.1.2
  • 10:27 arturo: T219626 reimaging cloudcontrol2001-dev
  • 09:09 moritzm: installing Chromium security updates on proton* (tested the new release in deployment-prep)
  • 08:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2033 (duration: 00m 51s)
  • 08:09 marostegui: Deploy testing schema change on enwiki.echo_event on db2033 and upgrade mysql - T143961
  • 07:54 ariel@deploy1001: Finished deploy [dumps/dumps@7abb6c8]: get db user/passwd via mw maint script (duration: 00m 03s)
  • 07:54 ariel@deploy1001: Started deploy [dumps/dumps@7abb6c8]: get db user/passwd via mw maint script
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2033 (duration: 00m 51s)
  • 06:28 _joe_: pushing wikimedia-jessie:{20190401,latest} to docker-registry.w.o T219580
  • 06:27 _joe_: installing new bootstrap-vz on boron T219580
  • 05:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 (duration: 00m 50s)
  • 05:08 marostegui: Deploy schema change on db1077, this will generate lag on s3 on labs
  • 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 (duration: 00m 53s)

2019-03-31

  • 06:57 marostegui: Remove old files from dbstore1001 to clean up the disk space warning

2019-03-30

  • 03:39 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/ImageMap/includes/ImageMap.php: I1387825f25e / T217087 (duration: 00m 52s)
  • 03:16 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/skins/Vector/includes/templates/index.mustache: I0d6e036b65da0 / T219359 / i18n regression (duration: 00m 54s)

2019-03-29

  • 22:06 bstorm_: stopped database services on labsdb1004 and labsdb1005
  • 21:01 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3) (duration: 05m 14s)
  • 20:55 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3)
  • 20:49 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3) (duration: 03m 13s)
  • 20:46 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3)
  • 20:35 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 2) (duration: 03m 30s)
  • 20:31 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 2)
  • 20:30 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (duration: 00m 30s)
  • 20:29 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers
  • 18:41 ejegg: updated payments-wiki from 4b49bb7333 to 793bce1a5f
  • 15:51 XioNoX: repool ulsfo - T219591
  • 15:48 XioNoX: bump ulsfo-codfw ospf link cost to 1000 - T219591
  • 15:14 _joe_: pruning old images and containers on boron
  • 15:00 mutante: ldap-eqiad-replica02 - running out of disk - apt-get clean - gzipping /var/log/debug
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 13:05 ema: cp2002/cp2005: repool varnish-fe for user traffic T213263
  • 12:55 thcipriani: gerrit running on 2.15.11
  • 12:53 thcipriani: restarting gerrit to finish rollback to 2.15.11
  • 12:52 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 on cobalt -- restart of gerrit incoming (duration: 00m 11s)
  • 12:52 thcipriani@deploy1001: Started deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 on cobalt -- restart of gerrit incoming
  • 12:51 moritzm: removing php 7.0 packages from snapshot1008, dumps are only using 7.2 (T218193)
  • 12:50 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 (gerrit2001 only) (duration: 00m 10s)
  • 12:50 thcipriani@deploy1001: Started deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 (gerrit2001 only)
  • 12:47 moritzm: upgrading snapshot1008 to component/php72 (T218193)
  • 12:46 moritzm: upgrading snapshot1005-1007/1009 to component/php72 (T218193)
  • 12:23 ema: rolling ATS restarts to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/500011/ T213263
  • 11:45 mutante: cobalt - systemctl restart gerrit
  • 10:36 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 10:36 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 10:35 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 10:35 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 09:37 mutante: restarting zuul on contint1001
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 08:36 godog: depool ulsfo as precaution -- link repair in progress
  • 08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1110 (duration: 00m 50s)
  • 07:58 gilles@deploy1001: Synchronized php-1.33.0-wmf.23/includes/media/MediaTransformOutput.php: T216499 Only apply high priority half the time (duration: 00m 50s)
  • 07:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1110 (duration: 00m 51s)
  • 07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1110 (duration: 00m 50s)
  • 07:19 vgutierrez: reenabling puppet in acme-chief clients after verifying NOOP in netmon2001
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1110 (duration: 01m 06s)
  • 07:11 vgutierrez: disabling puppet in acme-chief clients to merge I437b91 safely
  • 07:06 marostegui: Upgrade db1110
  • 07:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1110 (duration: 00m 49s)
  • 07:01 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216598 T216594 Element Timing for Images and Layout Stability on ruwiki (duration: 00m 51s)
  • 06:56 marostegui: Remove tools section from tendril by doing: update shards set display='0' where name='tools'; T216749
  • 06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 (duration: 00m 49s)
  • 06:41 marostegui: Upgrade pc1009
  • 06:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 (duration: 00m 50s)
  • 06:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 (duration: 00m 50s)
  • 05:49 marostegui: Disable notifications on labsdb1004 and labsdb1005 - T216749
  • 05:47 marostegui: Remove labsdb1004 and labsdb1005 from tendril - T216749
  • 05:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 52s)
  • 00:18 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/api/ApiStashEdit.php: I35213d83a0 (duration: 00m 49s)
  • 00:16 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I8887ce013a8 (duration: 00m 51s)
  • 00:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I24a5469dbfd0 / T216206 for testwikidatawiki (duration: 00m 50s)

2019-03-28

  • 23:54 krinkle@deploy1001: Synchronized wmf-config/Wikibase.php: Ib9d617 (duration: 00m 50s)
  • 23:53 krinkle@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: Ib9d617 (duration: 00m 51s)
  • 23:14 bstorm_: completed setting up clouddb1003 as the replica of labsdb1006 (osm)
  • 22:13 bd808@deploy1001: Finished deploy [striker/deploy@2f62c43]: Fixes for error pages and repo creation (T176325) (duration: 00m 59s)
  • 22:12 bd808@deploy1001: Started deploy [striker/deploy@2f62c43]: Fixes for error pages and repo creation (T176325)
  • 22:11 XioNoX: add AS specific policy-statements to cr1-eqsin v6 transits - T211930
  • 21:51 thcipriani: restarting gerrit
  • 21:18 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [Wikimania] Enable VisualEditor in the 2019 namespace T218645 (duration: 00m 50s)
  • 21:16 XioNoX: add AS specific policy-statements to cr2-eqsin v6 transits - T211930
  • 21:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [Wikitech] Enable VisualEditor in extra namespaces (duration: 00m 50s)
  • 20:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: VisualEditor: Enable mobile section editing A/B test on 10 Wikipedias T218851 T218939 (duration: 00m 50s)
  • 20:29 moritzm: restarting Gerrit on cobalt to effect new Java security update
  • 19:47 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikimediaEditorTasks on wikidatawiki (duration: 00m 52s)
  • 19:39 mdholloway: created table wikimedia_editor_tasks_entity_description_exists on wikidatawiki
  • 19:19 marxarelli: 1.33.0-wmf.23 deployed for all wikis (T206677)
  • 19:09 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.23
  • 18:45 bstorm_: switching replica for osmdb to clouddb1003 VM from labsdb1007
  • 18:42 addshore@deploy1001: Synchronized wmf-config/db-labs.php: BETA ONLY db-labs (duration: 00m 57s)
  • 18:35 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: wikibase.php, define sharedCacheKeyGroup (duration: 00m 57s)
  • 18:32 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/ProofreadPage/includes/Index/IndexContent.php: ProofreadPage: Fix AbuseFilter UBN T219514 (duration: 00m 57s)
  • 18:17 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/AdvancedSearch/: AdvancedSearch: Fix two UBNs T219455 T219539 (duration: 00m 59s)
  • 18:03 ejegg: updated payments-wiki from 6661655e37 to 4b49bb7333
  • 17:46 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Deploy logging @cee: prefixing bugfix (duration: 03m 24s)
  • 17:43 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Deploy logging @cee: prefixing bugfix
  • 16:39 XioNoX: enable cr2-codfw:xe-5/0/0 (to cr2-eqdfw)
  • 16:36 mutante: wikitech-static - changing [renewalparams] authenticator = to 'apache' from 'standalone' (installer = was already apache) (T214640)
  • 16:36 jbond42: move python3-requests and python3-urllib3 from jessie-wikimedia backports to component/kube2proxy
  • 16:33 XioNoX: disable cr2-codfw:xe-5/0/0 (to cr2-eqdfw)
  • 16:00 akosiaris: poweroff sessionstore2001 for a re-racking
  • 15:15 mutante: wikitech-static - removing acme-setup cron jobs from root's crontab. this was used before the switch to certbot, is unrelated and added to confusion and maybe the problem (T214640)
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 15:06 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:46 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@4deeb04]: Partition htmlCacheUpdate topic, final cleanup stage T219159 (duration: 00m 52s)
  • 14:45 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@4deeb04]: Partition htmlCacheUpdate topic, final cleanup stage T219159
  • 14:32 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@3a8a889]: Partition htmlCacheUpdate topic, step 2 T219159 (duration: 00m 53s)
  • 14:31 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@3a8a889]: Partition htmlCacheUpdate topic, step 2 T219159
  • 14:07 gehel: reindexing changes from '2019-03-26T12:00:00Z' to '2019-03-28T12:00:00Z' into cirrus / elasticsearch - T218878
  • 13:59 gehel: restarting elasticsearch on elastic2050 to validate JVM upgrade
  • 13:57 moritzm: upgrading Java on elasticsearch hosts
  • 13:50 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet
  • 13:49 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
  • 13:22 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@c120b38]: Partition htmlCacheUpdate topic, explicitly exclude htmlCacheUpdate T219159 (duration: 00m 48s)
  • 13:21 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@c120b38]: Partition htmlCacheUpdate topic, explicitly exclude htmlCacheUpdate T219159
  • 13:14 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@17285f8]: Partition htmlCacheUpdate topic, step 1 T219159 (duration: 01m 46s)
  • 13:12 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@17285f8]: Partition htmlCacheUpdate topic, step 1 T219159
  • 12:20 moritzm: removing php 7.0 packages from snapshot1005-1007/1009, dumps are only using 7.2 (T218193)
  • 12:13 jbond42: move git from jessie-wikimedia backports repo to components/ci
  • 12:02 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "SDC: Enable both new-style and old-style Wikibase federation on Commons" (T219450) (gerrit:499756) (duration: 00m 57s)
  • 11:54 moritzm: upgrading snapshot1005-1007/1009 to component/php72 (T218193)
  • 11:53 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Revert T212597
  • 11:51 ladsgroup@deploy1001: Synchronized dblists: Revert T212597 (duration: 00m 58s)
  • 11:27 ladsgroup@deploy1001: Synchronized dblists: T212597 (duration: 00m 56s)
  • 11:01 godog: test copying prometheus metrics on bast3002
  • 10:54 gehel: restarting elasticsearch-psi on elastic20[35,36,53] (shards stuck in recovery) - T218878
  • 10:22 gehel: restarting elasticsearch on elastic20[34,36,50] (shards stuck in recovery) - T218878
  • 10:15 addshore@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/Wikibase/lib: T219452 Revert: Use enableModuleContentVersion() for Wikibase\lib\SitesModule (gerrit:499738) (duration: 01m 06s)
  • 10:11 gehel: restarting elasticsearch-omega on elastic2050 (shards stuck in recovery) - T218878
  • 09:56 gehel: restarting elasticsearch-omega on elastic2031 (shards stuck in recovery) - T218878
  • 09:42 gehel: restarting elasticsearch on elastic20[28,29,41] (shards stuck in recovery) - T218878
  • 09:37 gehel: restarting elasticsearch-psi on elastic20[39,40] (shards stuck in recovery) - T218878
  • 09:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 (duration: 00m 56s)
  • 09:28 gehel: restarting elasticsearch on elastic20[25,27] (shards stuck in recovery) - T218878
  • 09:19 gehel: restarting elasticsearch-omega on elastic20[38,50] (shards stuck in recovery) - T218878
  • 09:14 godog: install rsyslog 8.1901.0-1~bpo8+wmf1 on phab1001 and copper
  • 09:09 gehel: restarting elasticsearch-omega on elastic2050 (shards stuck in recovery) - T218878
  • 09:06 gehel: restarting elasticsearch-psi on elastic20[35,36,53] (shards stuck in recovery) - T218878
  • 09:00 gehel: restarting elasticsearch-psi on elastic2036 (shards stuck in recovery) - T218878
  • 08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 (duration: 00m 55s)
  • 08:43 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2007 after upgrade (duration: 00m 57s)
  • 08:38 gehel: retry shard allocation on elasticsearch codfw all clusters (curl -k -XPOST 'https://localhost:9243/_cluster/reroute?pretty&explain=true&retry_failed') - T218878
  • 08:37 gehel: retry shard allocation on elasticsearch codfw (curl -k -XPOST 'https://localhost:9243/_cluster/reroute?pretty&explain=true&retry_failed')
  • 08:33 elukey: move hadoop yarn configuration from hdfs back to zookeeper - T218758
  • 08:32 marostegui: Upgrade pc2007
  • 08:31 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2007 for upgrade (duration: 00m 56s)
  • 08:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2009 after upgrade (duration: 00m 57s)
  • 08:12 marostegui: Upgrade pc2009
  • 08:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2009 for upgrade (duration: 00m 57s)
  • 08:10 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 08:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 07:32 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2008 after upgrade (duration: 00m 57s)
  • 07:22 marostegui: Upgrade pc2008
  • 07:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2008 for upgrade (duration: 00m 57s)
  • 07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clean up old unused entries (duration: 01m 04s)
  • 06:27 marostegui: Deploy schema change on s3 codfw, lag will be generated on s3 codfw.
  • 05:39 marostegui: Restart apache on phab1001 - phabricator is down
  • 02:50 chaomodus: restarted pdfrender on scb1004 in order to attempt to address flapping errors
  • 01:45 XioNoX: add AS specific policy-statements to cr2-eqsin (but don't apply them yet) - T211930
  • 01:20 XioNoX: progressive jnt push to standardize cr*
  • 01:15 XioNoX: remove sandbox-out6 filter from all routers
  • 00:56 XioNoX: jnt push to standardize asw*
  • 00:32 XioNoX: jnt push to standardize mr1-*
  • 00:21 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/api/ApiStashEdit.php: Ic357dbfcd9ab / T203786 (duration: 00m 57s)

2019-03-27

  • 23:46 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Fix: Pass database name to the NameTableStore constructor (duration: 00m 57s)
  • 23:34 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Load WikibaseLexemeCirrusSearch on Wikidata (gerrit:499400) T216206 (duration: 00m 58s)
  • 23:25 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Load WikibaseLexemeCirrusSearch on test.wikidata.org (gerrit:499399) T216206 (duration: 00m 59s)
  • 22:51 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 31s)
  • 22:51 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 22:47 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 04s)
  • 22:47 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 22:45 krinkle@deploy1001: Synchronized wmf-config/profiler.php: I8c7f8c / T176916 (duration: 00m 59s)
  • 22:36 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 34s)
  • 22:35 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 22:30 niharika29@deploy1001: Finished deploy [scholarships/scholarships@9db232d]: Update wikimania-scholarships; includes fix for broken privacy policy link (duration: 00m 02s)
  • 22:30 niharika29@deploy1001: Started deploy [scholarships/scholarships@9db232d]: Update wikimania-scholarships; includes fix for broken privacy policy link
  • 22:21 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 31s)
  • 22:21 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 21:59 chaomodus: restarting proton1001 to upgrade ram
  • 21:58 chaomodus: restarting proton1002 to upgrade ram
  • 21:57 chaomodus: restarting proton2001 in order to upgrade ram
  • 21:54 chaomodus: restarting proton2002 in order to upgrade ram
  • 21:25 dcausse@deploy1001: Synchronized wmf-config/Wikibase.php: T219448 (duration: 00m 55s)
  • 21:25 eileen: civicrm revision changed from 67b8405b60 to 7560af93df, config revision is 5a0cbb3c7d (was actually before the process control one)
  • 21:24 eileen: process-control config revision is e1bc772c89
  • 21:17 chaomodus: restarted proton on proton1001 in response to memory exhaustion and cpu peg
  • 21:07 milimetric@deploy1001: Finished deploy [analytics/refinery@fdd21a4]: non-deploy changes and two new oozie jobs (duration: 11m 48s)
  • 20:55 milimetric@deploy1001: Started deploy [analytics/refinery@fdd21a4]: non-deploy changes and two new oozie jobs
  • 20:29 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update WikimediaEditorTasks config for DB location split (duration: 00m 57s)
  • 20:23 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Update DB utils to handle counts and suggestion DBs in different locations (duration: 00m 58s)
  • 20:14 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 20:14 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Fix: Use READ_LOCKING when evaluating whether to update targets_passed (duration: 00m 58s)
  • 20:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 20:03 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 19:48 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 19:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 19:43 herron: removed queued wikidata notification messages for a***a@w**gm**ster.** on mx1001 to address gmail excessive volume rate limiting
  • 19:32 jijiki: restarting pdfrender on scb1001
  • 19:30 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 19:27 marxarelli: (resent; originally @ 1916) dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.23
  • 19:23 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 19:18 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.23 (duration: 01m 45s)
  • 19:14 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 18:48 thcipriani: restarting gerrit process
  • 18:12 jynus: update grants on db1115 for new provisioning hosts on codfw T218336
  • 18:10 elukey: interface::rps applied to all the mc10XX hosts - T203786
  • 17:41 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:41 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 17:10 ema: fermium: /usr/local/sbin/disable_list wikimetrics T211835
  • 16:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T214075 SDC: Enable Wikidata federation on Commons (duration: 00m 57s)
  • 16:38 elukey: mc20XX and mc1022 have interface::rps enabled - T203786
  • 16:28 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/GlobalPreferences/includes/GlobalPreferencesFactory.php: Hot-fix T219380 GlobalPreferences: Allow modifiedPrefs to be set even if no UI control (duration: 00m 58s)
  • 16:18 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC: Use feature flag for enabling depicts in UW (duration: 00m 57s)
  • 16:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Add feature flag for enabling depicts in UW (duration: 00m 57s)
  • 15:56 jbond42: bastion reboots complete
  • 15:56 ariel@deploy1001: Finished deploy [dumps/dumps@88ddd76]: ability to use lbzip2 for meta-history compression (duration: 00m 03s)
  • 15:56 ariel@deploy1001: Started deploy [dumps/dumps@88ddd76]: ability to use lbzip2 for meta-history compression
  • 15:44 jbond42: rebooting bast2001.wikimedia.org in 5 minutes
  • 15:44 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:42 jbond42: rebooting bast2002.wikimedia.org in 5 minutes
  • 15:38 jbond42: rebooting bast1002.wikimedia.org in 5 minutes
  • 15:34 jbond42: rebooting bast4002.wikimedia.org in 5 minutes
  • 15:30 jbond42: rebooting bast5001.wikimedia.org in 5 minutes
  • 15:24 jbond42: rebooting iron.wikimedia.org in 5 minutes
  • 15:22 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:21 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:19 elukey: slowly rolling out interface::rps to all the mcXXXX nodes - T203786
  • 14:52 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 14:45 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:44 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:13 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:12 marostegui: Sanitize hywwiki on db1124:3313 T212625
  • 14:11 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:05 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/498417
  • 13:38 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 13:35 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:11 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 57s)
  • 12:42 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 58s)
  • 12:41 Amir1: scap sync-file dblists
  • 12:30 Amir1: mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=mediawikiwiki hyw wikipedia hywwiki hyw.wikipedia.org
  • 12:25 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:23 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:15 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 11:47 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 11:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 11:37 mdholloway: created wikimedia_editor_tasks_entity_description_exists table on testwikidatawiki
  • 11:28 _joe_: SWAT done
  • 11:24 oblivian@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/WikimediaEvents: SWAT: Backport Use a cookie to persist the seed for php7 a/b test to .22 T216676 (duration: 00m 58s)
  • 11:20 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle rule for The Art and Feminism Edit-a-thon in Taiwan (T219113) (gerrit:498770) (duration: 00m 59s)
  • 11:14 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Clean the throttles up (T219311) (gerrit:499287) (duration: 00m 57s)
  • 11:10 dcausse: elasticsearch search cluster: setting cluster.routing.allocation.disk.watermark.flood_stage to 100% on omega/psi/chi@eqiad (T219364)
  • 11:08 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle rule for Czech editathon (T219291) (gerrit:499231) (duration: 00m 58s)
  • 11:06 dcausse: elasticsearch search cluster: setting "index.blocks.read_only_allow_delete" to null on all indices in omega/psi/chi@omega (T219364)
  • 11:04 mutante: re-enabled puppet on logstash1007 through 1011 - then on logstash*
  • 11:00 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 10:57 godog: upgrade rsyslog to 8.1903.0-3~bpo8+wmf1 on cobalt to test imfile file rotation fix - T214176
  • 10:53 mutante: enabling and running puppet on logstash1007
  • 10:49 mutante: disabling puppet on logstash* via cumin
  • 10:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3312 (duration: 00m 58s)
  • 10:20 godog: upgrade rsyslog to 8.1903.0-3~bpo8+wmf1 on phab1001 to test imfile file rotation fix - T214176
  • 09:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3312 (duration: 00m 56s)
  • 09:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1074 (duration: 00m 57s)
  • 09:41 marostegui: Upgrade db2092
  • 09:06 vgutierrez: puppet reenabled in acme-chief clients - T207295
  • 09:01 marostegui: Deploy schema change on db1074, this will generate lag on labsdb hosts for s2
  • 09:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1074 (duration: 00m 57s)
  • 08:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1076 (duration: 00m 54s)
  • 08:33 vgutierrez: disabling puppet in acme-chief clients to safely get rid of old TLS material - T207295
  • 08:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 (duration: 00m 57s)
  • 08:17 godog: bounce rsyslog on phab* - apache access logs stopped at ~6.30 today
  • 08:09 godog: bounce rsyslog on cobalt - apache access logs stopped at ~6.30 today
  • 08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1122 (duration: 00m 57s)
  • 07:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 (duration: 00m 58s)
  • 06:57 SMalyshev: depooled wdqs1005 to catch up
  • 06:56 SMalyshev: repooled wdqs1004
  • 06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3312 (duration: 00m 58s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Change one parsercache key on codfw - T210725 (duration: 00m 57s)
  • 05:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 (duration: 01m 10s)
  • 00:56 SMalyshev: depooled wdqs1004 to catch up
  • 00:55 SMalyshev: repooled wdqs1006

2019-03-26

  • 23:37 SMalyshev: repooled wdqs2003
  • 23:12 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: T216206 : sync noop labs config: Actually load WBCS-Lexeme extension before trying to use it (duration: 00m 57s)
  • 22:12 gehel: freezing and unfreezing writes to elasticsearch codfw
  • 21:47 SMalyshev: depool wdqs2003 to catch it up
  • 21:32 ebernhardson: manually thaw search.svc.codfw.wmnet:9643
  • 21:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikimediaEditorTasks on testwikidatawiki (duration: 00m 57s)
  • 21:22 mdholloway: created new db tables for WikimediaEditorTasks in x1
  • 21:00 SMalyshev: depooled wdqs1006 to see if it'd catch up better
  • 20:19 marxarelli: correction: group0 to 1.33.0-wmf.23
  • 20:15 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.0
  • 20:08 ejegg: updated payments-wiki from f42910460b to 6661655e37
  • 19:58 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.23 and rebuild l10n cache (duration: 37m 59s)
  • 19:20 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.23 and rebuild l10n cache
  • 19:18 marxarelli: scap clean failure due to T218783. train is rolling without cleanup
  • 19:17 jynus: reloading db2095 mariadb instances to reload and check filters
  • 19:13 jynus: reloading db2094 mariadb instances to reload and check filters
  • 19:07 dduvall@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 10s)
  • 19:04 jynus: reloading db1125 mariadb instances to reload and check filters
  • 18:49 marxarelli: branch 1.33.0-wmf.23 was cut successfully (T206677)
  • 18:24 jynus: reloading db1124 mariadb instances to reload and check filters
  • 18:21 marxarelli: starting branch cut for 1.33.0-wmf.23 (T206677)
  • 18:09 thcipriani: gerrit back on version 2.15.12, upgrade complete.
  • 18:05 thcipriani: restarting gerrit on cobalt for update to 2.15.12
  • 18:05 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 18:05 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on cobalt (duration: 00m 15s)
  • 18:04 thcipriani@deploy1001: Started deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on cobalt
  • 18:03 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on gerrit2001 only (duration: 00m 11s)
  • 18:03 thcipriani@deploy1001: Started deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on gerrit2001 only
  • 18:01 thcipriani: starting gerrit 2.15.12 upgrade
  • 17:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:45 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 17:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 17:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:43 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 17:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 17:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:39 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:39 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:38 arlolra: Updated Parsoid to f58c3d1 (T219023)
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:33 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:33 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:33 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:31 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:31 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:31 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@395a214]: Updating Parsoid to f58c3d1 (duration: 06m 51s)
  • 17:21 arlolra@deploy1001: Started deploy [parsoid/deploy@395a214]: Updating Parsoid to f58c3d1
  • 17:14 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:13 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 17:12 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:12 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:12 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:06 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:03 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:03 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:03 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:59 otto@deploy1001: scap-helm eventgate- upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-, clusters: staging]
  • 16:59 otto@deploy1001: scap-helm eventgate- upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-, clusters: staging]
  • 16:58 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 16:58 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:57 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:57 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:57 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:57 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:31 gilles@deploy1001: Finished deploy [performance/asoranking@9a1e5ef]: (no justification provided) (duration: 00m 52s)
  • 16:30 gilles@deploy1001: Started deploy [performance/asoranking@9a1e5ef]: (no justification provided)
  • 16:07 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:05 robh: decom of labtestvirt200[12] started via T218023
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.16 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:44 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:44 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:44 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.16 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:43 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 52 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:40 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:40 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:40 otto@deploy1001: scap-helm eventgate-analytics upgrade --help [namespace: eventgate-analytics, clusters: staging]
  • 15:34 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:34 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:34 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:32 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:32 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:31 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:31 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:31 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:31 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:20 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:20 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:08 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:01 jbond42: rolling update of passenger on puppet masters
  • 13:35 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:06 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:58 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 11:42 Amir1: EU SWAT is done
  • 11:40 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/lib/maintenance/populateSitesTable.php --wiki=wikimaniawiki --force-protocol https (T217730)
  • 11:39 Amir1: wikiadmin@db1078.eqiad.wmnet(wikimaniawiki)> DELETE FROM sites; and site_identifiers
  • 11:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wmgWikibaseSiteGroup for wikimaniawiki (T217730) (gerrit:498440) (duration: 00m 49s)
  • 11:22 elukey: temporarily install ifstat on mc1022 + tmux session to log in/out bandwidth usage every 1s for T203786
  • 11:20 ladsgroup@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle rule for Wikimedia Hackathon 2019 (T213869) (gerrit:498949), try II (duration: 00m 49s)
  • 11:11 ladsgroup@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle rule for Wikimedia Hackathon 2019 (T213869) (gerrit:498949) (duration: 00m 51s)
  • 10:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3312 (duration: 00m 49s)
  • 09:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3312 (duration: 00m 50s)
  • 09:54 marostegui: Upgrade db2071
  • 09:42 marostegui: Upgrade db2070
  • 09:15 jijiki: Restarting pdfrender on scb1001
  • 09:09 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1004.eqiad.wmnet
  • 09:05 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1003.eqiad.wmnet
  • 08:09 marostegui: Deploy schema change on s2 codfw master, this will generate lag on codfw s2
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1119 (duration: 00m 49s)
  • 06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 (duration: 00m 50s)
  • 06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 (duration: 00m 52s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 (duration: 00m 51s)
  • 06:02 marostegui: Deploy schema change on db1106, this will generate lag on s1 on labs hosts

2019-03-25

  • 23:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T219234 Turn on Elastica logging channel (duration: 00m 51s)
  • 22:32 krinkle@deploy1001: Synchronized docroot/wikipedia.org/speed-tests/Banksy.enwiki.872156204: T185446 (duration: 00m 49s)
  • 21:44 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T216206 Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables, sub-part b (duration: 00m 49s)
  • 21:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216206 Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables, sub-part a (duration: 00m 50s)
  • 21:40 XioNoX: apply transport-in4 filter to cr1/2-eqiad - T190090
  • 21:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218715 Enable WBCS on Testcommons too (duration: 00m 50s)
  • 20:32 ebernhardson: T218994 set various deprecation channels on all six cirrus elasticsearch clusters to ERROR
  • 19:54 dcausse: elasticsearch search cluster: SET "logger.org.elasticsearch.common.logging.DeprecationLogger" to "ERROR" to psi/omega@eqiad (T218994)
  • 19:48 dcausse: elasticsearch search cluster: SET "logger.org.elasticsearch.deprecation.index.query.functionscore.ScoreFunctionBuilder" to "ERROR" to chi/psi/omega@eqiad (T218994)
  • 19:40 volans: restart icinga on icinga1001 to reset modified attributes
  • 19:37 dcausse: morning SWAT done
  • 19:33 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] switch all wikis to eqiad (elastic 6.5.4) (duration: 00m 50s)
  • 19:21 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T192254 (duration: 00m 49s)
  • 19:13 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: T218260 (duration: 00m 49s)
  • 19:06 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@1cbf290]: Roll update to mjolnir-bulk-daemon es6 handling of super_detect_noop (duration: 03m 27s)
  • 19:02 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@1cbf290]: Roll update to mjolnir-bulk-daemon es6 handling of super_detect_noop
  • 18:46 dcausse@deploy1001: Synchronized wmf-config/flaggedrevs.php: revert T217507 (duration: 00m 49s)
  • 18:43 ebernhardson: restart mjolnir-kafka-msearch-daemon across cirrus elasticsearch servers
  • 18:41 dcausse@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 18:32 dcausse@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/FlaggedRevs/: T218949: Fix reject changes when user is partially blocked (duration: 00m 51s)
  • 18:27 dcausse@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: T192135 (duration: 00m 50s)
  • 18:15 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: T211622: Enforce 8 char password length requirements for non-privileged users (duration: 00m 50s)
  • 17:24 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@281aaf8]: New build for blazegraph and updater plus GUI updates (duration: 10m 31s)
  • 17:24 elukey: restart pdfrender on scb1004
  • 17:14 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@281aaf8]: New build for blazegraph and updater plus GUI updates
  • 17:11 ebernhardson: restart mjolnir-kafka-msearch-daemon on relforge100[12]
  • 17:10 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218878: [cirrus] switch low volume wikis to eqiad (elastic 6.5.4) (duration: 00m 49s)
  • 16:56 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@2fb5038]: Ship new logging support code via new simplified virtualenv deployment (duration: 09m 52s)
  • 16:47 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@2fb5038]: Ship new logging support code via new simplified virtualenv deployment
  • 16:28 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ba09eb5]: Ship new logging support code via new simplified virtualenv deployment (duration: 09m 10s)
  • 16:19 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ba09eb5]: Ship new logging support code via new simplified virtualenv deployment
  • 16:19 hashar: updating Jenkins plugins and restarting
  • 16:16 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@6eda7d8]: Ship new logging support code via new simplified virtualenv deployment (duration: 02m 38s)
  • 16:13 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@6eda7d8]: Ship new logging support code via new simplified virtualenv deployment
  • 15:48 XioNoX: remove 2nd AS7568 router in Equinix Singapore
  • 15:21 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ddf26d0]: Ship new logging support code (duration: 01m 29s)
  • 15:20 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ddf26d0]: Ship new logging support code
  • 15:00 jbond42: updating passenger on rhodium
  • 14:29 andrewbogott: updating slapd indexes on seaborgium, serpens, ldap-eqiad-replica01, ldap-eqiad-replica02 for 498396
  • 13:52 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 13:52 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 13:52 ema: cp1076: repool varnish-fe, frontend misses served by cp-ats T213263
  • 13:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 13:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 13:41 ema: cp1076: depool varnish-fe and point it to cp-ats T213263
  • 13:28 mutante: planet - manually updating en version since new monitoring check warned it wasn't current (T203208)
  • 13:17 mutante: mwmaint1002 - manually running tor_exit_node cron command and test with PHP 7.2
  • 12:48 mutante: reloading icinga config
  • 12:15 Lucas_WMDE: EU SWAT finished
  • 12:08 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Move 0.1% of anonymous users to php7 T212828 (duration: 00m 49s)
  • 12:07 moritzm: installing openssl1.0 security updates on stretch
  • 12:00 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Remove $wgAbuseFilterRuntimeProfile" (T191039)|gerrit:498814 Revert "Remove $wgAbuseFilterRuntimeProfile" (T191039) (duration: 00m 51s)
  • 11:48 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove $wgAbuseFilterRuntimeProfile (T191039)|gerrit:486470 Remove $wgAbuseFilterRuntimeProfile (T191039) (duration: 00m 49s)
  • 11:46 ema: cp-ats-codfw: upgrade trafficserver to 8.0.3-1wm1
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/Wikibase/repo: SWAT: Revert "OutputPageBeforeHTML: do nothing for non entity pages" (T218907)|gerrit:498354 Revert "OutputPageBeforeHTML: do nothing for non entity pages" (T218907) (duration: 01m 06s)
  • 11:26 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2003.codfw.wmnet
  • 11:23 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2004.codfw.wmnet
  • 11:23 godog: switch codfw prometheus from prometheus2003 to prometheus2004
  • 11:19 ema: cp-ats-eqiad: upgrade trafficserver to 8.0.3-1wm1
  • 11:18 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switching to use of the local proxy for search in php7 (duration: 00m 50s)
  • 11:16 oblivian@deploy1001: Synchronized wmf-config/LabsServices.php: switching to use of the local proxy for search in php7 (duration: 00m 50s)
  • 11:09 ema: trafficserver 8.0.3-1wm1 uploaded to stretch-wikimedia
  • 10:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1083 (duration: 00m 48s)
  • 10:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 10:40 gehel: disable deprecation warnings on elasticsearch eqiad - T218994
  • 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546)|gerrit:498800 Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546)|gerrit:498800 Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:27 moritzm: installing Java security updates on Hadoop/Druid test cluster
  • 10:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 10:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 10:07 moritzm: installing ntfs-3g security updates
  • 10:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1083 (duration: 00m 49s)
  • 09:42 moritzm: uploaded openjdk 8u212-b01-1~deb8u1 to apt.wikimedia.org/jessie-wikimedia/main
  • 09:34 marostegui: Upgrade db2062
  • 09:24 hashar: contint1001: manually compressing Zuul log files sudo -u zuul gzip --best /var/log/zuul/*.log.????-??-??
  • 09:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083+ (duration: 00m 49s)
  • 09:18 marostegui: Upgrade db2055
  • 09:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 (duration: 00m 49s)
  • 09:10 mutante: contint1001 - restarting zuul
  • 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 (duration: 00m 49s)
  • 08:08 vgutierrez: reenabling puppet in openldap servers
  • 08:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1118 (duration: 00m 49s)
  • 07:58 vgutierrez: disable puppet and downtime host in icinga for labtestservices2001 - T218022
  • 07:40 vgutierrez: disable puppet in production openldap servers before merging https://gerrit.wikimedia.org/r/498776
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1118 (duration: 00m 49s)
  • 06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1118 after mysql upgrade (duration: 00m 50s)
  • 06:45 marostegui: Stop MySQL on db1118 for upgrade
  • 06:44 marostegui: Deploy schema change on s1 codfw master, this will generate lag on codfw
  • 06:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1118 for schema change and upgrade (duration: 00m 54s)
  • 04:31 chaomodus: restarted pdfrender on scb1003 to try to help flapping

2019-03-24

  • 15:00 jijiki: Restart pdfrender on scb1002 and scb1004

2019-03-23

  • 13:02 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix WikimediaEditorTasks Beta Cluster DB config, take 2 (duration: 00m 50s)
  • 12:36 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix WikimediaEditorTasks Beta Cluster DB config (duration: 00m 52s)

2019-03-22

  • 22:13 bd808: Restarted uwsgi-striker on labweb1002
  • 22:12 bd808: Restarted uwsgi-striker on labweb1001
  • 20:14 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:14 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 20:14 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 19:59 ejegg: updated payments-wiki-staging from 31647bc97e to f42910460b
  • 19:57 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:57 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 19:57 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 19:55 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:55 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 19:55 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --set main_app.extra_kafka_conf= [namespace: eventgate-analytics, clusters: staging]
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --set main_app.extra_kafka_conf={} [namespace: eventgate-analytics, clusters: staging]
  • 19:46 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:46 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:46 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:39 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:39 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:36 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:36 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:36 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:41 krinkle@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/Collection/: I2c4f5d / T217835 (duration: 00m 52s)
  • 18:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:21 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:16 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:16 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:16 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:13 tzatziki: removing 5 files for legal compliance
  • 18:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:06 jijiki: Restart ferm on db2096
  • 15:58 James_F: UBN hot-deploy for T218918: Only load latest revision in MessageCache::loadFromDB
  • 15:26 gehel: restarting elasticsearch on elastic1046 for logging configuration change - T218994
  • 14:34 mutante: scandium - apt-get remove --purge php* ; apt autoremove ; letting puppet reinstall php 7.2 one more time using mediawiki::profile::php now
  • 14:33 gehel: upgrading to elasticsearch-curator 5.6.0 on all elasticsearch nodes (including logstash) - T218991
  • 11:22 ema: lvs1002: bounce pybal to clear backends health icinga warning T218133
  • 11:18 ema: lvs1005: bounce pybal to clear backends health icinga warning T218133
  • 10:24 mutante: scandium - apt autoremove
  • 10:20 mutante: scandium - manually removing all php* packages to let puppet reinstall 7.2 instead of 7.0
  • 10:05 ema: cp2005: repooled, serving traffic via ATS T213263
  • 10:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 10:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
  • 09:48 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 09:48 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
  • 09:47 ema: cp2005: depool varnish-fe in preparation of traffic switch to ATS T213263
  • 09:42 moritzm: rebooting pool counters in codfw to pick up SSBD-enabled qemu
  • 09:04 elukey: start tcpdump on mc1022 to gather traffic for analysis
  • 06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1094 (duration: 00m 50s)
  • 06:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 49s)
  • 06:05 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2096 after onsite maintenance (duration: 00m 51s)
  • 01:31 bd808: labweb: upgraded mariadb packages installed on labweb100[12]
  • 01:19 bd808@deploy1001: Finished deploy [striker/deploy@b4bcd08]: Update python wheels (duration: 01m 00s)
  • 01:18 bd808@deploy1001: Started deploy [striker/deploy@b4bcd08]: Update python wheels
  • 00:54 bd808: Striker down following upgrade. scap3 did not rebuild venv as expected. Manually resolved, but now having mysql library issues.
  • 00:47 Krinkle: krinkle@mwmaint1002 Fixing corrupt 'log_params' field of kawiki.logging row where log_id=1021367; T93110
  • 00:36 bd808@deploy1001: Finished deploy [striker/deploy@c4726e3]: Django upgrade and various bug fixes (T192487, T182142, T176325, T217932) (duration: 01m 15s)
  • 00:34 bd808@deploy1001: Started deploy [striker/deploy@c4726e3]: Django upgrade and various bug fixes (T192487, T182142, T176325, T217932)
  • 00:32 James_F: SWAT done, 12 minutes ago.
  • 00:20 jforrester@deploy1001: Finished scap: SWAT: Full scap for i18n rebuild for 498259 and 498113 (duration: 24m 49s)

2019-03-21

  • 23:57 gtirloni: downtimed systemd check in labweb1001/1002 (T218935)
  • 23:56 jforrester@deploy1001: Started scap: SWAT: Full scap for i18n rebuild for 498259 and 498113
  • 23:53 gtirloni: downtimed systemd check in labweb1001 (T210818)
  • 23:32 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/ContentTranslation/api/ApiQueryContentTranslationSuggestions.php: SWAT T218902 CX: Return API error on anonymous suggestions queries (duration: 00m 51s)
  • 23:08 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT T217730 Add wikimaniawiki to another special group in Wikibase client (duration: 00m 49s)
  • 22:33 jijiki: Restarting pdfrender on scb1003
  • 22:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 22:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 22:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 22:14 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
  • 22:02 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable WikimediaEditorTasks on the Beta Cluster (duration: 00m 49s)
  • 21:56 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Add WikimediaEditorTasks labs config to InitializeSettings-labs.php (duration: 00m 47s)
  • 21:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add WikimediaEditorTasks default config to InitializeSettings.php (duration: 00m 49s)
  • 21:53 jijiki: Restarting pdfrender on scb1004
  • 21:52 mholloway-shell@deploy1001: Synchronized wmf-config/extension-list: Add WikimediaEditorTasks to extension-list (duration: 00m 50s)
  • 21:45 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 21:39 XioNoX: Ping offload - replace test IP with text-lb.codfw IP on cr1/2-codfw - T190090
  • 21:11 XioNoX: remove peering sessions to AS7385 on cr4-ulsfo
  • 21:08 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:08 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:08 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:55 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:55 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:55 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1006.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:27 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:27 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1006.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1005.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:24 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:24 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:24 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1004.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1001.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:22 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:22 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:22 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1001.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:21 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1002.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:03 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:03 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:03 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:45 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T213483 Disable RDF output of mediainfo Wikibase entities (duration: 00m 49s)
  • 19:40 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T213483 Read wmgWikibaseEntityTypesWithoutRdfOutput value (duration: 00m 50s)
  • 19:39 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T213483 Set default wmgWikibaseEntityTypesWithoutRdfOutput value (duration: 00m 51s)
  • 18:49 gehel: resetting archived settings on elasticsearch cirrus eqiad - T218879
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:36 sbisson@deploy1001: Synchronized php-1.33.0-wmf.22/languages/Language.php: SWAT: languages: Partial revert of I8287118cf8ec01326ead9|gerrit:498116 languages: Partial revert of I8287118cf8ec01326ead9 (duration: 00m 50s)
  • 18:30 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 18:25 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable Welcome survey on viwiki|gerrit:498166 Disable Welcome survey on viwiki (duration: 00m 49s)
  • 18:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:17 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 18:16 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable logging for CitationUsage and CitationUsagePageLoad|gerrit:496857 Enable logging for CitationUsage and CitationUsagePageLoad (duration: 00m 49s)
  • 18:13 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:11 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable reader trust survey v2|gerrit:494552 Disable reader trust survey v2 (duration: 00m 50s)
  • 18:08 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:56 bblack: everything back to normal for lvs1002/lvs1005 (high-traffic2 @ eqiad)
  • 17:55 bblack: restarting pybal on lvs1002
  • 17:54 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:54 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:54 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:49 reedy@deploy1001: Synchronized php-1.33.0-wmf.22/includes/user/User.php: Iab2492 (duration: 00m 51s)
  • 17:43 bblack: restarting pybal on lvs1005
  • 17:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable EntitySourceBasedFederation on TestCommons (duration: 00m 50s)
  • 17:37 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 17:35 bblack: disabled puppet on lvs1002 + lvs1005 for new service rollout
  • 17:28 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 17:27 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC: Add test-commons.wikimedia.org to wgCrossSiteAJAXdomains (duration: 00m 49s)
  • 17:11 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 17:07 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Depicts on TestCommons, with related config (duration: 00m 50s)
  • 17:03 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 17:03 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 17:02 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:02 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:39 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 16:38 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 16:38 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 16:38 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 16:38 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:38 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:29 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:29 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2096 for onsite maintenance (duration: 00m 50s)
  • 16:01 marostegui: Poweroff db2096 for onsite maintenance T218336
  • 15:20 moritzm: rebooting flerovium/furud for kernel updates
  • 14:35 moritzm: restarting jenkins on releases* after Java update
  • 14:18 gtirloni: downtimed labtestweb2001 (T218881)
  • 14:11 vgutierrez: re-enabling puppet in acme-chief clients - T218862
  • 14:09 arturo: T218024 disabled icinga checks for labtestweb2001
  • 14:07 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 13:58 vgutierrez: update acme-chief to version 0.15 in acmechief1001 - T218862
  • 13:54 vgutierrez: disabling puppet in acme-chief clients - T218862
  • 13:48 akosiaris: reboot oresrdb2001
  • 13:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3317 (duration: 00m 51s)
  • 13:37 elukey: upgrade openjdk-8 on an-worker1080 and restarted hadoop daemons
  • 13:28 moritzm: installing Java security updates on notebook hosts
  • 13:22 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.22
  • 13:18 gtirloni: downtimed cloudcontrol*, cloudservices*, labcontrol*, labweb* (T210818)
  • 13:06 moritzm: installing Java security updates on stat hosts
  • 12:40 arturo: T216497 remove python-cliff from jessie-wikimedia/openstack-mitaka-jessie
  • 12:35 jijiki: Pooling mw1339 back
  • 12:33 jijiki: Pooling mw1290 back
  • 12:08 arturo: T216497 add python-cliff to jessie-wikimedia/openstack-mitaka-jessie
  • 12:02 vgutierrez: uploaded acme-chief 0.15 to apt.wikimedia.org (buster) - T218862
  • 11:54 elukey: restart yarn node managers on an-worker10[82,89,92] - shutdown after a long yarn failover and only now downtime is expired
  • 11:36 mutante: gerrit2001 (not the master prod server)- scheduled downtime and rebooting for upgrade
  • 11:04 zeljkof: EU SWAT finished
  • 11:04 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for LMU Edit-a-thon (T217929) (duration: 00m 57s)
  • 10:57 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet
  • 10:52 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
  • 10:46 elukey: restart hadoop yarn resource managers on an-master100[1,2] to pick up new settings
  • 10:23 moritzm: rebooting labtestcontrol2001 for kernel update
  • 10:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3317 (duration: 00m 56s)
  • 09:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 58s)
  • 09:42 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=cxserver,cluster=scb,name=scb.*
  • 09:42 jijiki: Depool scb* in codfw from serving cxserver, finishing its migration to k8s - T213195
  • 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 after mysql upgrade (duration: 00m 56s)
  • 09:27 moritzm: rolling reboot of maps servers in codfw for kernel update
  • 09:17 marostegui: Upgrade and reboot db1086
  • 08:53 marostegui: Upgrade db1086
  • 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 for upgrade (duration: 00m 56s)
  • 08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1086 (duration: 00m 57s)
  • 08:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 56s)
  • 08:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1079 (duration: 00m 56s)
  • 08:01 vgutierrez: deploying directory based certificates in acme-chief clients - T207295
  • 07:35 _joe_: rolling restart of php-fpm to pick up some changes
  • 07:34 marostegui: Deploy schema change on db1079, this will generate lag on labsdb:s8
  • 07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1079 (duration: 00m 57s)
  • 07:03 elukey: restart pdfrender on scb1002
  • 06:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1101:3317 (duration: 00m 56s)
  • 06:24 marostegui: Run wmcs-wikireplica-dns on cloudcontrol1003 to get dbproxy1011 back
  • 06:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101:3317 (duration: 01m 10s)
  • 06:12 marostegui: Upgrade and reboot dbproxy1011
  • 06:04 marostegui: Run wmcs-wikireplica-dns on cloudcontrol1003 to drain dbproxy1011
  • 00:09 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/includes/parser/BlockLevelPass.php: SWAT T218817 Unbreak parser line counting for long wikitext pages I22eebb70a I55a2c4c0 I41a45266d (duration: 00m 56s)
  • 00:08 twentyafterfour: deploying phabricator upgrade
  • 00:01 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Move FundraisingTranslateWorkflow load to after Translate I73452ae8 (duration: 00m 56s)

2019-03-20

  • 23:49 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/resources/lib/ooui/oojs-ui-core.js: SWAT T218722 T218830 Bring forward UBN OOUI fix (duration: 00m 57s)
  • 23:28 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/497948/ (duration: 00m 56s)
  • 23:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/490648/ (duration: 00m 56s)
  • 22:29 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T214075 Enable federation of Wikidata items and properties on Test Commons (duration: 00m 57s)
  • 21:37 XioNoX: apply transit-in4 term offload-ping4 with test IP to cr1/2-codfw - T190090
  • 21:34 XioNoX: apply transit-in4 term offload-ping4 with test IP to cr2-codfw
  • 21:00 XioNoX: apply icmp redirect on cr1-codfw:xe-5/0/2 (to cr4-ulsfo) for test IP 208.80.154.225 - T190090
  • 20:24 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.22 (duration: 01m 46s)
  • 20:23 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.22
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 20:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:07 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:07 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:38 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.22
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:04 zfilipin@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.22 and rebuild l10n cache (duration: 38m 29s)
  • 18:50 jijiki: restarting pdfrender on scb1003
  • 18:49 ottomata: hitting eventgate-analytics in eqiad with ab
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:37 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:37 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:37 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:26 zfilipin@deploy1001: Started scap: testwiki to php-1.33.0-wmf.22 and rebuild l10n cache
  • 16:44 XioNoX: disable lldp on asw2-a-eqiad:ge-8/0/10
  • 16:25 chasemp: mkdir /srv/dumps/xmldatadumps/public/other/rook for T218587 (fyi apergos)
  • 15:55 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
  • 15:52 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:35 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098:3317 (duration: 00m 50s)
  • 15:33 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:24 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:24 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:23 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:23 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:22 bawolff@deploy1001: Synchronized wmf-config/wikitech.php: Adjust account stuff at wikitech 4adc89bce4 (duration: 00m 48s)
  • 15:20 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
  • 15:20 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:10 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:09 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:09 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 15:08 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 14:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098:3317 (duration: 00m 56s)
  • 14:35 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 03s)
  • 14:02 moritzm: rebooting oresrdb2002 for kernel update
  • 13:48 godog: take a snapshot of prometheus data on prometheus1004
  • 13:44 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 05s)
  • 13:37 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 08s)
  • 13:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:29 otto@deploy1001: scap-helm eventgate-analytics finished
  • 13:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 11:51 akosiaris: re-enable puppet across fleet
  • 11:45 Amir1: EU SWAT is done
  • 11:44 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add wikimania as a special group to wikidata sitelinks (T217730) (duration: 00m 50s)
  • 11:40 ladsgroup@deploy1001: Synchronized dblists/wikidataclient.dblist: SWAT: Add wikimaniawiki to wikidataclient.dblist (T217730) (duration: 00m 50s)
  • 11:34 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Advanced Mobile Contributions mode for ar,id,es and test wikis (T217643) (duration: 00m 50s)
  • 11:34 akosiaris: disable puppet across fleet to avoid alert spam storm
  • 11:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Partially revert "Enable musical notation datatype in wikidata" (T218535) (duration: 00m 50s)
  • 11:16 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Increased maxSerializedEntitySize from 2500 to 3000 (T217739) (duration: 01m 47s)
  • 11:03 akosiaris: restart gerrit for testing https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497727/
  • 10:28 akosiaris: restart gerrit for merge of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497561/
  • 10:26 godog: reimage prometheus1003 with stretch - T205870
  • 10:20 marostegui: Repool dbproxy1010 and running wmcs-wikireplica-dns script
  • 10:12 marostegui: Reboot dbproxy1010 for upgrade
  • 09:45 vgutierrez: updated acme-chief to version 0.14 in acmechief[12]001
  • 09:32 marostegui: Deploy schema change on s7 codfw master, lag will appear on codfw
  • 09:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 (duration: 00m 48s)
  • 08:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1087 (duration: 00m 48s)
  • 08:55 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1003.eqiad.wmnet
  • 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 (duration: 00m 48s)
  • 08:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 48s)
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1092 (duration: 00m 48s)
  • 08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1092 (duration: 00m 48s)
  • 08:20 ema: cp2009, cp1071 (cp-ats): reboot for kernel upgrades
  • 07:32 elukey: pool kafka1001 in pybal's eventbus service after yesterday's network maintenance
  • 06:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool databases in row A - T187960 (duration: 00m 49s)
  • 00:48 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/includes/Title.php: SWAT: Improve Caching in Title::loadRestrictions() (duration: 00m 51s)

2019-03-19

  • 22:20 otto@deploy1001: Finished deploy [eventlogging/analytics@9aea626]: fix for production error where mw api is returning html instead of json schemas (duration: 00m 04s)
  • 22:20 otto@deploy1001: Started deploy [eventlogging/analytics@9aea626]: fix for production error where mw api is returning html instead of json schemas
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 21:36 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:36 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:36 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 21:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 21:07 cdanis: cdanis@wikitech-static.wikimedia.org: apt install sshguard
  • 21:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:06 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:58 XioNoX: disable down ports with no description on switches
  • 20:44 cdanis: enabling puppet on contint1001
  • 19:54 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 19:52 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 19:47 XioNoX: disable asw2-a<->asw-a link
  • 19:44 cdanis: icinga failed over to icinga1001 successfully
  • 19:43 XioNoX: remove forced failover on cr1/cr2-eqiad
  • 19:36 cdanis: failing over icinga to icinga1001
  • 19:35 XioNoX: enable cr2-eqiad:ae1
  • 19:29 ariel@deploy1001: Finished deploy [dumps/dumps@da66149]: move maxretries to config (duration: 00m 03s)
  • 19:29 ariel@deploy1001: Started deploy [dumps/dumps@da66149]: move maxretries to config
  • 19:09 ejegg: updated CiviCRM from a2316be94f to 3bfc7a762e
  • 19:09 gtirloni: rebooted labmon1001
  • 19:02 XioNoX: disable cr2-eqiad:ae1
  • 18:46 XioNoX: failover cr2-eqiad:ae1 VRRP master to cr1
  • 18:17 XioNoX: starting pybal on lvs1002
  • 18:11 XioNoX: stopping pybal on lvs1002
  • 18:09 XioNoX: starting pybal on lvs1001
  • 18:01 XioNoX: stopping pybal on lvs1001
  • 18:01 jijiki: restart pdfrender on scb1003
  • 17:56 XioNoX: shutdown scp1001 for uplink move
  • 17:47 Lucas_WMDE: Updated the Wikidata property suggester with data from last Monday's JSON dump and applied the T132839 workarounds (T216270)
  • 17:33 hasharAway: contint1001 / CI going for a quick scheduled maintenance -network cable being moved-
  • 17:33 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0 (duration: 01m 50s)
  • 17:31 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0
  • 17:30 mdholloway: mobileapps deploy failed for group default3, retrying
  • 17:24 tzatziki: changing email for User:St3f
  • 17:18 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0 (duration: 03m 47s)
  • 17:16 addshore: started "foreachwikiindblist wiktionary extensions/Cognate/maintenance/populateCognatePages.php --batch-size 1000" in a screen on mwdebug1002 (catching up cognate after x1 readonly time)
  • 17:14 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0
  • 16:45 vgutierrez: uploaded acme-chief 0.14 to apt.wikimedia.org (buster) - T218685 T218418 T207295
  • 16:30 elukey: stop eventlogging's mysql kafka consumers on eventlog1002, eventlogging's db replication on db1108 to ease db1107's maintenance
  • 16:29 elukey: stop eventlogging's mysql kafka consumers on eventlog1002, eventlogging's db replication on db1108 to ease db1107's maintenance
  • 16:15 bstorm_: downtimed labstore1003 for network moves so it doesn't page
  • 16:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:08 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org,service=pdns_recursor
  • 16:02 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org,service=pdns_recursor
  • 16:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3, take #2 (duration: 21m 01s)
  • 15:58 tzatziki: changing password for User:St3f
  • 15:57 XioNoX: enable pybal on lvs1006
  • 15:55 XioNoX: disable pybal on lvs1006
  • 15:54 XioNoX: enable pybal on lvs1005
  • 15:52 XioNoX: disable pybal on lvs1005
  • 15:50 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:50 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:50 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:49 XioNoX: enable pybal on lvs1004
  • 15:45 XioNoX: disable pybal on lvs1004
  • 15:40 mobrovac@deploy1001: Started deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3, take #2
  • 15:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3 (duration: 12m 27s)
  • 15:28 mobrovac@deploy1001: Started deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3
  • 15:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s2 read only OFF - T187960 (duration: 00m 26s)
  • 15:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s2 database master on read only - T187960 (duration: 00m 48s)
  • 15:12 XioNoX: eqiad A7 servers uplink move - T187960
  • 14:46 moritzm: rebooting icinga1001 for kernel update
  • 14:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool databases in row A - T187960 (duration: 00m 48s)
  • 14:41 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Reapply I49a18d from gerrit for consistency (duration: 00m 49s)
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:31 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 14:31 otto@deploy1001: scap-helm eventgate-analytics install -n production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 14:28 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:28 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:28 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 13:19 akosiaris: start zuul/zuul-merger
  • 13:12 akosiaris: unfirewall gerrit, put service back in action
  • 11:31 moritzm: installing php5 security updates
  • 09:08 akosiaris: start nagios-nrpe-server on proton1002, failed due to fork() failed with error 12, bailing out...
  • 07:25 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T218279)
  • 07:20 twentyafterfour@deploy1001: Synchronized wmf-config/CommonSettings.php: Temporarily disable account creation on wikitech (duration: 00m 51s)
  • 06:47 akosiaris: stop zuul and zuul-merger on contint1001
  • 03:45 kart_: Started manual run of unpublished ContentTranslation draft purge script (T218279)
  • 02:12 krinkle@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/EventLogging/includes/ApiJsonSchema.php: If280a4056a (duration: 00m 48s)
  • 02:11 krinkle@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/EventLogging/includes/RemoteSchema.php: If280a4056a (duration: 00m 51s)
  • 00:14 reedy@deploy1001: Synchronized php-1.33.0-wmf.21/tests/phpunit/includes/: Replace wgUser with RequestContext::getUser in User::getBlockedStatus (duration: 01m 00s)
  • 00:12 reedy@deploy1001: Synchronized php-1.33.0-wmf.21/includes/user/User.php: Replace wgUser with RequestContext::getUser in User::getBlockedStatus (duration: 00m 49s)

2019-03-18

  • 23:54 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494551/ (duration: 00m 49s)
  • 23:45 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494551/ (duration: 00m 48s)
  • 23:33 maxsem@deploy1001: Synchronized php-1.33.0-wmf.21/includes/EditPage.php: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/497347/ (duration: 00m 49s)
  • 23:25 twentyafterfour: running puppet on phab1001 to get out of degraded state
  • 23:23 XioNoX: renumber Telia transit in eqsin
  • 23:14 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/497317/ (duration: 00m 49s)
  • 23:07 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/496515/ (duration: 00m 48s)
  • 22:18 greg-g: gjg@phab1001:~$ sudo /srv/phab/phabricator/bin/auth strip --all-types --user Barras # per request/verification from foks
  • 19:57 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable block disables login on wikitech (duration: 00m 48s)
  • 19:56 bawolff@deploy1001: Synchronized wmf-config/wikitech.php: Adjust ldap config (duration: 00m 48s)
  • 16:17 volans: restarting pdfrender on scb1003
  • 16:15 volans: restarting pdfrender on scb1004
  • 15:48 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=cxserver,cluster=scb,name=scb.*
  • 15:45 jijiki: Depool scb* from serving cxserver on eqiad - T213195
  • 15:06 papaul: shutting down mw2206 for memtest
  • 14:47 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 14:46 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 14:13 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2003.codfw.wmnet
  • 13:42 ema: cp-ats rolling restart to apply proxy.config.cache.ram_cache.size config change T213263
  • 13:23 mvolz@deploy1001: scap-helm citoid finished
  • 13:22 mvolz@deploy1001: scap-helm citoid cluster codfw completed
  • 13:22 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-codfw-values.yaml stable/citoid [namespace: citoid, clusters: codfw]
  • 13:18 mvolz@deploy1001: scap-helm citoid finished
  • 13:18 mvolz@deploy1001: scap-helm citoid cluster eqiad completed
  • 13:17 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-eqiad-values.yaml stable/citoid [namespace: citoid, clusters: eqiad]
  • 13:04 arturo: T218022 disable icinga checks for labtestservices2001.wikimedia.org
  • 12:54 arturo: T218025 disable icinga checks for cloudnet2001-dev.codfw.wmnet
  • 12:49 mvolz@deploy1001: scap-helm citoid finished
  • 12:49 mvolz@deploy1001: scap-helm citoid cluster staging completed
  • 12:49 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 12:48 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-values-staging.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 11:45 zeljkof: EU SWAT finished
  • 11:45 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:496696 Enable mobile section editing on bnwiki, hewiki, zh_yuewiki (T218375) (duration: 00m 50s)
  • 10:51 _joe_: testing safety checks for php-fpm on mwdebug2001
  • 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: gerrit:497261 Bumping portals to master (T128546) (duration: 00m 48s)
  • 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:497261 Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:12 vgutierrez: uploaded acme-chief 0.12 to apt.wikimedia.org (buster) - T218543
  • 10:12 volans: restarted irc echo on icinga2001
  • 10:04 _joe_: hot-patching the error in php7.2-fpm config
  • 10:02 volans: running puppet on hosts matching 'C:php::fpm' to apply I004349
  • 10:00 volans: running puppet on failed hosts
  • 09:57 volans: temporarily stop ircecho to avoid spam
  • 09:40 ema: superior-cache-analyzer_3.3.7 uploaded to stretch-wikimedia T213263
  • 09:29 godog: switch to mpm_event for prometheus apache before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/496750
  • 08:58 vgutierrez: uploaded acme-chief 0.11 to apt.wikimedia.org (buster) - T207295
  • 08:52 moritzm: restarting ferm on sessionstore, was stuck in resolving one of the -a records, which were only merged in a subsequent step (T215883)
  • 08:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 (duration: 00m 48s)
  • 08:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 00m 48s)
  • 08:34 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
  • 08:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 08:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 08:31 ema: cp2002: repool varnish-fe to resume ATS testing T213263
  • 08:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1101 (duration: 00m 48s)
  • 08:22 moritzm: armed keyholder on neodymium
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
  • 07:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1101 (duration: 00m 49s)
  • 07:02 marostegui: Stop db1101 to upgrade mysql and kernel
  • 07:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101 (duration: 00m 48s)
  • 06:33 marostegui: Deploy schema change on s8 codfw master (db2045), this will generate lag on s8 codfw
  • 06:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 48s)
  • 06:08 marostegui: Deploy schema change on x1 master (db1069) with replication - T218397
  • 06:04 marostegui: Deploy schema change on db1121 - lag will appear on labsdb:s4
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 01m 04s)
  • 03:58 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T218279)
  • 02:00 kart_: Started manual run of unpublished ContentTranslation draft purge script (T218279)

2019-03-17

  • 11:51 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=labswiki --force --sysop Ladsgroup
  • 08:49 elukey: restart pdfrender on scb1004

2019-03-16

  • 10:00 chasemp: stop apache on cobalt for maintenance
  • 00:19 andrewbogott: restarting slapd on seaborgium

2019-03-15

  • 22:37 shdubsh: temporarily stop ircecho on icinga2001
  • 18:00 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend: SWAT: gerrit:496827 iOS: Fix mobile editor T218069 T218062 T218352 T211490 T218062 T211491 T172877 (duration: 00m 54s)
  • 17:53 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 17:53 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 17:53 ema: depool cp2002's varnish-fe for the weekend T213263#5027366
  • 17:25 arturo: acmechief2001 - armed keyholder
  • 17:22 arturo: cumin2001 - armed keyholder
  • 17:21 andrewbogott: updating puppet compiler facts
  • 17:13 mutante: netmon2001 - armed keyholder for rancid
  • 17:12 mutante: netmon1002 - armed keyholder for rancid
  • 17:04 arturo: arm keyholder in deploy2001
  • 17:03 arturo: arm keyholder in sarin
  • 17:02 arturo: arm keyholder in labpuppetmaster1002
  • 17:01 arturo: arm keyholder in deploy1001
  • 17:00 XioNoX: clean up rigel switch port
  • 17:00 arturo: arm keyholder in acmechief1001
  • 16:58 arturo: arming keyholder in cumin1001
  • 16:09 moritzm: upgrading deployment-deploy01 to component/php72
  • 15:59 akosiaris: puppetmaster1001 rm /var/run/confd-template/.citoid*.err to remove old stale confd files that resulted from merging https://gerrit.wikimedia.org/r/494213
  • 15:54 moritzm: rebooting labtestservices2003 for kernel update
  • 15:47 andrewbogott: enabling puppet on seaborgium to apply new acme cert
  • 15:47 moritzm: rebooting labtestservices2002 for kernel update
  • 15:42 moritzm: rebooting labtestcontrol2003 for kernel update
  • 15:38 moritzm: rebooting labtestnet2002 for kernel update
  • 15:11 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,cluster=cache_upload,name=cp2015.codfw.wmnet
  • 15:10 ema: cp2015: repool ATS with proxy.config.cache.ram_cache.size 1G T213263
  • 15:07 moritzm: rebooting graphite2003 for kernel security update
  • 15:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,cluster=cache_upload,name=cp2015.codfw.wmnet
  • 15:04 ema: cp2015: test ATS depool T213263
  • 14:45 mutante: tools tools-sgebastion-07 - dpkg-reconfigure locales and adding ko_KR.EUC-KR for Korean users by request and as done in the past on former tools bastion
  • 14:43 moritzm: rebooting etherpad1001 to pick up SSBD-enabled qemu
  • 14:31 mutante: tools-sgebastion-07 - generating locales for user request in T130532
  • 13:50 moritzm: rolling reboot of ores in codfw for SSBD/L1TF kernel update
  • 13:47 akosiaris@deploy1001: scap-helm cxserver finished
  • 13:47 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 13:47 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 11:16 godog: reenable prometheus@k8s on prometheus2004 with mod_proxy connection limits - T217715
  • 10:31 akosiaris: add a 10s bucket to cxserver prometheus-statsd exporter mappings
  • 10:31 akosiaris@deploy1001: scap-helm cxserver finished
  • 10:31 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 10:31 akosiaris@deploy1001: scap-helm cxserver finished
  • 10:31 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 10:31 akosiaris@deploy1001: scap-helm cxserver finished
  • 10:31 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 10:30 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/citoid [namespace: cxserver, clusters: staging]
  • 10:03 akosiaris@deploy1001: scap-helm citoid finished
  • 10:03 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
  • 10:03 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
  • 10:03 akosiaris@deploy1001: scap-helm citoid finished
  • 10:02 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
  • 10:02 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
  • 10:02 akosiaris: add a 10s bucket to citoid prometheus-statsd exporter mappings
  • 10:02 akosiaris: remove prometheus-statsd-exporter from zotero pods
  • 10:02 akosiaris@deploy1001: scap-helm citoid finished
  • 10:02 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 10:02 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 10:01 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-values-staging.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 10:00 akosiaris@deploy1001: scap-helm zotero finished
  • 10:00 akosiaris@deploy1001: scap-helm zotero cluster staging completed
  • 10:00 akosiaris@deploy1001: scap-helm zotero upgrade --install -f zotero-values-staging.yaml staging stable/zotero [namespace: zotero, clusters: staging]
  • 09:58 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml staging stable/zotero [namespace: zotero, clusters: staging]
  • 09:53 akosiaris@deploy1001: scap-helm zotero finished
  • 09:53 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 09:53 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
  • 09:53 akosiaris@deploy1001: scap-helm zotero finished
  • 09:53 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 09:52 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
  • 09:42 godog: bounce grafana-server on grafana1001
  • 09:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103 (duration: 00m 50s)
  • 09:28 godog: correction, prometheus2004
  • 09:27 godog: temporarily disable read queries to prometheus@k8s on prometheus2003
  • 09:19 jiji@cumin1001: conftool action : set/weight=12; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
  • 09:18 jiji@cumin1001: conftool action : set/weight=15; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
  • 09:17 jijiki: Ramp up cxserver k8s traffic to 50% - T213195
  • 08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 (duration: 00m 50s)
  • 08:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 (duration: 00m 47s)
  • 08:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 49s)
  • 07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 (duration: 00m 49s)
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
  • 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
  • 07:01 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
  • 06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 (duration: 00m 48s)
  • 06:04 marostegui: Upgrade db1091
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 50s)
  • 04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 01:25 ejegg: re-enabled ingenico audit parser
  • 01:25 ejegg: updated fundraising CiviCRM from 41efa14fb0 to a2316be94f

2019-03-14

  • 22:54 ejegg: temporarily disabled Ingenico WX audit parsing
  • 22:05 cdanis: cdanis@icinga2001.wikimedia.org ~ % sudo systemctl restart icinga.service
  • 21:58 cdanis: cdanis@icinga2001.wikimedia.org ~ % sudo systemctl restart nsca.service
  • 21:01 crusnov@deploy1001: Finished deploy [netbox/deploy@090a0c3]: Another minor bugfix release for ganeti-netbox script (duration: 00m 56s)
  • 21:00 crusnov@deploy1001: Started deploy [netbox/deploy@090a0c3]: Another minor bugfix release for ganeti-netbox script
  • 20:26 thcipriani: gerrit live on 2.15.11
  • 20:24 thcipriani: restarting gerrit for 2.15.11
  • 20:23 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt (duration: 00m 02s)
  • 20:23 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt
  • 20:22 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 04s)
  • 20:22 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only
  • 20:17 ejegg: updated CiviCRM from b4e3cf16cc to 41efa14fb0
  • 20:17 thcipriani: gerrit back to 2.15.8
  • 20:15 thcipriani: restart gerrit on cobalt
  • 20:14 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on cobalt (duration: 00m 07s)
  • 20:14 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on cobalt
  • 20:14 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 10s)
  • 20:13 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on gerrit2001 only
  • 20:13 bstorm_: Placed labstore1006 back in rotation for NFS and rsync
  • 20:11 crusnov@deploy1001: Finished deploy [netbox/deploy@c6cf7d6]: Minor bugfix release for ganeti-netbox script (duration: 00m 54s)
  • 20:10 crusnov@deploy1001: Started deploy [netbox/deploy@c6cf7d6]: Minor bugfix release for ganeti-netbox script
  • 20:03 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/extension.json: Hot-deploy I19414dc31 to fix dependencies on mw.Uri (duration: 00m 49s)
  • 19:37 XioNoX: set protocols bgp group Anycast4 multihop ttl 193 on cr1/2-esams - T209989
  • 19:25 XioNoX: merged Juniper BFD Icinga check
  • 19:12 thcipriani: gerrit back up
  • 19:08 thcipriani: restarting gerrit on cobalt for 2.15.11 upgrade
  • 19:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt (duration: 00m 11s)
  • 19:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt
  • 19:05 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 11s)
  • 19:05 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only
  • 19:02 XioNoX: set protocols bgp group Anycast4 multihop ttl 193 on cr1/2-eqiad - T209989
  • 18:53 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/ParsoidBatchAPI/includes/ApiParsoidBatch.php: SWAT Another deprecation fix via I4936d0ce03 (duration: 00m 49s)
  • 18:37 XioNoX: set protocols bgp group Anycast4 multihop ttl 190 on cr1-codfw - T209989
  • 18:31 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T216730 Enable musical notation datatype on Wikidata (duration: 00m 48s)
  • 18:29 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/modules/help/: SWAT Ib13cf88d GrowthExperiments log fix for closes (duration: 00m 49s)
  • 18:22 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT T217436 Add default user config for rollback confirmation (duration: 00m 48s)
  • 18:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T217436 Set up exceptions for rollback confirmation (duration: 00m 49s)
  • 18:08 tzatziki: change email for KStineRowe (WMF) on officewiki, collabwiki, SUL
  • 18:05 mforns@deploy1001: Finished deploy [analytics/aqs/deploy@13203f1]: Deploying AQS for node10 upgrade (duration: 19m 40s)
  • 17:59 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/ParsoidBatchAPI/includes/ApiParsoidBatch.php: Hot-deploy I2842dfea to reduce deprecation spam after T206675 deploy of wmf.21 (duration: 00m 49s)
  • 17:45 mforns@deploy1001: Started deploy [analytics/aqs/deploy@13203f1]: Deploying AQS for node10 upgrade
  • 17:43 mforns: Deploying AQS using scap (node10 upgrade)
  • 17:32 arlolra: Updated Parsoid to f3e2209 (T213950)
  • 17:24 arlolra@deploy1001: Finished deploy [parsoid/deploy@8cf4107]: Updating Parsoid to f3e2209 (duration: 07m 09s)
  • 17:17 arlolra@deploy1001: Started deploy [parsoid/deploy@8cf4107]: Updating Parsoid to f3e2209
  • 17:15 jijiki: Pool mw1280 back - T218006
  • 17:12 jijiki: Depool mw2206 - T215415
  • 16:51 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:51 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:51 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:50 crusnov@deploy1001: Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229 (duration: 00m 50s)
  • 16:49 crusnov@deploy1001: Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229
  • 16:46 crusnov@deploy1001: Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229 (duration: 00m 30s)
  • 16:45 crusnov@deploy1001: Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229
  • 16:32 XioNoX: add default deny to mr1-* junos-host policies - T218234
  • 16:30 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/lib/includes/Store/Sql/TermSqlIndex.php: gerrit:496481 TermSqlIndex, track calls to getTermsOfEntities (duration: 00m 50s)
  • 16:22 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:22 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:22 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:08 arturo: reimaging cloudvirt1015 again
  • 16:04 akosiaris: reboot one final time all sessionstore[12]00[123] servers
  • 16:02 arturo: T216497 drop python-dogpile.cache from jessie-wikimedia/openstack-mitaka-jessie
  • 14:57 marostegui: Start replication on db2070 after testing url_notes
  • 14:53 mutante: analytics-tool1003 - stopping idle screen session
  • 14:43 marostegui: Stop replication on db2070 to test the url_notes (will alert only on IRC)
  • 14:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:21 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:21 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --set main_app.version=v1.0.3-wmf0 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 14:09 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:09 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:09 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 13:54 godog: take a snapshot of data on prometheus2004
  • 13:50 arturo: reimaging cloudvirt1015
  • 13:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1081 into API (duration: 00m 48s)
  • 13:15 arturo: T216497 drop libpulse0 from jessie-wikimedia/openstack-mitaka-jessie
  • 13:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 into API (duration: 00m 49s)
  • 13:10 arturo: T216497 drop python-mysqldb from jessie-wikimedia/openstack-mitaka-jessie
  • 13:10 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.21
  • 12:50 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:49 jiji@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:42 jijiki: Ramp up k8s cxserver traffic to 8% - T213195
  • 12:22 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:21 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:17 jijiki: Send ~4% of cxserver traffic to eqiad k8s - T213195
  • 12:14 zeljkof: EU SWAT finished
  • 12:13 kartik@deploy1001: Synchronized wmf-config: SWAT: gerrit:496418 Revert "Correct the enable context detection configuration" (duration: 00m 56s)
  • 12:12 arturo: T216497 drop some packages from jessie-wikimedia/openstack-mitaka-jessie: qemu-XXX
  • 12:06 arturo: T216497 drop some packages from jessie-wikimedia/openstack-mitaka-jessie: libvirt*, librados2, librbd1, because they induce the resolver to conflict with those included in stretch
  • 12:02 kartik@deploy1001: Synchronized wmf-config: SWAT: Revert gerrit:496412 Fix content detection config (duration: 00m 56s)
  • 11:58 kartik@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 11:45 kartik@deploy1001: Synchronized php-1.33.0-wmf.21/skins/MinervaNeue: SWAT: gerrit:496364 Ensure page-actions icons are `display:block` (T218182) (duration: 00m 57s)
  • 11:15 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:493672 Enable ExternalGuidance to all Wikipedias (T216129) (duration: 00m 57s)
  • 10:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 00m 57s)
  • 10:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 10:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 10:50 ema: cp2002: pool varnish-fe to resume ATS testing T213263
  • 10:44 moritzm: installing libsdl1.2 security updates for jessie
  • 10:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 58s)
  • 09:54 hashar: ci: live hacked job https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/ in attempt to capture 'core' files from hhvm | https://gerrit.wikimedia.org/r/#/c/integration/config/+/496392/ | T216689
  • 09:02 mutante: ms-be2037 - down for a couple of hours, no SAL or ticket, powercycling
  • 08:44 marostegui: Deploy schema change on s4 codfw master (db2051), this will generate lag on codfw
  • 08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1088 (duration: 00m 53s)
  • 08:21 marostegui: Upgrade s3 codfw master (db2043) there will be lag on s3 codfw
  • 08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1088 (duration: 00m 55s)
  • 07:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1088 (duration: 00m 55s)
  • 07:48 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:48 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 07:48 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 07:42 marostegui: Upgrade db1088
  • 07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1088 (duration: 00m 54s)
  • 07:22 kartik@deploy1001: Finished deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386) (duration: 03m 50s)
  • 07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1098 (duration: 00m 55s)
  • 07:18 kartik@deploy1001: Started deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386)
  • 07:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:16 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 07:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 07:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:16 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 07:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 07:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:15 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 07:15 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098 (duration: 00m 55s)
  • 06:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098 (duration: 00m 54s)
  • 06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 55s)
  • 06:50 marostegui@deploy1001: sync-file aborted: More traffic to db1097 (duration: 00m 00s)
  • 06:46 akosiaris@deploy1001: scap-helm cxserver finished
  • 06:46 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 06:46 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 06:40 marostegui: Upgrade mysql on dbstore2002
  • 06:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098:3317 (duration: 00m 55s)
  • 06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1098:3317 (duration: 00m 55s)
  • 06:08 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:04 marostegui: Upgrade MySQL on db1098
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098 (duration: 00m 56s)
  • 04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 01:39 ejegg: updated fundraising CiviCRM from 5c45e4c24d to b4e3cf16cc

2019-03-13

  • 23:48 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/skins/MinervaNeue/: Remove unnecessary parameter from getHistoryPageAction (duration: 00m 56s)
  • 23:45 catrope@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: Fix builder class definition for WBCS (duration: 00m 56s)
  • 23:41 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend/: Fix animation when visual section editing enabled on mobile only (T218167) (duration: 00m 58s)
  • 23:39 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/WikibaseCirrusSearch/: Fix hook return values (duration: 00m 58s)
  • 23:30 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/: Instrumentation fixes (T217802) (duration: 00m 57s)
  • 22:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disabling api-request logging to eventgate-analytics for group0 wikis until we solve T218268 (duration: 00m 56s)
  • 21:11 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:11 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 21:11 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 21:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:10 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 21:09 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
  • 20:35 arlolra@deploy1001: Finished deploy [parsoid/deploy@e2e44bc]: Updating Parsoid to ea80d1b (duration: 06m 38s)
  • 20:28 arlolra@deploy1001: Started deploy [parsoid/deploy@e2e44bc]: Updating Parsoid to ea80d1b
  • 20:25 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262) (duration: 03m 35s)
  • 20:24 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disabling api-request logging to eventgate-analytics for group1 wikis to investigate possible outage (duration: 00m 56s)
  • 20:21 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262)
  • 20:14 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262) (duration: 01m 49s)
  • 20:03 herron: increased index.mapping.total_fields.limit to 1350 on index logstash-2019.03.13
  • 19:46 jijiki: Pooling mw2206 - T215415
  • 19:26 herron: performing rolling restart of eqiad logstash instances
  • 18:51 jijiki: Depool mw1280 and mw2206 due to hardware issues - T215415 T218006
  • 18:44 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging to eventgate-analytics for group1 wikis (duration: 00m 58s)
  • 18:30 robh: thumbor1004 memtest in progress via T215411
  • 18:29 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 18:29 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 18:28 ema: cp2002: depool varnish-fe after 1 hour ATS experiment T213263
  • 18:09 bstorm_: rebooting labstore1006 T217473
  • 18:07 bstorm_: downtime labstore1006 for troubleshooting T217473
  • 17:57 XioNoX: set interface description on fasw-c-codfw:ge-0/0/47
  • 17:43 XioNoX: s/29073/202425/ on AMS-IX
  • 17:34 XioNoX: add missing sandbox1-b-eqiad interface to ospf(3) passive on cr1/2-eqiad
  • 17:19 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 17:19 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 17:18 ema: cp2002: pool varnish-fe for user traffic, routed through ATS backends T213263
  • 17:05 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:05 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 17:05 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 17:01 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:01 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 17:01 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:56 robh: mw2206.codfw.wmnet is being powered down for firmware update, relying on auto depool function from clean shutdown for mw api server via T215415
  • 16:42 robh: mw2206.codfw.wmnet is being powered down for firmware update, relying on auto depool function from clean shutdown for mw api server via T215415
  • 16:36 addshore: SWAT done
  • 16:36 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/includes/api/ApiMain.php: SWAT: T214080 T212529 ApiMain.php api/request logging event changes gerrit:496197 (duration: 00m 57s)
  • 16:32 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:32 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:32 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:19 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:19 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:19 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:16 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:16 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:15 jijiki: Depool thumbor1004 to investigate memory issues - T215411
  • 16:04 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:04 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 16:04 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:04 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 16:04 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:04 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:52 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:52 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml eqiad stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:52 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 15:40 akosiaris: do the first deploy of cxserver in eqiad/codfw T213195
  • 15:39 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:39 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 15:39 akosiaris@deploy1001: scap-helm cxserver install -n production -f cxserver-eqiad-values.yaml stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 15:39 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:39 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 15:39 akosiaris@deploy1001: scap-helm cxserver install -n production -f cxserver-codfw-values.yaml stable/cxserver [namespace: cxserver, clusters: codfw]
  • 14:27 ema: cp2002: depool varnish-fe in preparation of pointing it to ATS T213263
  • 14:13 marostegui: Upgrade db2074 (sanitarium master)
  • 13:42 akosiaris: upgrade kubestage to kubernetes 1.11.8
  • 13:42 akosiaris: upgrade neon to kubernetes 1.11.8
  • 13:28 akosiaris: upgrade kubestage1002 to kubernetes 1.11.8
  • 13:24 godog: take a snapshot of prometheus@k8s data on prometheus2004
  • 13:13 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.21 (duration: 01m 43s)
  • 13:12 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.21
  • 11:34 marostegui: Test snapshot db1117:3325 to dbstore1001 - T210292
  • 10:55 marostegui: Upgrade db2057
  • 10:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1085 (duration: 00m 56s)
  • 09:52 mutante: ms-be1035 - sudo systemctl reset-failed
  • 09:45 ema: cp1071: upgrade trafficserver to 8.0.3~rc0 for testing purposes
  • 09:41 marostegui: Deploy schema change on db1085 with replication, there will be lag on labsdb:s6
  • 09:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1085 (duration: 00m 55s)
  • 09:06 moritzm: installing PHP 7.0 security updates
  • 08:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 (duration: 00m 55s)
  • 08:58 marostegui: Upgrade mysql and kernel on db2050
  • 08:51 ema: cp3030: wipe frontend cache to get rid of large objects T216006
  • 08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3316 (duration: 00m 55s)
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1093 (duration: 00m 55s)
  • 08:09 moritzm: upgrading job runners in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 (duration: 00m 54s)
  • 07:26 moritzm: upgrading remaining app servers in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1096 (duration: 00m 58s)
  • 07:13 marostegui: Test snapshot dbstore1001:3311 to dbstore1001 - T210292
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 55s)
  • 06:58 marostegui: Upgrade MySQL and kernel on db2036
  • 06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1096 (duration: 00m 55s)
  • 06:40 marostegui: Stop MySQL on db1096 for upgrade
  • 06:24 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:21 marostegui: Testing snapshotting on db1117:3321 to dbstore1001 - T210292
  • 06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096 (duration: 01m 07s)
  • 04:11 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)

2019-03-12

  • 23:33 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend/includes/specials/SpecialMobileOptions.php: SWAT: Fix: undefined locals in SpecialMobileOptions.setJsConfigVars() (gerrit:495907) T218098 (duration: 00m 57s)
  • 20:49 shdubsh: manually upgrade prometheus-icinga-exporter to 0.5 on standby icinga
  • 19:48 eileen: civicrm revision changed from 977b9bfcf1 to 5c45e4c24d, config revision is f930677e97
  • 19:31 herron: restarted citoid on scb1003
  • 19:16 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging to eventgate-analytics for group0 wikis (duration: 01m 01s)
  • 19:14 arturo: T216497 manually delete libpam-systemd and libsystemd0 230-7~bpo8+2 from jessie-wikimedia/openstack-mitaka-jessie
  • 19:09 arturo: T216497 manually delete systemd 230-7~bpo8+2 from jessie-wikimedia/openstack-mitaka-jessie
  • 19:07 robh: rebooting thumbor1004 for memory troubleshooting via T215411
  • 17:11 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Increase APC cache for PropertyInfoLookup from 15 to 20s (duration: 00m 55s)
  • 17:10 addshore@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Increase APC cache for PropertyInfoLookup from 15 to 20s (duration: 00m 57s)
  • 17:02 jbond42: rolling update of debdeploy
  • 16:57 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 53s)
  • 16:43 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Double on server cache for PropertyInfoStore (duration: 00m 55s)
  • 16:42 addshore@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Double on server cache for PropertyInfoStore (duration: 00m 57s)
  • 16:29 moritzm: upgraded buster installation image to daily build from 12th of March (T213527)
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:42 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:41 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:39 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:38 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org,service=pdns_recursor
  • 15:37 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:33 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:33 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
  • 15:28 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:28 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:28 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:26 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_recursor
  • 15:23 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_^Ccursor
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics finished
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:00 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:26 mutante: phab1002 - reboot
  • 13:43 marostegui: Upgrade MySQL and kernel on db2094 (inactive sanitarium)
  • 13:27 marostegui: Deploy schema change on s6 codfw, lag will be generated on s6 codfw
  • 13:24 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.21
  • 12:41 arturo: T215605 include python-mwclient .deb in openstack-mitaka-jessie/jessie-wikimedia in install1002
  • 12:23 jynus: testing snapshotting on db1117:3325 -> dbstore1001 T210292
  • 12:23 zfilipin@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.21 and rebuild l10n cache (duration: 34m 25s)
  • 12:09 moritzm: upgrading mw1238-mw1258 to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 11:59 mutante: analytics-tool1004 - start superset service
  • 11:48 zfilipin@deploy1001: Started scap: testwiki to php-1.33.0-wmf.21 and rebuild l10n cache
  • 11:47 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 [keeping static files] (duration: 01m 40s)
  • 11:45 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 [keeping static files] (duration: 01m 35s)
  • 11:42 arturo: T215605 include python-oath .deb in stretch-wikimedia thirdparty/oath
  • 11:41 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.16 (duration: 12m 41s)
  • 11:39 elukey: raise mysql's max_user_connection to 1000 for the Analytics user on labsdb1012
  • 11:36 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
  • 11:36 ema: cp1077: repool varnish-be after service restart T217893
  • 11:35 arturo: delete wrong stretch-wikimedia `thirdparty` component in install1002
  • 11:12 zeljkof: EU SWAT finished
  • 11:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:495842 Add campaign prefix for EG tag (T216123) (duration: 00m 49s)
  • 11:11 moritzm: upgrading API servers/job runners servers in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
  • 10:32 marostegui: Deploy schema change on db1082, lag will happen on s5 on labs
  • 10:29 gtirloni: re-enabled puppet on serpens and seaborgium
  • 10:19 gtirloni: updated slapd to version 2.4.47 on seaborgium (T217280)
  • 10:17 moritzm: upgrading API servers/job runners servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
  • 10:14 gtirloni: upgrading seaborgium to slapd 2.4.47
  • 09:39 jynus: stop db1114 and restart it empty
  • 09:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 (duration: 00m 48s)
  • 08:57 elukey: restart memcached on mc1019 to apply new settings - T217731
  • 08:50 ema: cp1077 depooled again T217893
  • 08:49 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
  • 08:48 moritzm: upgrading app servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
  • 08:48 ema: restart varnish-be on cp1077 T217893
  • 08:47 moritzm: upgrading app servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 08:46 ema: cp1077 repooled T217893
  • 08:46 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
  • 08:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 for schema change (duration: 00m 48s)
  • 08:34 jynus: deploy core replica events to db1118
  • 08:15 ema: cp1099: ferm.service failed to resolve prometheus1003.eqiad.wmnet. ferm restarted T202966
  • 07:18 marostegui: Deploy schema change on db2052 (s5 codfw master), this will generate lag on codfw T71127 T51199
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113 after schema change and upgrade (duration: 00m 49s)
  • 07:09 marostegui: Upgrade mysql and kernel on db1113
  • 06:40 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113 for schema change and upgrade (duration: 00m 50s)
  • 04:04 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 02:40 ejegg: updated payments-wiki from f1a89d7045 to 7a312e371a

2019-03-11

  • 17:55 addshore@deploy1001: Synchronized wmf-config/interwiki-labs.php: BETA ONLY https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/495723/ (duration: 00m 48s)
  • 17:43 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/495721/ (duration: 00m 49s)
  • 17:23 arturo: T215605 copy python-oath from jessie-wikimedia/thirdparty to stretch-wikimedia/thirdparty in reprepro
  • 17:03 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 17:02 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 16:31 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 16:31 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 15:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1097 (duration: 00m 48s)
  • 15:16 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix syntax for MediaInfo depicts config (beta only) (duration: 00m 49s)
  • 14:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 49s)
  • 14:43 moritzm: upgrading mw canaries to PHP 7.2.16
  • 14:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 48s)
  • 14:25 hashar: contint1001: stopping zuul-merger (it is cpu or IO starving the server)
  • 14:21 moritzm: upgrading mwdebug servers to PHP 7.2.16
  • 14:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1097 (duration: 00m 47s)
  • 14:09 moritzm: importing build of PHP 7.2.16 for component/php72 (T216712)
  • 13:58 marostegui: Upgrade mysql on db1097
  • 13:28 arturo: disable active checks in icinga for labtestvirt200[12] (T218023)
  • 13:04 moritzm: upgrading mwdebug2002 to php 7.2.16
  • 12:23 gtirloni: updated slapd to version 2.4.47 on serpens (T217280)
  • 12:05 gtirloni: updating slapd on serpens/codfw to test possible fix for memory leaks
  • 10:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097 for upgrade and schema change (duration: 00m 48s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:495650) (duration: 00m 49s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:495650) (duration: 00m 49s)
  • 09:56 moritzm: installing chromium security updates on remaining proton hosts
  • 09:44 moritzm: installing chromium security updates on proton1001
  • 09:44 elukey: roll restart of aqs on aqs100* to pick up new druid settings
  • 08:02 marostegui: Upgrade pc1010 (spare)
  • 07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 after upgrade (duration: 00m 48s)
  • 07:32 marostegui: Upgrade MySQL and kernel on pc2010 (spare)
  • 07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1099 after upgrade (duration: 00m 48s)
  • 06:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1099 after upgrade (duration: 00m 48s)
  • 06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1099 after upgrade (duration: 00m 52s)
  • 06:38 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:37 marostegui: Power cycle mw1280 - server down
  • 06:35 marostegui: Upgrade mysql and kernel on db1099
  • 06:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 for upgrade (duration: 03m 01s)
  • 06:03 effie: Restarting pdfrender on scb1003
  • 06:02 marostegui: Upgrade MySQL on dbstore1004 (s2, s3, s4)
  • 04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 03:30 kartik@deploy1001: Finished deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878) (duration: 04m 01s)
  • 03:26 kartik@deploy1001: Started deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878)

2019-03-10

  • 22:35 gtirloni: toolforge stretch: increased nscd group TTL from 60 to 300sec (T217280)
  • 07:14 _joe_: restarting pdfrender on scb1004

2019-03-08

  • 19:25 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 50s)
  • 19:21 moritzm: installing php updates on netmon1002
  • 18:20 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 49s)
  • 17:30 robh: decom in progress for rdb100[123478] via T209181
  • 16:48 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@acf2694] (stretch): UBN geoshapes services on maps1004.eqiad.wmnet (T217898) (duration: 00m 22s)
  • 16:47 mbsantos@deploy1001: Started deploy [kartotherian/deploy@acf2694] (stretch): UBN geoshapes services on maps1004.eqiad.wmnet (T217898)
  • 16:23 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@cc302de] (stretch): UBN geoshapes services on maps2004.codfw.wmnet (T217898) (duration: 00m 24s)
  • 16:22 mbsantos@deploy1001: Started deploy [kartotherian/deploy@cc302de] (stretch): UBN geoshapes services on maps2004.codfw.wmnet (T217898)
  • 16:19 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@d71df87] (stretch): UBN geoshapes services (T217898) (duration: 02m 00s)
  • 16:17 mbsantos@deploy1001: Started deploy [kartotherian/deploy@d71df87] (stretch): UBN geoshapes services (T217898)
  • 15:45 papaul: OS install on restbase2019 and restbase2020
  • 15:30 gilles@deploy1001: Finished deploy [performance/coal@8766469]: (no justification provided) (duration: 00m 06s)
  • 15:30 gilles@deploy1001: Started deploy [performance/coal@8766469]: (no justification provided)
  • 14:34 arturo: T215605 add prometheus-rabbitmq-exporter v0.4 to stretch-wikimedia
  • 14:16 gilles@deploy1001: Finished deploy [performance/navtiming@f2d8a5f]: (no justification provided) (duration: 00m 05s)
  • 14:15 gilles@deploy1001: Started deploy [performance/navtiming@f2d8a5f]: (no justification provided)
  • 13:09 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
  • 12:47 akosiaris: depooling cp1077 just in case, high mailbox lag https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cache_type=text&var-server=All&var-layer=backend&panelId=13&fullscreen
  • 12:47 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.*
  • 12:07 jbond42: rolling security updates of sqlite3 on jessie and trusty
  • 11:07 moritzm: uploaded tideways 4.0.7-1+wmf1 for component/php72 (T216712)
  • 10:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080, db1110 (duration: 00m 49s)
  • 10:14 marostegui: Reload haproxy on dbproxy1011 to repool labsdb1009
  • 09:51 mutante: temp disabling puppet on icinga to debug an issue with elastic checks
  • 09:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080, db1110 (duration: 00m 49s)
  • 09:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3311,db1096:3315 (duration: 00m 49s)
  • 08:37 marostegui: Reload haproxy on dbproxy1011 to depool labsdb1009
  • 08:31 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
  • 08:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3311,db1096:3315 (duration: 00m 48s)
  • 08:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1076 (duration: 00m 48s)
  • 07:59 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 40s)
  • 07:58 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:57 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 02s)
  • 07:57 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:52 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 01m 18s)
  • 07:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 (duration: 00m 48s)
  • 07:51 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 after mysql upgrade (duration: 00m 49s)
  • 07:35 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 30s)
  • 07:34 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 after mysql upgrade (duration: 00m 49s)
  • 07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1076 into API after mysql upgrade (duration: 00m 48s)
  • 07:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 after mysql upgrade (duration: 00m 48s)
  • 06:53 marostegui: Stop MySQL on db1076 for upgrade
  • 06:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 for mysql upgrade (duration: 00m 49s)
  • 06:22 marostegui: Deploy schema change on s3 db1077 with replication (lag will happen on s3 labs)
  • 06:21 marostegui: Stop replication on s3 on labsdb1009 and labsdb1011
  • 06:20 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
  • 06:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 (duration: 00m 51s)
  • 00:23 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.20/skins/MinervaNeue/resources/skins.minerva.scripts/toc.js: SWAT: Passing page parameter to TOC toggler (gerrit:495021) T217820 (duration: 00m 50s)
  • 00:16 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Cleanup beta cluster config (gerrit:495024) T213599; Enable advanced mobile contributions mode on beta cluster (gerrit:495023) beta-only (noop) sync (duration: 00m 49s)
  • 00:01 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org,service=pdns_recursor

2019-03-07

  • 23:53 XioNoX: set net.ipv4.ip_local_port_range="32768 60999" on dns2001 and repool server - T209989
  • 23:46 XioNoX: set net.ipv4.ip_local_port_range="49152 65535" on dns2001 - T209989
  • 23:43 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_recursor
  • 23:40 XioNoX: depool dns2001 - T209989
  • 20:44 XioNoX: explicitly disable sampling on non-eqiad routers
  • 20:42 thcipriani: restarting gerrit on cobalt for 2.15.11 rollback
  • 20:42 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on cobalt (production) (duration: 00m 07s)
  • 20:41 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on cobalt (production)
  • 20:40 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on gerrit2001 only (duration: 00m 10s)
  • 20:40 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on gerrit2001 only
  • 20:10 thcipriani: restarting gerrit on cobalt for 2.15.11 upgrade
  • 20:10 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on cobalt (production) (duration: 00m 11s)
  • 20:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on cobalt (production)
  • 20:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 12s)
  • 20:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on gerrit2001 only
  • 19:33 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Enable Priority Hints origin trial on ruwiki (duration: 00m 48s)
  • 19:22 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant 'reupload-shared' to mediawiki uploaders and fix T217523 (duration: 00m 49s)
  • 19:12 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Partial Blocks on Arabic Wikipedia T217283 (duration: 00m 50s)
  • 19:04 arlolra: Updated Parsoid to d4e76d5 (T202905)
  • 18:56 arlolra@deploy1001: Finished deploy [parsoid/deploy@766a920]: Updating Parsoid to d4e76d5 (duration: 05m 01s)
  • 18:51 arlolra@deploy1001: Started deploy [parsoid/deploy@766a920]: Updating Parsoid to d4e76d5
  • 18:39 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,name=maps2004.codfw.wmnet
  • 18:32 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@248b8c4] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet (duration: 01m 25s)
  • 18:30 mbsantos@deploy1001: Started deploy [kartotherian/deploy@248b8c4] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet
  • 18:30 mbsantos@deploy1001: Finished deploy [tilerator/deploy@fac7e5e] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet (duration: 03m 46s)
  • 18:26 mbsantos@deploy1001: Started deploy [tilerator/deploy@fac7e5e] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet
  • 18:25 gehel: cleaning kernel-proposed-updates component on reprepro (install1002)
  • 18:15 XioNoX: disable asw2-c-eqiad <-> asw-c-eqiad link - T208734
  • 17:55 gehel: rolling upgrade of kibana on logstash clusters completed - T216052
  • 17:48 gehel: rolling upgrade of kibana on logstash clusters - T216052
  • 17:44 gehel: rolling upgrade of logstash on logstash clusters completed - T216052
  • 17:36 gehel: rolling upgrade of logstash on logstash clusters - T216052
  • 17:34 gehel@deploy1001: Finished deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052 (duration: 00m 07s)
  • 17:34 gehel@deploy1001: Started deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052
  • 17:34 gehel@deploy1001: Finished deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052 (duration: 00m 08s)
  • 17:33 gehel@deploy1001: Started deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052
  • 17:16 gehel: rolling upgrade of elasticsearch on logstash clusters completed - T216052
  • 17:09 ariel@deploy1001: Finished deploy [dumps/dumps@3e25558]: fix broken page-content job retries (duration: 00m 04s)
  • 17:09 ariel@deploy1001: Started deploy [dumps/dumps@3e25558]: fix broken page-content job retries
  • 16:54 cmjohnson1: powering off cp1099 to move to different rack T202966
  • 15:26 gehel: rolling upgrade of elasticsearch on logstash clusters - T216052
  • 14:54 hashar: 1.33.0-wmf.20 seems all good
  • 14:46 marostegui: Reload haproxy on dbproxy1011 to repool labsdb1009
  • 14:15 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.20
  • 13:47 mutante: phab1002 - removing all php-7.2 packages and letting puppet reinstall them after component change
  • 13:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1075 after schema change and mysql upgrade (duration: 00m 55s)
  • 13:41 marostegui: Stop mysql on labsdb1009 for upgrade (this will trigger an haproxy IRC alert)
  • 13:39 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1009
  • 13:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 after schema change and mysql upgrade (duration: 00m 52s)
  • 12:59 zeljkof: EU SWAT finished
  • 12:56 gtirloni: re-enabled puppet on seaborgium/serpens
  • 12:55 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable musical notation datatype on testwikidatawiki (T216730) (gerrit:493010) (duration: 00m 56s)
  • 12:42 ariel@deploy1001: Finished deploy [dumps/dumps@3a25aa0]: handle failed xml content jobs correctly (fix regression) (duration: 00m 05s)
  • 12:42 ariel@deploy1001: Started deploy [dumps/dumps@3a25aa0]: handle failed xml content jobs correctly (fix regression)
  • 12:41 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create an uploader group on mediawiki.org (T217523) (gerrit:494225) (duration: 00m 55s)
  • 12:34 zfilipin@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: Restrict local uploads on mediawiki.org, take 2 (T217523) (gerrit:494806) (duration: 00m 56s)
  • 12:24 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:492447 Restore bureaucrat rights on hi.wiktionary to default () (duration: 00m 56s)
  • 12:08 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:494477 Enable edittag for ExternalGuidance in CX and VE (T216123) (duration: 00m 57s)
  • 12:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 after schema change and mysql upgrade (duration: 00m 56s)
  • 11:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1075 after schema change and mysql upgrade (duration: 00m 56s)
  • 11:45 gtirloni: temporarily disabled puppet on seaborgium/serpens to try slapd config changes
  • 11:28 gtirloni: updated seaborgium to stretch (T217280)
  • 11:21 mutante: doc.wikimedia.org - back up, manually fixed path to php-fpm.sock to 7.0 - puppet disabled, fix coming
  • 11:18 mutante: doc.wikimedia.org down and being worked on - package downgrade exposed an issue
  • 11:15 marostegui: Stop MySQL on db1075 for upgrade
  • 11:15 mutante: doc1001 - apt-get remove --purge php7.2* (the same packages with 7.0 were previously installed in parallel)
  • 10:58 gtirloni: upgrading seaborgium to Stretch (so it's running the same distro as serpens/codfw)
  • 10:34 moritzm: restarting HHVM/Apache on mediawiki canaries to pick up OpenSSL security update
  • 10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 for schema change and mysql upgrade (duration: 00m 56s)
  • 10:13 moritzm: upgrading mediawiki canaries to component/php72 (T216712)
  • 09:47 moritzm: upgrading mwdebug servers in eqiad to component/php72 (T216712)
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=citoid,cluster=scb,name=scb.*
  • 09:37 akosiaris: ramp up traffic to citoid kubernetes to 100%
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=citoid,cluster=scb,name=scb.*
  • 09:21 moritzm: upgrading mwdebug servers in codfw to component/php72 (T216712)
  • 09:15 elukey: fixed vlan-analytics1-d-eqiad members on asw2-d-eqiad - T205507
  • 09:03 mutante: mw2151 - mkdir /var/run/nutcracker ; chown nutcracker:nutcracker /var/run/nutcracker ; systemctl start nutcracker - runs again - pooling server
  • 08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1122 (duration: 00m 55s)
  • 08:54 mutante: depooled mw2151 - nutcracker failing
  • 08:19 mutante: reloading icinga service
  • 08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1122 (duration: 00m 55s)
  • 07:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1122 into API (duration: 00m 55s)
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1122 (duration: 00m 55s)
  • 07:28 marostegui@deploy1001: sync-file aborted: Repool db1121 (duration: 00m 01s)
  • 07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 56s)
  • 07:12 marostegui: Stop MySQL on db1122 to upgrade
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 for MySQL upgrade (duration: 00m 57s)
  • 06:40 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 06:03 marostegui: Deploy schema change on db1121, this will generate lag on labsdb:s4 - T86342
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 00m 57s)
  • 04:03 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
  • 01:19 twentyafterfour: phabricator update complete
  • 01:17 twentyafterfour: starting phabricator update to tag release/2019-03-07/1 - expect momentary downtime
  • 01:10 twentyafterfour: preparing phabricator upgrade
  • 00:47 aaron@deploy1001: Synchronized php-1.33.0-wmf.20/includes/specials/pagers/ActiveUsersPager.php: f929e2a5069 (duration: 00m 56s)
  • 00:43 aaron@deploy1001: Synchronized php-1.33.0-wmf.20/includes/specials/SpecialActiveusers.php: f929e2a5069 (duration: 00m 56s)
  • 00:28 aaron@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable loading WikibaseCirrusSearch (disabled) on production wikis (duration: 00m 55s)
  • 00:23 aaron@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Run WikibaseCirrusSearch code for search on testwikidatawiki (duration: 00m 56s)

2019-03-06

  • 21:23 XioNoX: test ping-offload with unused IP 208.80.153.225 - T190090
  • 20:30 hashar: 1.33.0-wmf.20 looks fine with group0 and group1
  • 20:14 hashar@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.20 (duration: 01m 43s)
  • 20:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.20
  • 19:51 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/LdapAuthentication/LdapPrimaryAuthenticationProvider.php: Remove calls to no-longer-implemented methods after I2eeaeed1 - T217692 (duration: 00m 58s)
  • 19:14 XioNoX: apply ping-offload redirect to private1-a-codfw - T190090
  • 19:03 gtirloni: increased serpens vCPUs from 4 to 8 (T217280)
  • 18:55 gtirloni: increased seaborgium vCPUs from 4 to 8 (T217280)
  • 18:08 bstorm_: re-enabled puppet after observing the change works well on the partner for labstore2004 and T210818
  • 18:07 joal@deploy1001: Finished deploy [analytics/refinery@fef9181]: Regular analytics weekly deploy train (duration: 31m 02s)
  • 18:04 bstorm_: disabled puppet and downtimed labstore2004 while deploying a change for T210818
  • 17:36 joal@deploy1001: Started deploy [analytics/refinery@fef9181]: Regular analytics weekly deploy train
  • 17:34 sbisson@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Added new throttle rules, removed expired (gerrit:494782) (duration: 00m 55s)
  • 17:33 sbisson@deploy1001: sync-file aborted: SWAT: Added new throttle rules, removed expired (gerrit:494782) (duration: 00m 01s)
  • 17:24 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: wgCopyUploadDomains: Changed domain for mehrnews.com (gerrit:492448) (duration: 00m 56s)
  • 17:17 sbisson@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/GrowthExperiments/extension.json: SWAT: Use schema version where reading is a valid editor_interface (gerrit:494531) (duration: 00m 56s)
  • 17:10 elukey@deploy1001: Finished deploy [analytics/superset/deploy@911ad13]: First deploy to new host (duration: 00m 27s)
  • 17:10 elukey@deploy1001: Started deploy [analytics/superset/deploy@911ad13]: First deploy to new host
  • 17:09 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Welcome survey: send all newcomers to variation A (cs, ko) (gerrit:494698) (duration: 00m 56s)
  • 16:53 jbond42: built prometheus-openldap-exporter for stretch
  • 16:51 ema: upgrade ATS to 8.0.2-1wm1
  • 16:23 moritzm: imported conftool 1.0.2-1+deb10u1 for buster-wikimedia
  • 16:10 krinkle@deploy1001: Synchronized php-1.33.0-wmf.20/includes/api/ApiBase.php: I921777 (duration: 00m 58s)
  • 16:05 moritzm: imported scap for buster-wikimedia (T213527)
  • 14:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 (duration: 00m 56s)
  • 13:35 marostegui: Upgrade MySQL on db1123
  • 13:18 jbond42: rolling security updates for file on jessie
  • 13:02 zeljkof: EU SWAT finished
  • 12:41 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change links in cswiki Help Panel (T217391) (gerrit:494668) (duration: 00m 55s)
  • 12:32 oblivian@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/WikimediaEvents: SWAT: Allow directing a sample of users to PHP 7 backport to wmf.19 T216676 (duration: 00m 57s)
  • 12:22 gtirloni: updated serpens to stretch (T217280)
  • 12:22 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle Exception for Art+Feminism event Eindhoven 8th March (T217676) (gerrit:494669) (duration: 00m 56s)
  • 12:10 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Setting php7 sample rate for anonymous users to 0 (duration: 00m 57s)
  • 11:32 godog: bounce prometheus@k8s on prometheus2004 to test limiting concurrent connections
  • 11:21 gtirloni: updated and rebooted seaborgium (T217280)
  • 11:18 gtirloni: updated and rebooted serpens (T217280)
  • 10:56 marostegui: Deploy schema change on db1123
  • 10:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 (duration: 00m 53s)
  • 10:48 volans: upgraded spicerack to 0.0.20 on cumin[12]001
  • 10:46 volans: uploaded spicerack_0.0.20-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 10:38 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Translate/TranslateUtils.php: Revert "TranslateUtils: Avoid use of deprecated class Revision" - T217689 (duration: 00m 59s)
  • 10:36 hashar: Deploying a hotfix for Translate https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Translate/+/494659/
  • 10:22 ema: lvs100[12],lvs1016: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 09:11 ema: lvs200[123]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 09:05 moritzm: removed debmonitor host entry for ruthenium (T216062)
  • 09:01 mutante: switching noc.wikimedia.org from apache to httpd module (mwmaint2001, then mwmaint1002)
  • 08:48 akosiaris@cumin1001: conftool action : set/weight=12; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
  • 08:48 akosiaris@cumin1001: conftool action : set/weight=15; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
  • 08:48 akosiaris: increase citoid traffic to kubernetes infrastructure to 50% T213194
  • 08:48 akosiaris: increase citoid traffic to kubernetes infrastructure to 50%
  • 08:47 marostegui: Deploy schema change on s3 codfw, this will generate lag on codfw - T86342
  • 08:42 ema: lvs300[12]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090 after MySQL upgrade (duration: 00m 59s)
  • 08:15 marostegui: Stop MySQL on db1090 for mysql upgrade
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090 for MySQL upgrade (duration: 00m 56s)
  • 08:14 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
  • 07:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1105 after MySQL upgrade (duration: 00m 56s)
  • 07:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic db1105 after MySQL upgrade (duration: 00m 56s)
  • 07:34 marostegui: Remove dbstore1002 from tendril and zarcillo T216491
  • 07:09 elukey: raised analytics user's max_user_connection from 10 to 100 on labsdb1012 - T215231
  • 07:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic db1105 after MySQL upgrade (duration: 00m 56s)
  • 06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1105 after MySQL upgrade (duration: 00m 56s)
  • 06:32 marostegui: Stop MySQL on db1105 for MySQL upgrade
  • 06:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105 for MySQL upgrade (duration: 01m 14s)
  • 06:27 marostegui: Add labsdb1012 to tendril and zarcillo - T215231
  • 05:50 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 04:26 eileen: civicrm revision changed from 196493f372 to 4aac68eead, config revision is 8ca90b4c7b
  • 04:00 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
  • 00:55 twentyafterfour: finished US Evening SWAT.
  • 00:41 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494524/ for SWAT refs T217276 (duration: 00m 55s)
  • 00:23 twentyafterfour@deploy1001: Synchronized wmf-config/mobile.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494271/ for SWAT refs T212253 (duration: 00m 56s)
  • 00:12 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/493236/ for SWAT. refs T217080 (duration: 00m 56s)

2019-03-05

  • 23:51 ejegg: updated payments-wiki from 4f2935ad17 to f1a89d7045
  • 21:05 godog: temporarily stop requests to k8s instance on prometheus2004
  • 21:00 herron: restarted apache on grafana1001
  • 20:43 herron: restarted apache on grafana1001
  • 19:56 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/LdapAuthentication/: Stop referring to the now-killed AuthPlugin class - T217692 (duration: 00m 57s)
  • 17:44 godog: bounce uwsgi on graphite1004
  • 17:25 herron: restarting uwsgi-graphite-web on graphite1004
  • 16:54 moritzm: imported logstash 1:5.6.14-1 to thirdparty/elastic56
  • 16:52 herron: restarting uwsgi-graphite-web on graphite1004
  • 16:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:43 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics -f eventgate-analytics-staging-values.yaml [namespace: eventgate-analytics, clusters: staging]
  • 16:20 herron: restarting uwsgi-graphite-web on graphite1004
  • 15:53 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.20
  • 15:35 hashar@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674 (duration: 51m 03s)
  • 14:52 gtirloni: reprepro added bdsync_0.10-1+deb9u1 T209527
  • 14:44 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:42 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:42 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:42 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
  • 14:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:41 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:41 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-codfw-values.yaml [namespace: eventgate-analytics, clusters: codfw]
  • 14:40 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
  • 14:35 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.BRPBtKvzZH" --verbose' returned non-zero exit status 1 (duration: 00m 20s)
  • 14:35 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:34 jijiki: Ramp up citoid traffic from k8s to 25% on codfw - T213194
  • 14:34 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.ngh6XIMz8y" --verbose' returned non-zero exit status 1 (duration: 00m 21s)
  • 14:33 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:33 jiji@cumin1001: conftool action : set/weight=5; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
  • 14:27 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.JrfRQw0oDJ" --verbose' returned non-zero exit status 1 (duration: 00m 21s)
  • 14:27 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:25 hashar@deploy1001: Pruned MediaWiki: 1.33.0-wmf.14 (duration: 09m 47s)
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
  • 14:17 hashar@deploy1001: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "hashar"; reason is "Pruned MediaWiki: 1.33.0-wmf.14" (duration: 00m 00s)
  • 14:14 hashar: Applied wmf/1.33.0-wmf.20 local patches # T206674
  • 14:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 T217591 (duration: 01m 50s)
  • 13:31 hashar: Cutting branch wmf/1.33.0-wmf.20 # T206674
  • 13:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 T217591 (duration: 00m 48s)
  • 13:14 ema: lvs500[12]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 13:07 zeljkof: EU SWAT finished
  • 12:58 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wgArticleCountMethod=any for zhwikiversity (T214946) (gerrit:487115) (duration: 00m 49s)
  • 12:45 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Enable edittag for ExternalGuidance in CX and VE" (duration: 00m 48s)
  • 12:24 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert gerrit:493155 (duration: 00m 49s)
  • 11:59 _joe_: upgrading scap everywhere to 3.9.2-1, T217611
  • 11:52 ema: lvs400[56]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 11:45 _joe_: installing new scap version in codfw
  • 11:44 oblivian@deploy1001: Synchronized README: Test deploy for new scap version (duration: 00m 48s)
  • 11:43 _joe_: installing new scap version on deployment servers, T217611
  • 11:22 _joe_: uploading new scap packages, T217611
  • 10:58 ema: lvs4007/lvs5003: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 10:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 47s)
  • 10:55 gilles@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/NavigationTiming/NavigationTiming.config.php: T187299 Fix wiki oversampling config validation (duration: 00m 48s)
  • 10:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 48s)
  • 10:27 jiji@cumin1001: conftool action : set/weight=4; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
  • 10:24 jijiki: Ramp up citoid traffic from k8s to 25% - T213194
  • 10:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 47s)
  • 10:10 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T187299 Oversample navtiming on ruwiki and eswiki (duration: 00m 47s)
  • 10:07 gilles@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/NavigationTiming: T187299 Backport wiki oversampling config syntax change (duration: 00m 48s)
  • 10:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 50s)
  • 09:56 ema: lvs200[456]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 09:31 marostegui: Stop MySQL on db1103:3312 and db1103:3314 for MySQL upgrade
  • 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 and db1103:3314 for mysql upgrade (duration: 00m 47s)
  • 09:26 ema: lvs100[456]: reboot for L1TF kernel/microcode updates T203011
  • 09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 (duration: 00m 47s)
  • 09:16 godog: kibana refresh field list
  • 08:58 mutante: restarting gerrit to pickup change 493963 - disable jgit gc
  • 08:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 47s)
  • 08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1084 (duration: 00m 48s)
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 in API (duration: 00m 48s)
  • 08:32 marostegui: Optimize echo_event table on x1 codfw master (db2034) this will generate lag on x1 codfw - T217591
  • 08:24 akosiaris: T213194 bump percentage of citoid requests reaching eqiad kubernetes cluster to 9%
  • 08:23 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes100.*
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1084 (duration: 00m 49s)
  • 07:47 marostegui: Upgrade MySQL on db1084
  • 07:18 marostegui: Stop MySQL on db1095 (backups host) to upgrade MySQL
  • 07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 47s)
  • 07:08 marostegui: Start transferring data from labsdb1011 to labsdb1012 - T215231
  • 06:56 marostegui: Reboot labsdb1012
  • 06:55 marostegui: Defragment echo_event tables on dbstore1005:3320 T217591
  • 06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1091 (duration: 00m 48s)
  • 06:43 marostegui: Stop MySQL on db2035 (s2 codfw master) to upgrade MySQL
  • 06:41 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 06:18 marostegui: Stop MySQL on dbstore2001 to upgrade MySQL
  • 06:17 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011
  • 06:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 51s)
  • 03:05 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Handle TitleBlacklist errors correctly (T217382) (duration: 00m 49s)
  • 03:03 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
  • 02:59 ejegg: updated payments-wiki from ca7c280f3e to 4f2935ad17
  • 02:27 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Revert hot fix (duration: 00m 46s)
  • 02:21 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Hot fix for T217615 (duration: 00m 47s)
  • 02:05 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:33 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:21 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:18 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:15 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 49s)
  • 01:13 tzatziki: changing password for "Force de Mots" and "שרית חייט"
  • 00:46 XioNoX: disable unused ports of restbase1016 on asw-a
  • 00:44 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/WikimediaEvents/: Redact title/create params and drop page_title in EditorJourney schema (T213974) (duration: 00m 49s)
  • 00:40 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES goodfaith on itwiki (T211032) (duration: 00m 47s)
  • 00:17 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/GrowthExperiments/includes/HelpPanel.php: Exclude help panel from main page (T215664) (duration: 00m 48s)
  • 00:12 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES on kowiki (T161628) (duration: 00m 49s)

2019-03-04

  • 23:09 eileen: civicrm revision changed from 316e038a69 to 196493f372, config revision is 8ca90b4c7b
  • 22:15 arlolra: Updated Parsoid to 1660395 (T214099, T202905)
  • 22:05 arlolra@deploy1001: Finished deploy [parsoid/deploy@bdc9e66]: Updating Parsoid to 1660395 (duration: 06m 34s)
  • 21:59 arlolra@deploy1001: Started deploy [parsoid/deploy@bdc9e66]: Updating Parsoid to 1660395
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-codfw-values.yaml [namespace: eventgate-analytics, clusters: codfw]
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics -f eventgate-analytics-staging-values.yaml [namespace: eventgate-analytics, clusters: staging]
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 21:49 ejegg: re-enabled Omnimail unsubscribe processing, disabled recipient repair job
  • 21:46 ejegg: updated Fundraising CiviCRM from 616c58cebe to 316e038a69
  • 21:19 XioNoX: add bgp sessions to AS137236 on cr1-eqsin
  • 21:14 XioNoX: re-enable bgp to AS13489 on cr2-eqiad
  • 20:44 reedy@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/Echo/: T217487 (duration: 00m 53s)
  • 20:23 niharika29@deploy1001: Finished deploy [scholarships/scholarships@2ef7463]: Remove outdated translations (duration: 00m 02s)
  • 20:23 niharika29@deploy1001: Started deploy [scholarships/scholarships@2ef7463]: Remove outdated translations
  • 20:17 niharika29@deploy1001: Finished deploy [scholarships/scholarships@2ef7463]: Deploy new version of app with new translations + fix broken privacy policy link (duration: 00m 02s)
  • 20:17 niharika29@deploy1001: Started deploy [scholarships/scholarships@2ef7463]: Deploy new version of app with new translations + fix broken privacy policy link
  • 20:01 sbisson@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Enables maplink for geocoordinate Wikibase statements display on clients (gerrit:494289) (duration: 00m 48s)
  • 20:00 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reader demographics survey (gerrit:494292) (duration: 00m 49s)
  • 19:52 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable help panel for user and user talk NS (gerrit:493616) (duration: 00m 49s)
  • 19:47 sbisson@deploy1001: Synchronized tests/loggingTest.php: SWAT: Add eventbus analytics logging alongside with kafka logging. (part 2) (gerrit:490668) (duration: 00m 48s)
  • 19:46 sbisson@deploy1001: Synchronized wmf-config/: SWAT: Add eventbus analytics logging alongside with kafka logging. (part 1) (gerrit:490668) (duration: 00m 51s)
  • 19:41 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@20badb3]: Updater and Blazegraph group to report metric domain plus GUI updates (duration: 11m 07s)
  • 19:35 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable GrowthExperiments Homepage on testwiki (gerrit:494223) (duration: 00m 49s)
  • 19:30 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@20badb3]: Updater and Blazegraph group to report metric domain plus GUI updates
  • 19:03 bstorm_: dumps.wikimedia.org is now running off labstore1007 T217473
  • 18:25 bstorm_: disabled notifications for high load on labstore1007 while failed over T217473
  • 18:23 vgutierrez: restarting pybal on lvs5002 - T213121
  • 18:16 XioNoX: push lvs5002 changes on cr2-eqsin - T213121
  • 16:54 hashar: contint1001: cleaned all Docker containers, compress /var/log/zuul/ files
  • 16:52 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001.*
  • 16:43 marostegui: Restart MySQL on db1112 for addshore
  • 16:33 jynus: enabling gtid replication on clouddb1002
  • 16:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T217365: Enable VE section editing on mobile for Beta Cluster, part II (duration: 00m 48s)
  • 16:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T217365: Enable VE section editing on mobile for Beta Cluster, part I (duration: 00m 51s)
  • 16:18 moritzm: installing ldb security updates
  • 16:13 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001
  • 16:13 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001
  • 16:13 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
  • 15:55 jijiki: Running puppet on scb* and kubernetes* - T213194
  • 15:44 jijiki: Disabling puppet on scb* and kubernetes* - T213194
  • 15:22 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: no-op: Remove unused legacy EventBus config settings (duration: 00m 49s)
  • 15:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 after changing index on logging table (duration: 00m 51s)
  • 14:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 and db1100 after changing index on logging table (duration: 00m 49s)
  • 14:20 elukey: update puppet compiler's facts
  • 14:20 marostegui: Change indexes on logging table on db1100 (s5) and db1097:3314 (commonswiki) - T217397
  • 14:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097:3314, db1100 to change indexes on logging table (duration: 00m 50s)
  • 13:57 gehel: restarting blazegraph on wdqs eqiad
  • 12:23 moritzm: testing component/php72 on mw2224
  • 11:04 akosiaris@deploy1001: scap-helm citoid finished
  • 11:04 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
  • 11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
  • 11:04 akosiaris@deploy1001: scap-helm citoid finished
  • 11:04 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
  • 11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
  • 11:04 akosiaris@deploy1001: scap-helm citoid finished
  • 11:04 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 10:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More weight to db1089 (duration: 00m 48s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:494191) (duration: 00m 50s)
  • 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:494191) (duration: 00m 50s)
  • 09:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 00m 48s)
  • 09:27 ariel@deploy1001: Finished deploy [dumps/dumps@932bf7e]: make misc dumps failure message nicer (duration: 00m 09s)
  • 09:27 ariel@deploy1001: Started deploy [dumps/dumps@932bf7e]: make misc dumps failure message nicer
  • 09:22 godog: temporarily stop prometheus on prometheus2004 to take a snapshot
  • 08:45 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Undo enabling Priority Hints origin trial on ruwiki (duration: 00m 49s)
  • 08:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 (duration: 00m 49s)
  • 08:38 gilles@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 08:29 marostegui: Change logging indexes on db1089 to leave the indexes exactly like the ones on tables.sql - T217397
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 - T217397 (duration: 00m 49s)
  • 07:48 ema: cp3032/cp3042: restart varnish-be due to mbox lag
  • 07:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 for schema change (duration: 00m 49s)
  • 07:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 (duration: 00m 53s)
  • 07:33 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1010
  • 07:17 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 07:13 marostegui: Remove dbstore1002 from tendril and zarcillo - T216491
  • 07:05 marostegui: Upgrade MySQL on db2088 and db2091
  • 06:46 marostegui: Stop MySQL on dbstore1002 for decommission T210478 T172410 T216491 T215589
  • 06:38 marostegui: Stop MySQL on labsdb1010 for mysql upgrade
  • 06:34 gtirloni: downtimed cloudstore1008/9 (T209527)
  • 06:13 marostegui: Upgrade MySQL on db2041 db2049 db2056 db2095
  • 06:06 marostegui: Run analyze table logging on db2038 and db2059 - T71222
  • 06:05 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094:3314 for schema change (duration: 01m 11s)
  • 05:18 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)

2019-03-03

  • off: restarted icinga on icinga2001, stale status file, too many open files
  • 10:44 elukey: restart pdfrender on scb1003

2019-03-02

  • 12:12 gtirloni: labstore1006 started nfsd T217473

2019-03-01

  • 20:45 ejegg: turned off fundraising omnimail process unsubscribes job
  • 19:40 XioNoX: pre-configure asw-a8 ports on asw2-a8-eqiad - T187960
  • 19:32 XioNoX: pre-configure asw-a7 ports on asw2-a7-eqiad - T187960
  • 19:29 XioNoX: pre-configure asw-a6 ports on asw2-a6-eqiad - T187960
  • 19:17 XioNoX: pre-configure asw-a5 ports on asw2-a5-eqiad - T187960
  • 18:53 robh: notebook1003 has unusually high load recently (23) and seemed to lag in reporting to icinga. no hardware failures, pinged about it in #wikimedia-analytics
  • 16:33 jbond42: rolling security update of bind9 packages on jessie and trusty
  • 15:38 ema: trafficserver_8.0.2-1wm1 uploaded to stretch-wikimedia
  • 15:02 akosiaris: restore proton config values
  • 14:33 hashar: Updating all debian-glue Jenkins jobs to properly take into account the BUILD_TIMEOUT parameter # T217403
  • 13:24 moritzm: removed sca* hosts from debmonitor database
  • 12:49 akosiaris: lower max_render_queue_size to 20 for proton on proton100{1,2}
  • 12:32 akosiaris: restart proton1002, OOM showed up
  • 12:31 akosiaris: restart proton on proton1001, counted 99 chromium processes left running since at least Jan 30
  • 11:47 jbond42: rebooting labsdb1005.codfw.wmnet
  • 11:17 jbond42: rebooting labstore2004.codfw.wmnet
  • 11:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1094 (duration: 00m 50s)
  • 08:52 godog: temporarily stop prometheus instances on prometheus2004 to take a snapshot
  • 07:44 oblivian@deploy1001: Synchronized README: Test deploy for new scap configuration (duration: 00m 48s)
  • 07:39 oblivian@deploy1001: Synchronized README: noop sync to test opcache-manager (duration: 00m 47s)
  • 07:31 oblivian@deploy1001: Synchronized README: Test deploy for new scap configuration (duration: 00m 46s)
  • 07:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 after mysql upgrade (duration: 00m 47s)
  • 07:23 _joe_: installed php 7.2 compatible packages on deploy1001,2001
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 after mysql upgrade (duration: 00m 47s)
  • 06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 after mysql upgrade (duration: 00m 46s)
  • 06:48 marostegui: Deploy schema change on s4 codfw, lag will appear on s4 codfw - T86342
  • 06:43 marostegui: Stop MySQL on db1094 for mysql upgrade
  • 06:40 _joe_: upgrading php extensions on deploy* to versions compatible with php7.2
  • 05:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 51s)
  • 00:12 XioNoX: pre-configure asw-a3 ports on asw2-a3-eqiad - T187960
  • 00:09 thcipriani@deploy1001: Synchronized README: noop sync to test opcache-manager in scap 3.9.1-1 (duration: 00m 48s)

2019-02-28

  • 23:44 XioNoX: pre-configure asw-a2 ports on asw2-a2-eqiad - T187960
  • 23:31 XioNoX: pre-configure asw-a1 ports on asw2-a1-eqiad - T187960
  • 23:27 bblack@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp107[678]\.eqiad\.wmnet
  • 23:07 robh: decom cp1045-cp1055, all are role spare but may icinga alert for ping
  • 22:39 ejegg: updated fundraising CiviCRM from c81fe7a4fd to 616c58cebe
  • 22:33 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove old translate config (duration: 00m 46s)
  • 22:29 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable some translate special pages again T217376 (duration: 00m 47s)
  • 22:29 ottomata: replaying events from mediawiki eventbus config outage - T217385
  • 22:03 hashar: MediaWiki 1.33.0-wmf.19 deployed on all wikis # T206673
  • 21:59 XioNoX: disable asw2-a5 <> asw-a link - T217383
  • 21:28 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 00m 47s)
  • 21:09 herron: disabling logstash persisted queue
  • 20:52 herron: cleared logstash persistent queue on logstash100[7-9]
  • 20:13 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.19
  • 20:02 thcipriani@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle exception for Amnesty International Editathon Throttle Rules: remove "all" Add new throttle rules T216998 T217063 T217305 T217311 (duration: 00m 54s)
  • 19:40 thcipriani@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Remove legacy eventBus config settings. (duration: 00m 53s)
  • 19:36 _joe_: upgrading scap on all servers
  • 19:30 thcipriani@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle rule for Art+Feminism 2019 editathon T217336 (duration: 00m 54s)
  • 19:26 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Enable WikibaseCirrusSearch on Beta Cluster (beta only change/noop sync) T215684 (duration: 00m 55s)
  • 19:22 robh: mw1272 being worked on by onsite
  • 19:21 robh: mw1272 unresponsive to mgmt or production interfaces
  • 19:16 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Start help panel experiment on viwiki T215666 (duration: 03m 02s)
  • 18:52 moritzm: installing libgd security updates on trusty
  • 18:52 herron: migrating logstash1006 kafka to logstash1012 T213898
  • 18:43 XioNoX: start pybal on lvs1016 - T212348
  • 18:34 robh: cp1078 power down for network move
  • 18:28 XioNoX: stop pybal on lvs1016 - T212348
  • 18:28 robh: cp1077 power off for network port relocation
  • 18:21 robh: cp1076 power down for network port move
  • 17:51 herron: logstash1011 kafka now in sync. transitioning logstash1005 to spare system T213898
  • 17:24 cmjohnson1: powering down sodium to move racks T212348
  • 17:23 jynus: recreating replicas, master ops events for db1078, db1075 T213858
  • 16:43 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1006.eqiad.wmnet
  • 16:39 elukey: clean up old/stale zookeeper znodes from conf100[4-6] - T216979
  • 16:28 herron: migrating kafka on logstash1005 to logstash1011 T213898
  • 16:27 herron: migrating kafka on logstash1005 to logstash1011 T213898
  • 16:15 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T214905 Add ReferencePreviews to allowed BetaFeatures (duration: 00m 54s)
  • 16:08 jbond42: rebooting labstore2003
  • 15:56 thcipriani@deploy1001: Synchronized README: noop sync to test opcache-manager in scap 3.9.1-1 (duration: 00m 53s)
  • 15:52 jbond42: rebooting labsdb1004
  • 15:50 thcipriani@deploy1001: Synchronized README: noop sync scap 3.9.1-1 (duration: 00m 52s)
  • 15:49 akosiaris@deploy1001: scap-helm citoid finished
  • 15:49 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 15:48 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 15:46 _joe_: install scap 3.9.1-1 on the deployment servers
  • 15:43 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1006.eqiad.wmnet
  • 15:43 jbond42: rebooting labsdb1007
  • 15:37 jbond42: rebooting labsdb1006
  • 15:36 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1005.eqiad.wmnet
  • 15:33 jbond42: rebooting labstore2002
  • 15:29 jbond42: rebooting labstore2001
  • 15:23 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1005.eqiad.wmnet
  • 15:19 jbond42: rebooting rhodium
  • 15:15 cmjohnson1: powering off db1114 to replace motherboard T214720
  • 15:14 _joe_: uploading scap 3.9.1-1 to {stretch,jessie}-wikimedia
  • 14:50 jbond42: reboot cloudnet2001-dev.codfw.wmnet
  • 14:47 hashar: mw1272 fixed by running "scap sync-l10n" from deploy host
  • 14:46 hashar: mw1272 had /srv/mediawiki/php-1.33.0-wmf.19/includes/cache/localisation/LocalisationCache.php:475) No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php.
  • 14:46 hashar@deploy1001: scap sync-l10n completed (1.33.0-wmf.19) (duration: 03m 33s)
  • 14:42 jbond@cumin1001: conftool action : set/pooled=no; selector: name=rhodium.eqiad.wmnet
  • 14:41 hashar@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.19 (duration: 00m 53s)
  • 14:40 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.19
  • 14:34 milimetric@deploy1001: Finished deploy [analytics/refinery@f605fad]: New sqoop logic that uses the sharded replicas (duration: 10m 00s)
  • 14:30 akosiaris@deploy1001: scap-helm citoid finished
  • 14:30 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 14:30 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 14:28 hashar@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/WikibaseMediaInfo: Move up checks to test if we should construct depicts widgets - T217285 (duration: 00m 58s)
  • 14:24 milimetric@deploy1001: Started deploy [analytics/refinery@f605fad]: New sqoop logic that uses the sharded replicas
  • 13:56 elukey: re-start cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952
  • 13:52 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1003.eqiad.wmnet
  • 13:43 godog: depool prometheus1003.eqiad.wmnet to take a data snapshot
  • 13:34 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
  • 12:36 zeljkof: EU SWAT finished
  • 12:35 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle rule for Day of Digital Service (T217155) (duration: 00m 52s)
  • 12:31 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: New throttle rule for Czech Wikigap 2019 (T217270) (duration: 00m 53s)
  • 12:18 zfilipin@deploy1001: Synchronized wmf-config/: SWAT: Show referencePreviews on group0 wikis as beta feature (T214905) (duration: 00m 56s)
  • 11:59 jbond42: rolling openssl security updates to jessie systems
  • 11:32 akosiaris: remove sca1003, sca1004, sca2003, sca2004 from the fleet. Celebrate!!!!
  • 11:28 elukey: pause cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952
  • 10:00 _joe_: executing a rolling puppet run (2 servers at a time per cluster, per dc) in eqiad,codfw as an HHVM restart will be triggered
  • 09:37 gilles@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/NavigationTiming/modules/ext.navigationTiming.js: T217210 Don't assume PerformanceObserver entry types are supported (duration: 00m 54s)
  • 09:30 elukey: start cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952
  • 09:26 moritzm: installed php security updates on netmon1002 and people1001
  • 09:22 marostegui: Stop MySQL on db1125 (sanitarium) to upgrade, this will generate lag on labs on: s2, s4, s6,s7
  • 09:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 54s)
  • 09:08 marostegui: Stop MySQL on db1121 for upgrade, this will generate lag on labsdb:s4
  • 09:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 00m 53s)
  • 08:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1079 (duration: 00m 53s)
  • 08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase API traffic db1079 after mysql upgrade (duration: 00m 53s)
  • 08:31 elukey: roll restart of Yarn Resource Managers on an-master100[1,2] to pick up new settings
  • 08:22 marostegui: Change abuse_filter_log indexes on s3 codfw, lag will appear on codfw - T187295
  • 08:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1079 after mysql upgrade (duration: 00m 54s)
  • 08:06 moritzm: installing glibc security updates for stretch
  • 07:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1079 in API after mysql upgrade (duration: 00m 53s)
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1079 after mysql upgrade (duration: 00m 56s)
  • 07:08 marostegui: Stop MySQL on db1079 for mysql upgrade
  • 06:50 marostegui: Deploy schema change on db1079, this will generate lag on s7 on labs - T86342
  • 06:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1079 (duration: 00m 55s)
  • 06:18 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T216983)
  • 05:56 marostegui: Upgrade MySQL on db1124 (Sanitarium) lag will be generated on s1,s3,s5,s8
  • 03:03 kart_: Manual run of unpublished ContentTranslation draft purge script (T216983)
  • 02:08 bstorm_: clouddb1002 is now in place to replace labsdb1004 as replica for toolsdb but not wikilabels postgres yet T193264
  • 01:43 twentyafterfour: phabricator upgrade completed without issues (actually completed at 01:23 UTC but I failed to hit enter and submit this message)
  • 01:20 twentyafterfour: deploying phabricator update 2019-02-27
  • 01:03 twentyafterfour: preparing to deploy phabricator-2019-02-27
  • 00:55 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.19/vendor/: vendor/ruflin/Elastica: Remove scalar return type hints (duration: 01m 33s)
  • 00:22 ebernhardson@deploy1001: Synchronized vendor/: Remove scalar type hints from ruflin/Elastica (duration: 00m 58s)
  • 00:10 ebernhardson@deploy1001: Synchronized wmf-config/CommonSettings.php: T215725 Remove mediawikiwiki from wgCentralAuthAutoCreateWikis (duration: 00m 54s)
  • 00:07 ebernhardson@deploy1001: Synchronized wmf-config/: T215684 Add config for switching Wikibase search to WikibaseCirrusSearch codebase (duration: 00m 55s)

2019-02-27

  • 21:57 XioNoX: delete local pref for peering sessions in eqiad - T204281
  • 21:44 eileen: civicrm revision is c81fe7a4fd, config revision is 050abdf9e8
  • 21:26 XioNoX: delete local pref for peering sessions in eqord - T204281
  • 20:53 XioNoX: delete local pref for peering sessions in codfw/eqdfw - T204281
  • 20:50 hashar: 1.33.0-wmf.19 not rolled to group1. Pending T217285 (Wikibase raising exception on commonswiki). To be figured out during European day time.
  • 20:50 eileen: civicrm revision changed from 224bf15206 to c81fe7a4fd, config revision is d1826e371b
  • 20:14 hashar@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 20:04 hashar@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.19 (duration: 00m 53s)
  • 20:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.19
  • 19:49 bstorm_: stopped slave on labsdb1004 for T193264
  • 19:43 bstorm_: downtimed labsdb1004 to stop mysql for transferring data for T193264
  • 19:32 SMalyshev: repooled wdqs1005, caught up
  • 19:26 herron: replacing kafka on logstash1004 with logstash1010 T213898
  • 18:56 SMalyshev: depooled wdqs1005 to let it catch up
  • 18:36 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@465673b]: Redeploy GUI for T217161 (duration: 10m 51s)
  • 18:28 cmjohnson1: powering off mw126[3-6] one at a time to move to different rack A5 T212348
  • 18:25 smalyshev@deploy1001: Started deploy [wdqs/wdqs@465673b]: Redeploy GUI for T217161
  • 18:21 cmjohnson1: powering off mw1262 to move to different rack A5 T212348
  • 18:15 cmjohnson1: powering off mw1261 to move to different rack A5 T212348
  • 17:57 niharika29@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/Flow/: Make VisualEditor unwrap <section> tags T217206 (duration: 01m 00s)
  • 17:56 elukey: roll restart hadoop hdfs namenodes on an-master100[1,2] to pick up the new rack config of analytics1071
  • 17:37 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Welcome survey: add a control group to viwiki T216669 (duration: 00m 54s)
  • 17:34 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop collecting data for CitationUsage and CitationUsagePageLoad T213969 (duration: 00m 55s)
  • 17:22 elukey: drain + shutdown of analytics1071 to allow its move to A5 - T212348
  • 17:19 cmjohnson1: powering off wtp1030 to move to different rack A5 T212348
  • 17:14 cmjohnson1: powering off wtp1029 to move to different rack A5 T212348
  • 17:06 cmjohnson1: powering off wtp1029 to move to different rack A5 T212348
  • 17:05 RoanKattouw: Running foreachwikiindblist dblists/echo.dblist extensions/Echo/maintenance/removeOrphanedEvents.php on mwmaint1002
  • 16:58 hashar@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/Score: Revert "beautify lilypond error message output" - T217241 (duration: 00m 56s)
  • 16:49 jijiki: Deploy LVS for eventgate-analytics - T211247
  • 16:26 volans: temporarily disabled puppet on icinga[12]001 to deploy g/493171
  • 16:21 volans: force-rebooting icinga1001 (to test some puppet changes) - T214760
  • 15:34 jbond42: rolling openssl security updates to jessie canary servers
  • 14:26 marostegui: Deploy schema change on abuse_filter_log on s7 codfw - lag will be generated on codfw - T187295
  • 14:01 marostegui: Change indexes on abuse_filter_log on db1089 - T187295
  • 14:00 moritzm: uploaded openssl 1.0.2r to jessie-wikimedia
  • 12:08 jbond42: correction: rolling updates of apache on mw api servers *not* jobrunners
  • 12:04 jbond42: rolling updates of apache on mw jobrunners
  • 11:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1077 after MySQL upgrade (duration: 00m 53s)
  • 11:28 godog: cleanup log4j from lvs eqiad / ipvsadm -D -t logstash.svc.eqiad.wmnet:4560
  • 11:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1077 after MySQL upgrade (duration: 00m 54s)
  • 11:17 godog: roll-restart pybal after removing logstash log4j service
  • 10:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1077 after MySQL upgrade (duration: 00m 54s)
  • 10:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 with low weight after MySQL upgrade (duration: 00m 53s)
  • 09:55 marostegui: Stop MySQL on db1077 for mysql upgrade - this will generate lag on labsdb:s3
  • 09:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 for MySQL upgrade (duration: 00m 53s)
  • 09:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1082 (duration: 00m 54s)
  • 09:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool in API db1082 after mysql upgrade (duration: 00m 53s)
  • 09:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1082 after mysql upgrade (duration: 00m 54s)
  • 09:05 marostegui: Stop MySQL on db1082 for mysql upgrade
  • 08:41 godog: enable mmjsonparse by default on kafka outputs - T213189
  • 08:40 marostegui: Deploy schema change on db1082 - will generate lag on labsdb:s5 - T86342
  • 08:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 for mysql upgrade (duration: 00m 54s)
  • 08:26 marostegui: Retroactive log, T216444 Global rename of Дагиров Умар → Takhirgeran Umar was done by alanajjar
  • 08:02 marostegui: Global rename of HeavyTony → QTHCCAN by alanajjar - T217222
  • 07:01 marostegui: Deploy schema change on s5 codfw master (db2052), this will generate lag on codfw - T86342
  • 06:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1076 (duration: 00m 55s)
  • 06:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 (duration: 01m 08s)
  • 05:05 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T216983)
  • 04:58 SMalyshev: repooled wdqs1006
  • 03:09 kart_: Manual run of unpublished ContentTranslation draft purge script (T216983)
  • 00:38 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] decrease regex timeouts by 25% and drop timeout hack (duration: 00m 53s)
  • 00:30 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.19/skins/MinervaNeue/resources/skins.minerva.scripts/errorLogging.js: MinervaNeue: Allow us to distinguish errors for logged in users (duration: 00m 53s)
  • 00:30 bd808: Re-enabled puppet on labweb100[12]
  • 00:23 bd808: Disabled puppet on labweb100[12]
  • 00:15 bd808: Manually changed logging level and restarted Horizon on labweb100[12]
  • 00:15 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] autocomplete: enable subphrase matching for officewiki (2/2) (duration: 00m 54s)
  • 00:14 ebernhardson@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 00:07 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] autocomplete: enable subphrase suggester builds on officewiki (1/2) (duration: 00m 54s)
  • 00:03 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: noop sync for labs files gerrit:493103 (duration: 00m 54s)

2019-02-26

  • 23:39 tgr: T217203 running mwscript ~/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'LaurenceKingPublishing' 'Fiona at Laurence King Publishing'
  • 23:37 tgr: T217203 running mwscript ~/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'Citycarclubfi' 'Urbaanimies'
  • 23:16 SMalyshev: depooled wdqs1006 to see if it catches up
  • 22:43 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 22:43 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 22:43 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 22:43 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 22:43 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 22:43 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 22:43 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 22:43 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 22:43 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 22:42 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 22:42 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 22:42 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 22:13 XioNoX: delete local pref for peering sessions in eqsin - T204281
  • 19:12 reedy@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/EventBus/: T217145 (duration: 00m 54s)
  • 18:24 arlolra: Updated Parsoid to e82347d (T204608, T214099, T217093)
  • 18:17 arlolra@deploy1001: Finished deploy [parsoid/deploy@ae76aa2]: Updating Parsoid to e82347d (duration: 11m 03s)
  • 18:06 arlolra@deploy1001: Started deploy [parsoid/deploy@ae76aa2]: Updating Parsoid to e82347d
  • 16:38 cdanis: cdanis@krypton sudo apt-get remove grafana
  • 16:35 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:35 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 16:35 otto@deploy1001: scap-helm eventgate-analytics install -n production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 16:35 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:35 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 16:35 otto@deploy1001: scap-helm eventgate-analytics install -n production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 16:34 otto@deploy1001: scap-helm eventgate-analytics install -n production eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 16:24 jijiki: Restarting memcached on mc1028 - T208844
  • 16:14 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.19 # T206673
  • 16:09 herron: elasticsearch stopped on logstash100[456] T213898
  • 16:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:07 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:07 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics --set main_app.version=v1.0.0-rc2 [namespace: eventgate-analytics, clusters: staging]
  • 16:01 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:01 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:01 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:00 herron: re-enabling ircecho
  • 16:00 hashar@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.19 and rebuild l10n cache # T206673 (duration: 58m 17s)
  • 15:47 akosiaris@deploy1001: scap-helm mathoid finished
  • 15:47 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 15:47 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 15:47 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 15:43 akosiaris@deploy1001: scap-helm mathoid finished
  • 15:43 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
  • 15:43 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 15:21 godog: force puppet run on failed agents in codfw
  • 15:17 herron: stopped ircecho to squelch puppet run alerts
  • 15:13 godog: poweroff ms-be2030 - T204567
  • 15:02 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.19 and rebuild l10n cache # T206673
  • 15:02 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:02 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:02 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:58 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:36 hashar@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 (duration: 04m 42s)
  • 14:20 hashar: Applied 1.33.0-wmf.19 security patches | T206673
  • 14:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1122 (duration: 00m 45s)
  • 13:37 hashar: cutting deployment branch 1.33.0-wmf.19
  • 13:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 (duration: 00m 46s)
  • 12:14 moritzm: uploaded php7.2 7.2.15-1+0~20190209065123.16+stretch~1.gbp3ad8c0+wmf1 to component/php72 (T216712)
  • 11:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Full repool db1074 (duration: 00m 46s)
  • 11:12 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 38s)
  • 11:12 jijiki: Pooling thumbor2004 - T214597
  • 11:11 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 11:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1074 into API (duration: 00m 45s)
  • 10:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1074 (duration: 00m 46s)
  • 10:35 marostegui: Stop MySQL on db1074 for upgrade
  • 10:20 marostegui: Deploy schema change on db1074, this will generate lag on labsdb for s2 - T86342
  • 10:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1074 (duration: 00m 46s)
  • 10:12 godog: bounce gerrit on gerrit2001 and cobalt after https://gerrit.wikimedia.org/r/c/operations/puppet/+/492633 - T213899
  • 09:10 jynus: temporarily stop dbstore1001:s1replication to perform new backup system test
  • 09:04 jijiki: Pooling thumbor1003
  • 08:48 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 49s)
  • 08:47 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 08:45 moritzm: installing elfutils security updates
  • 08:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 (duration: 00m 45s)
  • 08:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 46s)
  • 08:08 jijiki: Depool and reimage thumbor2004 - T214597
  • 08:07 jijiki: Pooling thumbor2003 - T214597
  • 08:04 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 30s)
  • 08:04 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 07:54 elukey: removed /rmstore-analytics-test-hadoop from zookeeper main-eqiad - T216952
  • 07:45 _joe_: publishing golang:1.11.5-1 docker image
  • 07:44 moritzm: installing tiff security updates
  • 07:02 marostegui: Deploy schema change on s2 codfw (this will generate lag on s2 codfw) T86342
  • 06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 (duration: 00m 45s)
  • 06:50 jijiki: Depool and reimage thumbor1003 and thumbor2003 - T214597
  • 06:46 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 07s)
  • 06:46 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 06:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097:3314 (duration: 00m 45s)
  • 06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1088 (duration: 00m 45s)
  • 06:41 jijiki: Pooling thumbor1002
  • 06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 (duration: 00m 46s)
  • 06:34 tgr: T215107 running mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'The_Photographer' 'Wilfredor'
  • 06:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1088 T86342 (duration: 00m 48s)
  • 06:17 marostegui: Change abuse_filter_log indexes on db1083 - T187295
  • 06:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 T187295 (duration: 00m 51s)
  • 06:10 tgr: T215107 running mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'The_Photographer' 'Wilfredor'
  • 04:25 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T216983)
  • 04:24 eileen-sorting-k: civicrm revision changed from d1fc603677 to 224bf15206, config revision is d1826e371b
  • 03:06 kart_: Manual run of unpublished ContentTranslation draft purge script (T216983)

2019-02-25

  • 23:50 XioNoX: Re-enabled BGP to Zayo on cr2-codfw - T215193
  • 23:15 herron: service restarts to make logstash101[012] master eligible are taking longer than expected, leaving elasticsearch on logstash100[456] enabled overnight T213898
  • 22:56 mholloway-shell@deploy1001: Started restart [mobileapps/deploy@1ac3c38]: Restarting mobileapps on scb2003
  • 21:54 eileen: update process-control config revision is d1826e371b
  • 21:14 arlolra@deploy1001: Finished deploy [parsoid/deploy@cb62482]: Updating Parsoid to a8fe45e (duration: 04m 19s)
  • 21:11 herron: turning down elasticsearch service on logstash100[456] (data has been migrated to logstash101[012]) T213898
  • 21:10 arlolra@deploy1001: Started deploy [parsoid/deploy@cb62482]: Updating Parsoid to a8fe45e
  • 21:09 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1ac3c38]: Update mobileapps to c3871cc (duration: 03m 48s)
  • 21:05 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1ac3c38]: Update mobileapps to c3871cc
  • 19:58 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Use EventBus multi endpoint configuration for eventbus configs (duration: 00m 45s)
  • 19:53 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Swat! (duration: 00m 45s)
  • 19:46 reedy@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: Disable MFSpecialCaseMainPage for srwiki and enwikivoyage (duration: 00m 46s)
  • 19:41 vgutierrez: restarting pybal on lvs5003 - T213121
  • 19:35 reedy@deploy1001: Synchronized php-1.33.0-wmf.18/extensions/Renameuser: T215107 (duration: 00m 46s)
  • 19:31 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: labs! (duration: 00m 46s)
  • 18:44 krinkle@deploy1001: Synchronized php-1.33.0-wmf.18/includes/libs/objectcache/WANObjectCache.php: 79a1593cae48 / T203786 (duration: 00m 48s)
  • 18:18 jijiki: Pooling thumbor2001
  • 18:18 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 01m 09s)
  • 18:16 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 18:13 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@4c27682]: New GUI, Updater & Blazegraph builds (duration: 09m 53s)
  • 18:04 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@4c27682]: New GUI, Updater & Blazegraph builds
  • 17:59 jijiki: Depooling and reimaging thumbor1002 to stretch - T214597
  • 17:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:42 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 17:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:26 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 17:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:48 thcipriani@deploy1001: Synchronized README: noop sync for scap 3.9.0-1 (duration: 00m 46s)
  • 16:43 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 16:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:41 jijiki: Pooling thumbor1001
  • 16:40 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 04s)
  • 16:40 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 16:23 chasemp: reset 2fa for JBennett on phab with video confirmation
  • 16:21 jijiki: Depooling and reimaging thumbor2001 - T214597
  • 16:17 fsero: upload envoy 1.9.0 to stretch-wikimedia T215810
  • 15:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool in API db1085 after MySQL upgrade (duration: 00m 45s)
  • 15:35 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 15:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:34 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 15:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:33 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 15:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:30 akosiaris@deploy1001: scap-helm citoid finished
  • 15:30 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 15:30 akosiaris@deploy1001: scap-helm citoid install -n staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 15:28 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 07s)
  • 15:28 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 15:27 jiji@deploy1001: deploy aborted: (no justification provided) (duration: 00m 04s)
  • 15:27 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 15:20 vgutierrez: shutting down certcentral VMs for decommission - T207389
  • 15:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase API traffic for db1085 after MySQL upgrade (duration: 00m 45s)
  • 15:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1085 after MySQL upgrade (duration: 00m 45s)
  • 14:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1085 after MySQL upgrade (duration: 00m 45s)
  • 14:49 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 15s)
  • 14:49 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 14:47 jiji@deploy1001: deploy aborted: (no justification provided) (duration: 00m 19s)
  • 14:46 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 14:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool into API db1085 after MySQL upgrade (duration: 00m 45s)
  • 14:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1085 after MySQL upgrade (duration: 00m 45s)
  • 14:04 marostegui: Stop MySQL on db1085 for mysql upgrade
  • 13:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1085 for MySQL upgrade and schema change (duration: 00m 46s)
  • 13:32 akosiaris: upgrade etherpad-lite to 1.7.5
  • 12:38 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 07s)
  • 12:38 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 12:27 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 05s)
  • 12:27 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 12:22 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 01m 15s)
  • 12:21 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 11:49 moritzm: rolling out intel-microcode 3.20180807a.2 on all jessie/stretch servers; tests on a number of previously unsupported servers with Westmere CPUs were successful, and I've verified that all other microcode files are identical to those in the current 3.20180807a.1 microcode
  • 11:19 jijiki: Reimaging thumbor1001 - T214597
  • 10:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546, T202497) (duration: 00m 46s)
  • 10:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546, T202497) (duration: 00m 46s)
  • 10:32 gtirloni: labstore1004 restarted nfsd and killed stuck rpc.mountd.real processes (T216988)
  • 10:16 jijiki: Depooling thumbor1001 to reimage - T214597
  • 09:54 marostegui: Deploy schema change on db1074, this will generate lag on labsdb:s2 - T187295
  • 09:07 marostegui@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Increase ParserCache TTL from 24 days to 30 - T210992 (duration: 00m 46s)
  • 08:52 marostegui: Deploy schema change on s2 on codfw master - lag will happen on s2 codfw - T187295
  • 08:49 _joe_: generating mcrouter certificate for mw2151 T192457
  • 07:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1104 after MySQL upgrade (duration: 00m 45s)
  • 06:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 in API after MySQL upgrade (duration: 00m 45s)
  • 06:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 after MySQL upgrade (duration: 00m 45s)
  • 06:02 marostegui: Stop MySQL on db1104 for mysql upgrade
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 for MySQL upgrade (duration: 00m 50s)

2019-02-24

  • 21:49 eileen: civicrm revision changed from 1b5d974569 to d1fc603677, config revision is 00f9c08766
  • 18:20 elukey: clean up 2017/2018 log files in /var/log/jmxtrans on kafka1013-22 - root partitions filling up
  • 18:15 elukey: clean up 2017/2018 log files in /var/log/jmxtrans - root partition almost filled up on kafka1012
  • 10:22 elukey: force remount of /mnt/hdfs on an-coord1001 (fuse-hdfs stuck)

2019-02-22

  • 18:02 gehel: rolling upgrade on elasticsearch / cirrus / eqiad completed - T215931
  • 18:00 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 18:00 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 17:33 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 17:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 17:33 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 17:33 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 17:14 bblack: cp5007: repooling into service - T216716
  • 17:13 bblack: cp5006: repooling into service - T216717
  • 17:06 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:06 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:29 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:29 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:33 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:32 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:15 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:15 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:23 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:22 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:03 moritzm: removed labvirt1008 from debmonitor (T216661)
  • 14:02 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:02 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:54 akosiaris: reboot helium for kernel/microcode updates
  • 13:25 moritzm: installing wireshark security updates
  • 13:19 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:09 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:09 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:01 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:00 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:56 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 12:48 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 12:43 moritzm: rebooting auth1002 for kernel update
  • 12:17 moritzm: rebooting tungsten to pick up updated microcode to address SSBD/L1TF
  • 12:13 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:12 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:12 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 11:54 moritzm: various reboots of servers with Westmere-EP CPUs to pick up updated microcode to address SSBD/L1TF
  • 11:41 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 11:41 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 11:34 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 11:34 moritzm: rebooting cp1008 for some microcode test
  • 11:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 11:32 jijiki: Pooling thumbor2002 after upgrade - T214597
  • 11:20 moritzm: imported intel-microcode 3.20180807a.2 for jessie-wikimedia (T216802)
  • 11:01 godog: swift eqiad set thumbor write ACLs for wikipedia-meta-local-thumb
  • 10:37 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 10:36 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 10:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 10:15 jijiki: Pooling thumbor1004 after upgrade - T214597
  • 09:55 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 09:51 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 09:51 moritzm: fixed package state on mw2167
  • 09:38 akosiaris@deploy1001: scap-helm citoid install -n staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 09:33 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 09:33 moritzm: installing tor security update on torrelay1001
  • 09:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 09:32 _joe_: set pooled=inactive on mw1272, T211668
  • 09:26 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 09:22 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 16s)
  • 09:22 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 09:22 moritzm: updated tor packages to 0.3.5.8-1~d90.stretch+1
  • 09:18 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 09:16 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 14s)
  • 09:16 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 09:16 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 09:16 gehel: starting rolling upgrade on elasticsearch / cirrus / eqiad - T215931
  • 08:52 godog: force ftpsync run on sodium after debian mirror update
  • 08:19 moritzm: installing uriparser security updates
  • 08:18 godog: temporarily stop prometheus global on prometheus2004 to take a snapshot
  • 07:47 moritzm: installing krb5 updates for jessie
  • 07:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1013 after MySQL upgrade (duration: 00m 46s)
  • 07:28 elukey: manually delete WANCache:v:metawiki:translate-groups from memcache on mc1022 to test fix for T203786
  • 07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to es1013 after MySQL upgrade (duration: 00m 45s)
  • 07:15 _joe_: deactivating mw1272, memory problems
  • 07:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1013 after MySQL upgrade (duration: 00m 45s)
  • 06:51 marostegui: Power cycle mw1272 as it crashed - T211668
  • 06:49 marostegui: Stop MySQL on es1013 to upgrade MySQL
  • 06:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1013 for MySQL upgrade (duration: 02m 50s)
  • 06:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 after MySQL upgrade (duration: 02m 51s)
  • 06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1087 for MySQL upgrade (duration: 02m 53s)
  • 06:15 marostegui: Stop MySQL on db1087 for kernel and mysql upgrade
  • 03:26 XioNoX: delete old gr-1/0/0 from cr1-eqsin - T213121
  • 01:58 XioNoX: power-down cp5007 - T216716
  • 01:40 XioNoX: power-down cp5006 - T216717
  • 00:57 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Noop sync of labs settings (duration: 00m 44s)
  • 00:46 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (2/2) (duration: 00m 46s)
  • 00:45 ebernhardson@deploy1001: sync-file aborted: T215931 [cirrus] Switch production search traffic to codfw (2/2) (duration: 00m 05s)
  • 00:39 ebernhardson@deploy1001: Synchronized wmf-config/Wikibase.php: Deploy WikibaseCirrusSearch: Part III, Wikibase.php (duration: 00m 45s)
  • 00:27 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy WikibaseCirrusSearch: Part II, InitialiseSettings.php (duration: 00m 46s)
  • 00:23 ebernhardson@deploy1001: Synchronized wmf-config/extension-list: Deploy WikibaseCirrusSearch: Part I, extensionlist (duration: 00m 46s)
  • 00:21 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (1/2) (duration: 00m 45s)
  • 00:18 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (1/2) (duration: 00m 46s)
  • 00:17 ebernhardson@deploy1001: sync-file aborted: T215931 (duration: 00m 00s)

2019-02-21

  • 22:25 tzatziki: change pw for NazarSusP
  • 22:17 volans: forcing a puppet run on A:ganeti
  • 20:35 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 20:18 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.18
  • 20:06 ladsgroup@deploy1001: Finished deploy [ores/deploy@5d937b1]: Drop accepting pickle altogether (T206333) (duration: 13m 17s)
  • 19:58 bblack: eqsin: repooling user traffic
  • 19:52 ladsgroup@deploy1001: Started deploy [ores/deploy@5d937b1]: Drop accepting pickle altogether (T206333)
  • 19:35 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Drop obsolete Wikibase configs (T213713), Part II (duration: 00m 53s)
  • 19:33 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Drop obsolete Wikibase configs (T213713), Part I (duration: 00m 52s)
  • 19:32 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 19:32 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 19:25 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 19:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Set wmgWikibaseRepoIdGeneratorSeparateDbConnection to true for wikidata (T215147) (duration: 00m 56s)
  • 18:59 ladsgroup@deploy1001: Finished deploy [ores/deploy@2d84709]: Change default task serializer of celery from pickle to json (T206333) (duration: 16m 54s)
  • 18:46 jynus: shutting down db1114 T214720
  • 18:42 ladsgroup@deploy1001: Started deploy [ores/deploy@2d84709]: Change default task serializer of celery from pickle to json (T206333)
  • 18:33 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 18:30 robh: ignore icinga1001 alerts, rebooting it into hardware tests via T214760
  • 18:29 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 18:28 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 18:28 ladsgroup@deploy1001: Finished deploy [ores/deploy@5d50713]: (no justification provided) (duration: 14m 37s)
  • 18:13 ladsgroup@deploy1001: Started deploy [ores/deploy@5d50713]: (no justification provided)
  • 17:54 robh: cp5007 rebooting into bios update and hardware testing via T216716
  • 17:47 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 17:11 bblack: eqsin: restarting all varnish frontends to wipe cache after purge loss (site currently depooled) (skipping 5006/7 since they're being rebooted for bios flashing anyways)
  • 17:10 robh: rebooting cp5006 to flash bios in memory troubleshooting steps via T216717
  • 16:50 bblack: eqsin: restarting all varnish backends to wipe cache after purge loss (site currently depooled)
  • 16:41 volans: applied hot band-aid patch to spicerack/remote.py on cumin2001 ( https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/481858 )
  • 16:38 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 16:23 herron: updated phabricator.wikimedia.org spf record T216714
  • 16:22 fsero: uploading scap3 3.9.0.1 package to trusty, jessie and stretch T216666
  • 16:20 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:18 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:17 fsero: uploading scap3 3.9.0.1 package to trusty, jessie and stretch
  • 16:17 fsero: updating scap3 to 3.9.0-1
  • 15:57 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 15:52 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 15:23 moritzm: installing krb5 updates for jessie
  • 15:07 herron: migrating ES shards away from logstash100[456] with "cluster.routing.allocation.exclude._name" : "logstash1004-production-logstash-eqiad,logstash1005-production-logstash-eqiad,logstash1006-production-logstash-eqiad" T214608
  • 14:50 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:50 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:41 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@600e689]: Update to 0bb0a07 (duration: 04m 59s)
  • 14:37 bblack: restart vhtcpd on cp5002 to debug multicast loss
  • 14:36 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@600e689]: Update to 0bb0a07
  • 13:57 godog: depool and reimage logstash1007 - T213898
  • 13:25 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 13:20 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 16s)
  • 13:19 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 13:19 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 13:19 jbond42: restarting hhvm and updating apache on deploy1001.eqiad.wmnet
  • 13:18 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 13:18 gehel: restarting rolling upgrade on elasticsearch / cirrus / codfw - T215931
  • 12:50 jbond42: restarting hhvm and updating apache on mwmaint1002.eqiad.wmnet
  • 12:44 zeljkof: EU SWAT finished
  • 12:42 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add img.raremaps.com at wgCopyUploadsDomains (T216638) (duration: 00m 52s)
  • 12:40 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 20s)
  • 12:39 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 12:38 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle rule for National Gallery of Canada Library and Archives edit-a-thon (T216642) (duration: 00m 53s)
  • 12:33 arturo: disable puppet in cloudnet2001-dev to test T216497
  • 12:31 akosiaris@deploy1001: scap-helm mathoid finished
  • 12:31 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 12:30 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 12:30 akosiaris@deploy1001: scap-helm mathoid upgrade --recreate-pods -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 12:27 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 38s)
  • 12:26 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
  • 12:24 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: l thumbor2002.codfw.wmnet (duration: 00m 04s)
  • 12:24 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: l thumbor2002.codfw.wmnet
  • 12:24 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: l thumbor2002 (duration: 00m 08s)
  • 12:24 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: l thumbor2002
  • 12:23 arturo: importing openstack mitaka packages to reprepro @ install1002 (T216497)
  • 12:17 arturo: enable puppet in install1002 (done testing T216497)
  • 12:14 zfilipin@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: Disable mobile main page special casing on huwiki (T216563) (duration: 00m 54s)
  • 12:13 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: Updating repo (duration: 00m 29s)
  • 12:13 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: Updating repo
  • 12:10 arturo: T216497 import reprepro key 7638D0442B90D010 (debian archive automatic signing key (8/jessie))
  • 12:01 arturo: disable puppet in install1002 to test T216497
  • 11:13 volans: upgraded spicerack to 0.0.19 on cumin[12]001
  • 11:11 volans: uploaded spicerack_0.0.19-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 10:55 akosiaris: upgrade mathoid staging+production to latest helm chart
  • 10:47 akosiaris@deploy1001: scap-helm mathoid finished
  • 10:47 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 10:47 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 10:47 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 10:29 akosiaris@deploy1001: scap-helm mathoid finished
  • 10:29 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
  • 10:29 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 10:28 akosiaris@deploy1001: scap-helm mathoid finished
  • 10:28 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
  • 10:28 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 10:27 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml stable/mathoid [namespace: mathoid, clusters: staging]
  • 10:26 akosiaris@deploy1001: scap-helm list finished
  • 10:26 akosiaris@deploy1001: scap-helm list cluster codfw completed
  • 10:26 akosiaris@deploy1001: scap-helm list cluster eqiad completed
  • 10:26 akosiaris@deploy1001: scap-helm list [namespace: list, clusters: eqiad,codfw]
  • 10:23 godog: on boron unblock trusty builds with umount /var/cache/pbuilder/base-trusty-amd64.cow/dev/ptmx
  • 10:04 akosiaris: create citoid namespace on kubernetes eqiad codfw staging clusters T213194
  • 10:04 akosiaris: create cxserver namespace on kubernetes eqiad codfw staging clusters T213195
  • 09:35 volans: force rebooting unresponsive icinga1001 T214760
  • 09:29 marostegui: Deploy schema change on s3 primary master (db1078) - T210713
  • 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 T210713 (duration: 00m 52s)
  • 09:14 moritzm: temporarily stop prometheus@labs.service on labmon for journald restarts (part of security update)
  • 08:40 marostegui: Deploy schema change on db1075 - T210713
  • 08:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 T210713 (duration: 00m 54s)
  • 08:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 T210713 (duration: 00m 53s)
  • 07:44 moritzm: rolling out remaining systemd security updates on jessie
  • 07:12 marostegui: Deploy schema change on db1077 - this will generate lag on labsdb:s3 T210713
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 T210713 (duration: 00m 56s)
  • 07:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 T210713 (duration: 00m 55s)
  • 06:22 marostegui: Deploy schema change on db1123 - T210713
  • 06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 T210713 (duration: 00m 57s)
  • 05:46 bblack: repooling cp5010 - T214274
  • 05:42 bblack: removing cp5010 downtimes from icinga - T214274
  • 05:34 bblack: rebooting cp5010 for device name on swapped disk (depooled) - T214274
  • 04:30 kart_: Finished: Fifth manual run of unpublished draft purge script for ContentTranslation (T216470)
  • 04:16 XioNoX: Unplug Tata/NTT/PCCW from cr1-eqsin - T213121
  • 03:21 XioNoX: replace cp5010 disk 1 - T214274
  • 03:15 kart_: Fifth manual run of unpublished draft purge script for ContentTranslation (T216470)
  • 02:44 XioNoX: depool eqsin - T213121
  • 02:31 twentyafterfour: phabricator upgrade finished, service appears to have returned to normal
  • 01:43 twentyafterfour: running phabricator database schema changes
  • 01:38 twentyafterfour: now taking phabricator offline for upgrade
  • 01:15 twentyafterfour: Taking phabricator offline momentarily for upgrade
  • 01:01 twentyafterfour: set downtime in icinga for phab100*
  • 00:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable partial blocks on metawiki and mediawikiwiki (T216065) (duration: 00m 54s)
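
The 15:07 entry above moves shards off logstash100[456] by setting "cluster.routing.allocation.exclude._name". As a rough illustration only, the request body for such a change might be built as below. This is a sketch under assumptions: the log does not record how the setting was sent or whether it was transient or persistent, so the "transient" default and the helper name build_exclude_body are hypothetical.

```python
import json


def build_exclude_body(node_names, scope="transient"):
    """Build a cluster-settings body that excludes the named nodes.

    `scope` is an assumption: the log entry does not say whether the
    setting was applied as "transient" or "persistent".
    """
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    # Elasticsearch expects a comma-separated list of node names.
    return {
        scope: {
            "cluster.routing.allocation.exclude._name": ",".join(node_names)
        }
    }


# Node names copied from the 15:07 log entry.
nodes = [
    "logstash1004-production-logstash-eqiad",
    "logstash1005-production-logstash-eqiad",
    "logstash1006-production-logstash-eqiad",
]
body = build_exclude_body(nodes)
print(json.dumps(body, indent=2))
```

A body like this would typically be sent with a PUT request to the cluster's _cluster/settings endpoint; the host and tooling used on the day are not recorded in the log.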

2019-02-20

  • 23:59 ppchelko@deploy1001: Finished deploy [changeprop/deploy@5e4486a]: Purge varnish on revision restrictions (duration: 01m 23s)
  • 23:57 ppchelko@deploy1001: Started deploy [changeprop/deploy@5e4486a]: Purge varnish on revision restrictions
  • 21:48 eileen: civicrm revision changed from 165fbf5894 to 1b5d974569, config revision is ccefa3716b
  • 21:46 arlolra: Updated Parsoid to 9b204a0 (T153080, T169975, T215824)
  • 21:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@c4574d1]: Updating Parsoid to 9b204a0 (duration: 09m 33s)
  • 21:19 arlolra@deploy1001: Started deploy [parsoid/deploy@c4574d1]: Updating Parsoid to 9b204a0
  • 21:08 _joe_: rolling restart of php-fpm to catch up with the tideways change
  • 20:35 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.18 (duration: 00m 53s)
  • 20:14 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.18/extensions/EventBus/includes/EventBusRCFeedEngine.php: Check for eventServiceName in config before accessing T216561 (duration: 00m 55s)
  • 18:30 fdans@deploy1001: Finished deploy [analytics/refinery@ccf837e]: deploying refinery for new wikis and changes in scripts (duration: 11m 13s)
  • 18:24 mobrovac@deploy1001: Finished deploy [restbase/deploy@80f518c]: Remove VE request logging - T215956 (duration: 20m 19s)
  • 18:19 fdans@deploy1001: Started deploy [analytics/refinery@ccf837e]: deploying refinery for new wikis and changes in scripts
  • 18:04 mobrovac@deploy1001: Started deploy [restbase/deploy@80f518c]: Remove VE request logging - T215956
  • 17:22 sbisson@deploy1001: Synchronized php-1.33.0-wmf.18/extensions/Flow/modules/mw.flow.Initializer.js: SWAT: Unbreak reply clicks with existing widget (duration: 00m 58s)
  • 17:08 hashar: contint1001: fix broken root ownership on zuul git deploy repo: sudo find /etc/zuul/wikimedia/.git -not -user zuul -exec chown zuul:zuul {} +
  • 16:49 herron: migrating es shards away from logstash100[56] with "cluster.routing.allocation.exclude._name" : "logstash1005-production-logstash-eqiad,logstash1006-production-logstash-eqiad" T214608
  • 16:40 twentyafterfour: started phd again, seems to be working now without killing the db
  • 16:38 bblack: multatuli: upgrade gdnsd to 3.0.0-1~wmf1
  • 16:36 godog: depool and reimage logstash1008 with stretch - T213898
  • 16:26 twentyafterfour: stopped phd on phab1001 and scheduled downtime in icinga
  • 16:24 bblack: authdns1001: upgrade gdnsd to 3.0.0-1~wmf1
  • 16:19 twentyafterfour: stopped phd on phab1002
  • 16:03 ottomata: removing spark 1 from Analytics cluster - T212134
  • 15:55 bblack: authdns2001: upgrade gdnsd to 3.0.0-1~wmf1
  • 15:37 fsero: restarting docker-registry service on systemd
  • 15:35 moritzm: temporarily stop prometheus instances on prometheus1004 for systemd upgrade/journald restart
  • 14:43 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 14:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 14:35 volans: upgraded spicerack to 0.0.18 on cumin[12]001
  • 14:34 volans: uploaded spicerack_0.0.18-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 14:00 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 14:00 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 13:59 gehel: rolling upgrade of elasticsearch / cirrus / codfw to 5.6.14 - T215931
  • 13:51 godog: prometheus on prometheus2004 crashed/exited after journald upgrade -- starting up again now
  • 13:00 jbond42: rolling restarts for hhvm in eqiad
  • 12:28 volans: upgraded spicerack to 0.0.17 on cumin[12]001
  • 12:25 volans: uploaded spicerack_0.0.17-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 12:08 moritzm: restarted ircecho on kraz.wikimedia.org
  • 11:46 jbond42: rolling restarts for hhvm in codfw
  • 11:28 akosiaris: rebuild and re-upload rsyslog_8.38.0-1~bpo9+1wmf1_amd64.changes to apt.wikimedia.org/stretch-wikimedia to have mmkubernetes package
  • 10:36 marostegui: Deploy schema change on db1095:3313 - T210713
  • 10:04 marostegui: Deploy schema change on dbstore1004:3313 - T210713
  • 09:57 moritzm: installing systemd security updates on jessie hosts
  • 09:33 marostegui: Deploy schema change on db2043 (s3 codfw master), lag will be generated on s3 codfw - T210713
  • 09:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1109 (duration: 00m 52s)
  • 08:48 moritzm: powercycling rdb1001 for a test
  • 07:45 moritzm: installing gnupg2 updates on stretch
  • 07:14 marostegui: Deploy schema change on s1 primary master (db1067) - T210713
  • 07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 T210713 (duration: 00m 52s)
  • 07:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1109 after kernel upgrade (duration: 00m 52s)
  • 06:54 oblivian@deploy1001: Synchronized wmf-config/profiler.php: Fix the tideways setup (duration: 00m 52s)
  • 06:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1109 after kernel upgrade (duration: 00m 52s)
  • 06:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 T210713 (duration: 00m 51s)
  • 06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1119 T210713 (duration: 00m 51s)
  • 06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1109 after kernel upgrade (duration: 00m 52s)
  • 06:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1109 after kernel upgrade (duration: 00m 52s)
  • 06:18 marostegui: Stop MySQL on db1109 for kernel and mysql upgrade
  • 06:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 for kernel and mysql upgrade (duration: 00m 52s)
  • 06:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 T210713 (duration: 01m 05s)
  • 04:45 XioNoX: add avoid-paths WIRESTAR-OPTICALTEL to cr2-eqdfw
  • 02:15 mobrovac@deploy1001: Finished deploy [restbase/deploy@751dc5c]: Temporarily collect VE request logs for T215956 (duration: 22m 37s)
  • 01:52 mobrovac@deploy1001: Started deploy [restbase/deploy@751dc5c]: Temporarily collect VE request logs for T215956
  • 00:24 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.17/skins/MinervaNeue/resources/skins.minerva.content.styles/lists.less: Revert switch to outside list style from ordered lists (duration: 00m 52s)
  • 00:23 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.18/skins/MinervaNeue/resources/skins.minerva.content.styles/lists.less: Revert switch to outside list style from ordered lists (duration: 00m 59s)
  • 00:05 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: SWAT T215969 Return cirrussearch master timeout back to the default value (duration: 00m 57s)
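
The 16:49 entry above drains shards from two nodes by setting "cluster.routing.allocation.exclude._name". A minimal sketch of the settings body such a change could use follows. It assumes the body is sent to the Elasticsearch cluster settings API (PUT /_cluster/settings) and that a transient scope is wanted; the log states neither. The helper name exclude_nodes_body is hypothetical.

```python
import json

# Setting name quoted in the log entry.
EXCLUDE_KEY = "cluster.routing.allocation.exclude._name"


def exclude_nodes_body(node_names, scope="transient"):
    """Build a cluster-settings body that asks Elasticsearch to move
    shards away from the named nodes (hypothetical helper)."""
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    # The setting takes a comma-separated list of node names.
    return {scope: {EXCLUDE_KEY: ",".join(node_names)}}


# Node names copied from the log entry; the scope is an assumption.
body = exclude_nodes_body([
    "logstash1005-production-logstash-eqiad",
    "logstash1006-production-logstash-eqiad",
])
print(json.dumps(body, indent=2))
```

Passing an empty list yields an empty string for the setting, which is one way to build a body that clears the exclusion once the hosts are back in service.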

2019-02-19

  • 23:51 ebernhardson: restarted ferm on relforge1001
  • 23:50 ebernhardson: temporarily stop ferm on relforge1001 to test where a connection is being blocked
  • 20:49 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.18
  • 20:34 thcipriani@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.18 and rebuild l10n cache (duration: 30m 31s)
  • 20:07 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 20:04 thcipriani@deploy1001: Started scap: testwiki to php-1.33.0-wmf.18 and rebuild l10n cache
  • 20:01 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 19:57 thcipriani: restarting ci-jenkins for plugin update
  • 19:49 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.13 (duration: 11m 52s)
  • 19:39 gtirloni: re-pooled labsdb1011 T216481
  • 19:09 andrewbogott: rebooting cloudvirt1009 to poke around in the bios
  • 18:20 thcipriani: starting branch-cut for 1.33.0-wmf.18
  • 17:55 herron: temporarily increased eqiad logstash elasticsearch low disk watermark to 87% (will restore to 85% when eqiad expansion hosts are fully online)
  • 17:52 jijiki: Restarting memcache on mc1027 - T208844
  • 17:00 hashar: Offlined compiler1002.puppet-diffs.eqiad.wmflabs from Jenkins. Its disk is corrupt | T216513
  • 16:39 gtirloni: depooled labsdb1011 T216481
  • 16:33 moritzm: installing libssh update from stretch point release
  • 16:28 jforrester@deploy1001: Synchronized php-1.33.0-wmf.17/includes/specials/pagers/ActiveUsersPager.php: T216200 Hot deploy variable name fix for ActiveUsersPager query (duration: 00m 48s)
  • 16:26 herron: enabling elasticsearch on new eqiad hosts logstash101[0-2]
  • 16:18 gtirloni: re-pooled labsdb1010 T216481
  • 16:07 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 16:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 15:47 jijiki: Reimaging thumbor2002 to stretch - T214597
  • 15:38 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 15:32 hashar: apt-get upgrade on compiler1001 and compiler1002.puppet-diffs.eqiad.wmflabs
  • 15:27 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 15:25 hashar: Started instance compiler1002.puppet-diffs.eqiad.wmflabs via Horizon. It was in shutoff state | T216513
  • 15:10 _joe_: uploading tideways-xhprof_5.0.0~beta3 to reprepro T176916
  • 15:09 gtirloni: depooled labsdb1010 T216481
  • 14:53 jynus: stopping db2089 for hw maintenance T216240
  • 14:41 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 14:40 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 14:36 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 14:35 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 14:31 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 14:30 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 14:30 gehel: rolling upgrade of elasticsearch on relforge - T215931
  • 14:16 jynus: stop db2090 for reboot testing T216240
  • 14:04 gtirloni: running `maintain-views --all-databases --replace-all --clean --debug` on labsdb1010 (T216481)
  • 13:44 Amir1: mwscript maintenance/createAndPromote.php --wiki=testwikidatawiki --force --interface-admin Ladsgroup
  • 13:43 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=testwikidatawiki --force --sysop Ladsgroup (T215919)
  • 13:31 moritzm: installing rssh update for jessie
  • 13:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1118 T210713 (duration: 00m 46s)
  • 13:23 gtirloni: running `maintain-views --all-databases --replace-all --clean --debug` on labsdb1009 (T216481)
  • 12:57 zeljkof: EU SWAT finished
  • 12:56 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for Kickstarter Edit-a-thon (T215839) (duration: 00m 43s)
  • 12:50 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgArticleCountMethod = any on fiwikinews (T216333) (duration: 00m 45s)
  • 12:37 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add namespace Додатак on srwiktionary (T216343) (duration: 00m 46s)
  • 12:29 _joe_: creating gerrit repo operations/debs/tideways-xhprof T176916
  • 12:28 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for WikiProject Women in red, enwiki (T215295) (duration: 00m 47s)
  • 12:19 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Modifying configuration about Chinese Wikiversity (T212919) (duration: 00m 48s)
  • 11:59 marostegui: Deploy schema change on db1118 - T210713
  • 11:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1118 T210713 (duration: 00m 46s)
  • 11:53 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 (duration: 00m 46s)
  • 11:49 moritzm: installing ruby-rack security updates
  • 11:26 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T187299 Launch performance perception survey on eswiki (duration: 00m 46s)
  • 11:22 jynus: stop and restart db1064
  • 11:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 T210713 (duration: 00m 46s)
  • 11:10 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 (duration: 00m 46s)
  • 11:05 marostegui: Deploy schema change on dbstore1002
  • 10:25 marostegui: Deploy schema change on db1083 - T210713
  • 10:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 T210713 (duration: 00m 46s)
  • 10:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1093 after kernel upgrade (duration: 00m 46s)
  • 09:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1093 after kernel upgrade (duration: 00m 46s)
  • 09:42 mforns@deploy1001: Finished deploy [analytics/refinery@0d7ec19]: deploying refinery to update EL sanitization whitelist (duration: 07m 49s)
  • 09:34 mforns@deploy1001: Started deploy [analytics/refinery@0d7ec19]: deploying refinery to update EL sanitization whitelist
  • 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 T210713 (duration: 00m 45s)
  • 09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1093 on API after kernel upgrade (duration: 00m 46s)
  • 09:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1093 after kernel upgrade (duration: 00m 46s)
  • 08:56 _joe_: experimenting with php-fpm configuration on mwdebug1001 for T176916
  • 08:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 for kernel upgrade (duration: 00m 45s)
  • 08:55 hashar: Cleaning contint1001 / partition
  • 08:50 marostegui: Deploy schema change on db1089 - T210713
  • 08:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 T210713 (duration: 00m 46s)
  • 08:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 T210713 (duration: 00m 49s)
  • 07:51 marostegui: Drop ep_* tables on s1 - T174802
  • 07:50 moritzm: installing systemd security updates on stretch
  • 07:46 marostegui: Reboot db1106 for kernel upgrade (and remove debug from kernel) T216240 T216273
  • 07:21 marostegui: Drop ep_* tables on s3 - T174802
  • 06:56 marostegui: Deploy schema change on db1106 - this will generate lag on labsdb:s1 T210713
  • 06:56 marostegui: Deploy schema change on db1106 - T210713
  • 06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 T210713 (duration: 00m 52s)
  • 05:31 XioNoX: delete local pref for peering sessions in ulsfo - T204281
  • 05:17 XioNoX: deleted previously deactivated BGP_community_actions terms - T204281
  • 00:01 XioNoX: disable BGP to Zayo on cr2-codfw for intrusive testing - T215193
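
The 17:55 entry above raises the logstash Elasticsearch low disk watermark to 87%, to be restored to 85% later. As a rough sketch, the two settings bodies could be built as below. This assumes the standard setting name cluster.routing.allocation.disk.watermark.low and a transient scope, neither of which is stated in the log; low_watermark_body is a hypothetical helper.

```python
import json

# Standard Elasticsearch setting for the low disk watermark
# (assumed; the log entry does not spell out the setting name).
WATERMARK_KEY = "cluster.routing.allocation.disk.watermark.low"


def low_watermark_body(percent, scope="transient"):
    """Build a cluster-settings body for the low disk watermark
    (hypothetical helper)."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 0 and 100")
    return {scope: {WATERMARK_KEY: f"{percent}%"}}


raise_body = low_watermark_body(87)    # temporary value from the log
restore_body = low_watermark_body(85)  # value the log says will be restored
print(json.dumps(raise_body))
print(json.dumps(restore_body))
```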

2019-02-18

  • 20:19 gtirloni: icinga2001 ran puppet ahead of schedule (enable tools-checker-toolsdb monitor)
  • 18:26 jynus: setting clouddb1001 in read_write mode
  • 18:14 volans: upgraded to spicerack 0.0.16-1 cumin[12]001
  • 18:12 volans: uploaded spicerack_0.0.16-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 18:08 jynus: killing mysql on labsdb1005
  • 18:08 jynus: disabled puppet and edited my.cnf on labsdb1005
  • 17:56 jynus: restarting labsdb1004
  • 17:53 jynus: set clouddb1001 in read_only=1
  • 17:50 jijiki: Reimaging thumbor1004 to stretch - T214597
  • 15:41 jynus: performing es2 & es3 backups into es2002
  • 15:21 jynus: move logical backups to subdirectory T210292
  • 14:29 moritzm: rebooting mw2167 for kernel tests
  • 13:59 marostegui: Drop ep_* tables from s7 - T174802
  • 13:25 jijiki: Depooling thumbor1004 to check if the rest of our hosts can handle the load without it - T214597
  • 12:34 moritzm: installing brltty bugfix update from stretch point release
  • 12:31 moritzm: upgrading stat1005 to buster
  • 12:28 XioNoX: update clouddb_return term from cloud-in4 on cr1/2-eqiad - T216353
  • 11:53 moritzm: installing hdparm bugfix update from stretch point release
  • 11:36 moritzm: installing uriparser security updates
  • 11:11 moritzm: installing c3p0 security updates
  • 10:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 T210713 (duration: 00m 46s)
  • 10:54 jijiki: Reimaging thumbor2002 to stretch - T214597
  • 10:40 marostegui: Drop tables ep_* from s2 (cswiki nlwiki ptwiki svwiki) T174802
  • 09:50 marostegui: Deploy schema change on db1105:3311 T210713
  • 09:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 T210713 (duration: 00m 46s)
  • 09:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3311 T210713 (duration: 00m 46s)
  • 09:28 marostegui: Drop ep_* from s6 (ruwiki) - T174802
  • 09:16 marostegui: Deploy schema change on db1099:3311 - T210713
  • 09:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3311 T210713 (duration: 00m 48s)
  • 09:08 marostegui: Deploy schema change on dbstore1003:3311 and dbstore1001:3311 - T210713
  • 08:27 marostegui: Drop ep_* tables from s5 (srwiki) - T174802
  • 08:23 marostegui: Deploy schema change on s1 codfw master (db2048), lag will be generated on s1 codfw - T210713
  • 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1119 after mysql upgrade (duration: 00m 46s)
  • 06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1119 into API service after mysql upgrade (duration: 00m 46s)
  • 06:49 marostegui: Reboot db2085 to disable debug mode on kernel T216273
  • 06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1119 after mysql upgrade (duration: 00m 46s)
  • 06:29 marostegui: Stop MySQL on db1119 for mysql and kernel upgrade
  • 06:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 for mysql upgrade (duration: 01m 01s)
  • 05:55 marostegui: Deploy schema change on s8 primary master (db1071) - T210713
  • 05:52 marostegui: Set dbstore1002 on read only to start the migration T210478 T215589

2019-02-17

  • 21:20 bstorm_: The slave of labsdb1005.eqiad.wmnet is now clouddb1001.clouddb-services.eqiad.wmflabs
  • 13:14 XioNoX: add term labsdb_return to cloud-in4 - T216353

2019-02-16

  • 16:26 ariel@deploy1001: Finished deploy [dumps/dumps@8f83eea]: fix up multistream index file recombines for large files; better errors for misc dumps failures (duration: 00m 03s)
  • 16:25 ariel@deploy1001: Started deploy [dumps/dumps@8f83eea]: fix up multistream index file recombines for large files; better errors for misc dumps failures
  • 14:21 arturo: T194855 cloudvirt1020 is powered off, waiting for disk setup before installing
  • 00:20 XioNoX: add port 22 in cloud-in4 term labsdb

2019-02-15

  • 20:40 andrewbogott: enabled virtualization (all three settings) on cloudvirt1019
  • 19:41 arturo: T193264 reimaging cloudvirt1019 to get mitaka/stretch
  • 18:51 arturo: T193264 icinga downtime cloudvirt1019 for 1 week
  • 18:44 bstorm_: stopped replication and then mariadb on labsdb1004
  • 16:52 cdanis: correction, needed to increment version; adding backported rasdaemon 0.6.0-1.2+deb8u2 to jessie-wikimedia
  • 16:48 cdanis: adding backported rasdaemon 0.6.0-1.2+deb8u1 to jessie-wikimedia
  • 16:29 bblack: reprepro: uploaded gdnsd-3.0.0-1~wmf1 to stretch-wikimedia
  • 15:45 moritzm: rebooting auth1001 for kernel security update
  • 14:50 moritzm: installing unbound update from stretch point release
  • 14:45 moritzm: removed labvirt1012 from debmonitor (got renamed to cloudvirt1012) (T216190)
  • 14:06 moritzm: rebooting mwlog1001 for kernel security update
  • 13:54 moritzm: rebooting mwlog2001 for kernel security update
  • 13:46 jbond42: install tar security updates
  • 13:19 moritzm: rolling reboot of mwdebug servers in eqiad to pick up SSBD-enabled qemu
  • 13:12 gtirloni: reboot cloudvirt1020
  • 13:11 arturo: T216239 labvirt1019 has been drained of any workload
  • 13:06 moritzm: installing NSS security updates
  • 12:42 moritzm: installing squid3 security updates
  • 12:30 jynus: stop db2089 mysql instances for reboot testing T216240
  • 12:30 arturo: T216239 schedule 1 week of icinga downtime for labvirt1019
  • 10:48 akosiaris: upgrade docker on contint2001 to 18.06.2 T216236
  • 10:42 akosiaris: upgrade docker on contint1001 to 18.06.2 T216236
  • 10:35 gtirloni: reboot cloudvirt1019
  • 09:44 gehel: repool maps100[12]
  • 09:33 moritzm: imported php-defaults debs to thirdparty/php72
  • 08:42 akosiaris: restart gerrit to pick up https://gerrit.wikimedia.org/r/490640 T177868
  • 08:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 (duration: 00m 46s)
  • 08:28 moritzm: rolling restart of apertium to pick up Python 3.4 security update
  • 07:55 godog: bounce prometheus@ops on prometheus2004 to take a snapshot
  • 06:41 marostegui: Stop puppet on labsdb1005 to leave "max_user_connections" on my.cnf - T216170 T216208
  • 06:39 marostegui: Restart labsdb1005 with max_user_connections = 20 T216208
  • 06:17 marostegui: Deploy schema change on db1109 - T210713
  • 06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 49s)
  • 06:13 marostegui: Reload haproxy on dbproxy11 to repool labsdb1009
  • 00:39 mutante: puppetmaster1001: sudo puppet node clean bast3003.wikimedia.org ; sudo puppet node deactivate bast3003.wikimedia.org (T216199)
  • 00:15 jynus: setting labsdb1005 back into read-write

2019-02-14

  • 23:47 jynus: restarting labsdb1005 mysql in read only mode
  • 23:37 niharika29@deploy1001: Finished deploy [scholarships/scholarships@25ea138]: Update app with updated dependencies to mitigate PHPMailer error T215302 (duration: 00m 02s)
  • 23:37 niharika29@deploy1001: Started deploy [scholarships/scholarships@25ea138]: Update app with updated dependencies to mitigate PHPMailer error T215302
  • 22:07 andrewbogott: rebuilding labvirt1012 as cloudvirt1012, T216190
  • 20:38 bstorm_: Restarted mariadb on labsdb1005 for https://wikitech.wikimedia.org/wiki/Incident_documentation/20190214-labsdb1005
  • 20:09 ejegg: updated fundraising CiviCRM from 02ea871b88 to 165fbf5894
  • 19:42 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.17/extensions/GrowthExperiments/modules/help: SWAT: Help Panel: Fix IME broken in help panel search T216131 (duration: 00m 54s)
  • 19:14 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Stop NavPopups gadget conflict with PagePreviews on Wikivoyage T214878 (duration: 00m 54s)
  • 19:01 mutante: scandium - deleting parsoid clone dir and running puppet one more time, to fix permissions to allow wikidev
  • 18:52 mutante: scandium - deleting parsoid clone dir and running puppet one more time, to fix permissions to allow wikidev
  • 18:12 mutante: scandium - deleting parsoid clone dir and running puppet
  • 18:03 fsero: upgrading tiller to 2.12.2 on eqiad
  • 17:34 godog: bounce rsyslog on wezen/lithium, tls listener timeout in icinga
  • 16:59 moritzm: restarting apertium-apy on scb1001 to pick up Python security update
  • 16:39 marostegui: Depool labsdb1009 - T210713
  • 16:26 fsero: upgrading tiller on codfw
  • 16:11 fsero: updating tiller version on staging cluster
  • 16:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2085 - T214840 (duration: 00m 52s)
  • 15:50 fsero: building and publishing new tiller docker image on boron
  • 15:50 END: (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) (volans@cumin1001)
  • 15:43 START: - Cookbook sre.hosts.upgrade-and-reboot (volans@cumin1001)
  • 15:28 volans: upgraded spicerack to v0.0.15 on cumin[12]001
  • 15:26 volans: uploaded spicerack_0.0.15-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 15:12 marostegui: Clear idrac logs from db2085 - T214840
  • 14:45 godog: depool and stop logstash1009 for stretch reimage - T213898
  • 14:20 marostegui: Stop MySQL on db2085 for on-site maintenance - T214840
  • 14:12 jijiki: Enabling puppet on thumbor* servers - T214597
  • 13:39 arturo: T215892 icinga downtime cloudvirt1024 for 2 weeks
  • 12:22 zeljkof: EU SWAT finished
  • 12:21 zfilipin@deploy1001: Synchronized php-1.33.0-wmf.17/extensions/ExternalGuidance/: SWAT: Fix the eventlogging schema definition as per manifest_version=2 (duration: 00m 55s)
  • 11:43 _joe_: restarting hhvm on mw1338, hot tc exhausted T216084
  • 11:04 _joe_: upgrading python3-etcd on stretch T209136
  • 11:03 jbond42: rolling security updates for curl
  • 11:02 jijiki: Disabling puppet on thumbor* servers - T214597
  • 10:59 moritzm: installing python3.4 security updates
  • 10:53 godog: bounce prometheus instances on prometheus2004 to take a snapshot
  • 08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 T214840 (duration: 00m 52s)
  • 07:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 T210713 (duration: 00m 54s)
  • 07:36 marostegui: Stop MySQL on db1106 for reboot - T214840
  • 06:10 marostegui: Deploy schema change on db1087 with replication, lag will be generated on labsdb:s8 T210713
  • 06:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1087 T210713 (duration: 00m 55s)
  • 01:52 mutante: scandium - removing parsoid deploy dir and letting puppet re-clone it after merging gerrit fix 484602 - replace manual clone with proper puppetization (T201366)
  • 01:52 mutante: scandium - removing parsoid deploy dir and letting puppet re-clone it after merging gerrit fix 484602 - replace manual hack with proper puppet
  • 01:15 mutante: phab1001 - phabricator mail config converted to cluster.mailers to adjust to upstream change (T212989)
  • 00:36 bd808@deploy1001: Finished deploy [scholarships/scholarships@1d89fe2]: Live hack PHPMailer namespace T215302 (duration: 00m 02s)
  • 00:36 bd808@deploy1001: Started deploy [scholarships/scholarships@1d89fe2]: Live hack PHPMailer namespace T215302
  • 00:32 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES (damaging only) on itwiki (T211032) (duration: 00m 53s)
  • 00:24 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable help panel search on cswiki and kowiki (T209301) (duration: 00m 55s)

2019-02-13

  • 23:42 niharika29@deploy1001: Finished deploy [scholarships/scholarships@1d89fe2]: Update scholarships app for 2019 cycle T215302 (duration: 00m 02s)
  • 23:42 niharika29@deploy1001: Started deploy [scholarships/scholarships@1d89fe2]: Update scholarships app for 2019 cycle T215302
  • 21:31 jijiki: Restarting nutcracker on scb100*.eqiad.wmnet
  • 20:54 mutante: ruthenium - shell access for parsoid-testers revoked by puppet, please use scandium.eqiad.wmnet (T201366)
  • 20:44 otto@deploy1001: Started restart [eventstreams/deploy@07033d4]: bouncing eventstreams to apply page-links-change stream config
  • 20:43 mutante: ms-be2021 - powercycling
  • 20:09 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.17 (duration: 00m 53s)
  • 20:08 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.17
  • 19:55 mforns@deploy1001: Finished deploy [analytics/refinery@5f1461e]: Deploying analytics refinery with refinery-source v0.0.85 jars (duration: 07m 36s)
  • 19:48 mforns@deploy1001: Started deploy [analytics/refinery@5f1461e]: Deploying analytics refinery with refinery-source v0.0.85 jars
  • 18:13 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es1014 (duration: 00m 52s)
  • 18:06 godog: reimage prometheus2003 - T187987
  • 18:01 krinkle@deploy1001: Synchronized php-1.33.0-wmf.17/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Id70fdfa62ef / T215611 (duration: 00m 55s)
  • 17:49 marostegui: Stop MySQL on db1114 for onsite maintenance - T214720
  • 17:25 jijiki: Pooling mw1299 back - T215569
  • 17:06 cmjohnson1: db1106, troubleshooting idrac issue and updating f/w
  • 16:58 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:30 elukey: reimage stat1005 to Debian Buster (again)
  • 16:22 otto@deploy1001: scap-helm list finished
  • 16:22 otto@deploy1001: scap-helm list cluster staging completed
  • 16:22 otto@deploy1001: scap-helm list [namespace: list, clusters: staging]
  • 16:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:13 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:46 marostegui: Stop MySQL on db1106 for onsite maintenance - this will generate lag on s1 labs - T214840
  • 15:28 jynus: stop and upgrade es1014
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
  • 15:17 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 15:17 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:17 akosiaris@deploy1001: scap-helm eventgate-analytics install -f /srv/scap-helm/eventgate/eventgate-analytics-staging-values.yaml --set service.port=31193 ../ [namespace: eventgate-analytics, clusters: staging]
  • 15:16 moritzm: updated thirdparty/php72 component to PHP 7.2.15
  • 15:10 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 15:10 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:10 akosiaris@deploy1001: scap-helm eventgate-analytics install -f /srv/scap-helm/eventgate/eventgate-analytics-staging-values.yaml --set service.port=31193 ../ [namespace: eventgate-analytics, clusters: staging]
  • 15:09 akosiaris@deploy1001: scap-helm eventgate-analytics install -f /srv/scap-helm/eventgate/eventgate-analytics-staging-values.yaml ../ [namespace: eventgate-analytics, clusters: staging]
  • 15:08 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 15:08 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:08 akosiaris@deploy1001: scap-helm eventgate-analytics install --dry-run --debug -f /srv/scap-helm/eventgate/eventgate-analytics-staging-values.yaml ../ [namespace: eventgate-analytics, clusters: staging]
  • 15:05 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 15:05 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:05 akosiaris@deploy1001: scap-helm eventgate-analytics install --dry-run --debug -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:05 akosiaris@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --dry-run --debug [namespace: eventgate-analytics, clusters: staging]
  • 14:53 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:53 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:53 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:25 elukey: reimage stat1005 back to stretch to test GPU drivers
  • 14:06 godog: cancel https://integration.wikimedia.org/ci/job/operations-mw-config-composer-test-docker/12236 to unblock test-prio zuul queue
  • 14:05 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120, depool es1014 (duration: 00m 52s)
  • 12:34 arturo: T216030 icinga downtime cloudvirt1018 for 2 weeks
  • 12:32 arturo: T216030 T216004 rebooting cloudvirt1018
  • 11:55 moritzm: installing avahi security updates
  • 11:49 jynus: stop and upgrade db1120
  • 11:43 moritzm: installing golang updates on jessie
  • 11:41 volans: upgraded spicerack on cumin[12]001 to v0.0.14
  • 11:38 volans: uploaded spicerack_0.0.14-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 11:33 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 (duration: 00m 53s)
  • 11:11 moritzm: installing postgis security updates
  • 09:46 moritzm: installing golang security updates
  • 09:33 gtirloni: labsdb1005 rebooted server
  • 09:26 gtirloni: labsdb1005 stopped mysql
  • 09:22 marostegui: Stop MySQL on db1106 - T214840
  • 09:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 T214840 (duration: 00m 53s)
  • 08:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 (duration: 00m 53s)
  • 06:46 vgutierrez: uploaded acme-chief 0.10 to apt.wikimedia.org (buster) - T215925
  • 06:18 marostegui: Deploy schema change on db1104 - T210713
  • 06:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 01m 07s)
  • 06:12 marostegui: Stop MySQL on db2085 to keep debugging kernel issues - T214840
  • 01:31 thcipriani@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Add ExternalGuidance extension T213076 (part 3) (duration: 00m 53s)
  • 01:30 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add ExternalGuidance extension T213076 (part 2) (duration: 00m 53s)
  • 01:15 thcipriani@deploy1001: Finished scap: SWAT: Add ExternalGuidance extension T213076 (part I: build l10n and sync code) (duration: 27m 51s)
  • 00:47 thcipriani@deploy1001: Started scap: SWAT: Add ExternalGuidance extension T213076 (part I: build l10n and sync code)
  • 00:41 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.17/extensions/Thanks/modules/ext.thanks.mobilediff.css: SWAT: Follow ups to I807f729c1b1a9e9b5952685bb18f540f81d70f47 (duration: 00m 55s)
  • 00:27 XioNoX: merge VRRP Icinga Check

2019-02-12

  • 23:14 jforrester@deploy1001: Finished scap: Another full scap, hoping to find the new i18n in RL for T214482 T215471 T215472 (duration: 06m 01s)
  • 23:09 foks: removed 4 files for legal compliance
  • 23:08 jforrester@deploy1001: Started scap: Another full scap, hoping to find the new i18n in RL for T214482 T215471 T215472
  • 22:47 jforrester@deploy1001: Finished scap: Full scap for new i18n and code for T214482 T215471 T215472 (duration: 18m 03s)
  • 22:29 jforrester@deploy1001: Started scap: Full scap for new i18n and code for T214482 T215471 T215472
  • 21:38 robh: icinga1001 in hardware testing, don't mess with it T214760
  • 21:10 robh: working on troubleshooting icinga1001 via T214760
  • 20:58 jforrester@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/Wikibase/view/resources/resources.php: Hot-deploy I74f6389ae for other code, file 2 (duration: 00m 52s)
  • 20:57 jforrester@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/Wikibase/view/lib/resources.php: Hot-deploy I74f6389ae for other code, file 1 (duration: 00m 51s)
  • 20:52 jforrester@deploy1001: Synchronized php-1.33.0-wmf.16/resources/Resources.php: Hot-deploy If0d7b687e for other code (duration: 00m 54s)
  • 20:06 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.33.0-wmf.17
  • 19:59 thcipriani@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.17 and rebuild l10n (duration: 18m 54s)
  • 19:40 thcipriani@deploy1001: Started scap: testwiki to php-1.33.0-wmf.17 and rebuild l10n
  • 19:37 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.12 (duration: 03m 10s)
  • 19:32 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.9 (duration: 10m 05s)
  • 18:48 thcipriani: make-wmf-branch 1.33.0-wmf.17
  • 17:54 chaomodus: notebook1003 - restarted nagios-nrpe-server T212824
  • 17:04 marostegui: Start MySQL again on db2085 for s1 and s8 - T214840
  • 16:18 akosiaris: refresh kubernetes default egress policy T211247
  • 15:58 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 15:58 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 15:58 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 15:58 akosiaris@deploy1001: scap-helm eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
  • 15:46 akosiaris: create namespaces for eventgate-analytics on eqiad/codfw/staging cluster T211247 T213194
  • 15:45 moritzm: rebooting db2085 for some tests
  • 15:38 marostegui: Stop MySQL on db2085 - T214840
  • 15:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2085 - T214840 (duration: 00m 47s)
  • 15:30 otto@deploy1001: scap-helm --help finished
  • 15:30 otto@deploy1001: scap-helm --help cluster codfw completed
  • 15:30 otto@deploy1001: scap-helm --help cluster eqiad completed
  • 15:30 otto@deploy1001: scap-helm --help [namespace: --help, clusters: eqiad,codfw]
  • 15:03 ejegg: updated fundraising CiviCRM from a541a83cb2 to 02ea871b88
  • 14:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1092 (duration: 00m 46s)
  • 14:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More api traffic to db1092 (duration: 00m 44s)
  • 14:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1092 (duration: 00m 46s)
  • 13:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some api traffic to db1092 (duration: 00m 46s)
  • 13:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1092 (duration: 00m 47s)
  • 13:39 vgutierrez: uploaded acme-chief 0.9 to apt.wikimedia.org (stretch) - T207389 T213737
  • 12:57 moritzm: installing openssl1.0 security updates
  • 12:30 zeljkof: EU SWAT finished
  • 12:30 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add https://polona.pl/ to $wgCopyUploadsDomains (T215501) (duration: 00m 46s)
  • 12:19 moritzm: install ghostscript security updates on scb*
  • 12:15 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create extendedconfirmed user group for viwiki (T215493) (duration: 00m 47s)
  • 12:10 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Rollbackers User Group Right on azwiki (T215200) (duration: 00m 47s)
  • 12:03 marostegui: Stop MySQL on db1092 to upgrade mysql and kernel
  • 11:27 moritzm: rebooting stat1005
  • 11:20 moritzm: installing ghostscript security updates on remaining thumbor hosts
  • 10:25 marostegui: Deploy schema change on db1092 T210713
  • 10:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1092 (duration: 00m 46s)
  • 10:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1101:3318 (duration: 00m 46s)
  • 10:00 moritzm: installing ghostscript security updates on thumbor1001
  • 09:36 moritzm: reimaging stat1005 to buster
  • 08:20 marostegui: Deploy schema change on db1101:3318 - T210713
  • 08:20 marostegui: Depool db1101:3318 - T210713
  • 08:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101:3318 (duration: 00m 46s)
  • 08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3318 (duration: 00m 49s)
  • 07:49 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@125354e]: maintain symlink for old venv path with new virtualenv deploy script (duration: 03m 55s)
  • 07:46 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@125354e]: maintain symlink for old venv path with new virtualenv deploy script
  • 07:40 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@125354e]: testing simplified virtualenv deploy (take 2) (duration: 04m 14s)
  • 07:35 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@125354e]: testing simplified virtualenv deploy (take 2)
  • 07:31 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@125354e]: testing simplified virtualenv deploy (duration: 01m 07s)
  • 07:30 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@125354e]: testing simplified virtualenv deploy
  • 07:26 elukey: update analytics-in4 term mysql-dbstore on cr1/cr2 eqiad
  • 07:09 marostegui: Rename ep_* tables on db1089 (s1) - T174802
  • 06:33 kart_: Finished fourth manual run of unpublished draft purge script (T203059)
  • 06:14 marostegui: Deploy schema change on db1099:3318 T210713
  • 06:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3318 (duration: 00m 52s)
  • 06:04 kart_: Fourth manual run of unpublished draft purge script (T203059)
  • 02:18 thcipriani: restarting gerrit due to high load
  • 00:49 ebernhardson@deploy1001: Finished scap: SWAT: full sync for gerrit:489309 i18n (duration: 18m 20s)
  • 00:30 ebernhardson@deploy1001: Started scap: SWAT: full sync for gerrit:489309 i18n
  • 00:28 ebernhardson@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: gerrit:489780 T214515 Promote new wbsearchentities profiles to default in de, fr, es (duration: 00m 46s)
  • 00:13 jforrester@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/CentralNotice/: SWAT Merge branch 'master' into wmf_deploy I8e52d222eb (duration: 00m 49s)
  • 00:05 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Stop setting wgSessionsInObjectCache, it's being removed from MW I2946b5b9a (duration: 00m 47s)

2019-02-11

  • 23:22 cdanis: T214760 icinga2001% sudo killall nsca
  • 22:53 cdanis: icinga.w.o-->icinga2001 DNS change deployed T214760
  • 22:40 cdanis: icinga1001 now passive T214760
  • 22:34 cdanis: failing over icinga to icinga2001
  • 21:33 arlolra: Updated Parsoid to b4b9603 (T208901, T215537, T213468, T215638)
  • 21:24 arlolra@deploy1001: Finished deploy [parsoid/deploy@4e9b142]: Updating Parsoid to b4b9603 (duration: 09m 33s)
  • 21:22 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: Use newer RCFeed config for EventBus based recentchange event - T215834 (duration: 00m 47s)
  • 21:20 ottomata: deploying mediawiki-config change for update to EventBus RCFeed config (no-op)
  • 21:16 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@c6a6285]: Weekly GUI deploy (duration: 11m 54s)
  • 21:14 arlolra@deploy1001: Started deploy [parsoid/deploy@4e9b142]: Updating Parsoid to b4b9603
  • 21:13 mobrovac@deploy1001: Finished deploy [citoid/deploy@0b91bea]: Use Zotero for DOIs and pass it the A-L header - T214766 T210806 T215755 (duration: 03m 47s)
  • 21:09 mobrovac@deploy1001: Started deploy [citoid/deploy@0b91bea]: Use Zotero for DOIs and pass it the A-L header - T214766 T210806 T215755
  • 21:04 smalyshev@deploy1001: Started deploy [wdqs/wdqs@c6a6285]: Weekly GUI deploy
  • 20:08 ppchelko@deploy1001: Finished deploy [changeprop/deploy@bdb4740]: Update dependencies, minor refactor, safer deduplication, T207329 (duration: 01m 37s)
  • 20:07 ppchelko@deploy1001: Started deploy [changeprop/deploy@bdb4740]: Update dependencies, minor refactor, safer deduplication, T207329
  • 19:42 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106, db1118 with full weight (duration: 00m 46s)
  • 19:34 catrope@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: Remove main page special casing from lawiki (T215709) (duration: 00m 46s)
  • 19:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgRestrictionLevels on Serbian projects (T215653) (duration: 00m 46s)
  • 19:16 catrope@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/GrowthExperiments/: Help panel search instrumentation (T211166) (duration: 00m 47s)
  • 19:08 catrope@deploy1001: Synchronized wmf-config/throttle.php: Lift account creation cap for edit-a-thon (T215069) (duration: 00m 47s)
  • 19:08 jijiki: Repooled thumbor1004 - T215411
  • 18:50 robh: thumbor1004 rebooted and updated firmware T215411
  • 18:50 robh: thumbor1004 rebooted and updated firmware
  • 16:49 jynus: stop, upgrade and restart db1106
  • 16:36 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
  • 16:31 marostegui: Reverse password for globaldev user on dbstore1002 - T200801
  • 16:29 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 (duration: 00m 52s)
  • 15:49 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1118 (duration: 00m 48s)
  • 15:24 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY (duration: 00m 47s)
  • 15:23 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011 - https://phabricator.wikimedia.org/T212308
  • 15:21 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase.php, add conditional setting of useEntitySourceBasedFederation (duration: 00m 47s)
  • 15:20 marostegui: Repool labsdb1010 - T212308
  • 15:19 jynus: add missing grants to db1118
  • 15:07 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Revert, second try (duration: 00m 47s)
  • 15:00 addshore@deploy1001: sync-file aborted: Wikibase.php, add conditional setting of useEntitySourceBasedFederation (duration: 00m 01s)
  • 14:55 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Revert (duration: 00m 45s)
  • 14:53 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1118 for the first time (duration: 00m 47s)
  • 14:51 mbsantos@deploy1001: Finished deploy [tilerator/deploy@d546183] (stretch): Updating maps2004 tilerator for the stretch migration work (duration: 00m 39s)
  • 14:50 mbsantos@deploy1001: Started deploy [tilerator/deploy@d546183] (stretch): Updating maps2004 tilerator for the stretch migration work
  • 14:48 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@173adbe] (stretch): Updating maps2004 kartotherian for the stretch migration work (duration: 00m 21s)
  • 14:48 mbsantos@deploy1001: Started deploy [kartotherian/deploy@173adbe] (stretch): Updating maps2004 kartotherian for the stretch migration work
  • 14:47 moritzm: installing curl security updates on trusty
  • 14:21 marostegui: Remove staging from dbstore1003 - T210478
  • 14:16 godog: depool and take a snapshot of prometheus data for all instances on prometheus2003 - T187987
  • 14:09 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010 - T212308
  • 14:08 marostegui: Deploy schema change on db1116:3318 - T210713
  • 12:21 godog: bounce rsyslogd on lithium / wezen, syslog tls listener stuck
  • 12:19 zeljkof: EU SWAT finished
  • 12:18 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: New throttle rule for Senior Citizens Write Wikipedia course (T215618) (duration: 00m 48s)
  • 12:14 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Clean expired throttle rules (duration: 00m 48s)
  • 10:47 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 46s)
  • 10:46 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 48s)
  • 10:41 jynus: upgrading mariadb client on cumin* hosts
  • 10:27 mvolz@deploy1001: scap-helm zotero finished
  • 10:27 mvolz@deploy1001: scap-helm zotero cluster codfw completed
  • 10:27 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 10:24 mvolz@deploy1001: scap-helm zotero finished
  • 10:24 mvolz@deploy1001: scap-helm zotero cluster eqiad completed
  • 10:24 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 10:19 marostegui: Add dbstore1005:3350 to tendril and zarcillo - T210478
  • 10:17 mvolz@deploy1001: scap-helm zotero finished
  • 10:17 mvolz@deploy1001: scap-helm zotero cluster staging completed
  • 10:17 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml --version=0.0.1 stable/zotero [namespace: zotero, clusters: staging]
  • 10:17 jynus: restart db1114
  • 09:38 marostegui: Stop all mysql instances on dbstore1005 for reboot
  • 09:11 marostegui: Stop all mysql instances on dbstore1003 for reboot
  • 08:17 moritzm: removed cloudcontrol2001-dev.codfw.wmnet from debmonitor (actual hostname in use is cloudcontrol2001-dev.wikimedia.org)
  • 08:07 marostegui: Deploy schema change on s8 codfw master (db2045) - this will generate lag on codfw T210713
  • 07:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1100 (duration: 00m 46s)
  • 07:39 marostegui: Deploy schema change on s7 primary master (db1062) - T210713
  • 07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give api traffic to db1100 (duration: 00m 46s)
  • 07:18 marostegui: Stop all mysql instances on dbstore1004 for a reboot
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 with low weight (duration: 00m 46s)
  • 07:06 marostegui: Upgrade MySQL on db1100
  • 07:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 for mysql upgrade (duration: 00m 47s)
  • 07:00 marostegui: Restart icinga on icinga1001 - checks went awol
  • 06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1079 (duration: 00m 48s)
  • 06:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1079 (duration: 00m 48s)
  • 06:14 marostegui@deploy1001: sync-file aborted: Depool db0179 (duration: 00m 01s)
  • 04:23 TimStarling: on mwmaint1002: running normalizeThrottleParameters.php --dry-run on all wikis (T209565)
  • 04:19 tstarling@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/AbuseFilter/maintenance/normalizeThrottleParameters.php: maintenance script update for new dry run (duration: 00m 47s)
  • 04:19 tstarling@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/WikimediaEvents/tests/phpunit/PageViewsTest.php: test-only undeployed change (duration: 00m 46s)
  • 04:18 tstarling@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/NavigationTiming/tests/ext.navigationTiming.test.js: test-only undeployed change (duration: 00m 51s)
  • 04:10 tstarling@deploy1001: sync-file aborted: test-only undeployed change (duration: 00m 12s)
  • 03:05 kartik@deploy1001: Finished deploy [cxserver/deploy@ee4a15a]: Update cxserver to 8928852 (T213256) (duration: 04m 08s)
  • 03:01 kartik@deploy1001: Started deploy [cxserver/deploy@ee4a15a]: Update cxserver to 8928852 (T213256)

2019-02-10

  • off: force rebooting mw1299, stuck again - T215569
  • off: forcing reboot of icinga1001 because it's stuck again (no ping, no ssh, CPU stuck messages on console) - T214760
  • 09:25 marostegui: Disable notifications for lag checks on dbstore1002 - T210478

2019-02-09

  • 21:42 Reedy: running `foreachwiki refreshImageMetadata.php --mediatype BITMAP --mime image/vnd.djvu --force` on mwmaint1002 T215635
  • 21:41 Reedy: refreshImageMetadata.php for commonswiki done T215635
  • 16:51 Jeff_Green: restarted icinga process on icinga1001 because of passive check alert-storm

2019-02-08

  • 23:23 Reedy: running `refreshImageMetadata.php --mediatype BITMAP --mime image/vnd.djvu --force` against commonswiki on mwmaint1002 T215635 (this time we mean it)
  • 22:56 Reedy: running `refreshImageMetadata.php --mediatype BITMAP --mime image/vnd.djvu` against commonswiki on mwmaint1002 T215635
  • 21:25 reedy@deploy1001: Synchronized multiversion/MWMultiVersion.php: Move variable (duration: 00m 49s)
  • 19:50 krinkle@deploy1001: Synchronized w/touch.php: Ia1e610a5f (duration: 00m 46s)
  • 19:49 krinkle@deploy1001: Synchronized w/robots.php: Ia1e610a5f (duration: 00m 46s)
  • 19:48 krinkle@deploy1001: Synchronized w/favicon.php: Ia1e610a5f (duration: 00m 46s)
  • 19:47 krinkle@deploy1001: Synchronized w/extract2.php: Ia1e610a5f (duration: 00m 48s)
  • 18:14 gtirloni: T213527 graphite2002 disabled puppet and commented prometheus_puppet_agent_stats cronjob due to cronspam
  • 18:08 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase weight for s1 rc slaves (duration: 00m 49s)
  • 17:55 mutante: phab1001 - restart aphlict service
  • 17:52 mutante: phab1001 - restarting phd service
  • 17:49 arturo: T215605 add prometheus-openstack-exporter 0.0.8-4 to stretch-wikimedia
  • 17:47 mutante: phab1001 - restarting apache2 service for library upgrade
  • 17:42 mutante: graceful reload of apache on phabricator prod server (phab1001)
  • 17:27 XioNoX: merge Icinga: add ping check for ulsfo PDUs
  • 16:50 ejegg: updated payments-wiki-staging from 52a271e681 to 31647bc97e
  • 16:09 jynus: stopping s1 replication on dbstore1001 to speed up cloning T214720
  • 16:08 moritzm: imported git-fat 0.1.3-2+deb10u1 to buster-wikimedia (T213527)
  • 15:46 marostegui: Repool labsdb1009 - T212308
  • 15:33 _joe_: apt-get upgrade on mwmaint2001 to fix the php installation T215376
  • 15:31 moritzm: imported debmonitor 0.1.5-1+deb10u1 to buster-wikimedia (T213527)
  • 15:31 _joe_: upgraded all php extensions to php 7.2 compatible versions on mwmaint1002
  • 15:10 jijiki: Upgrading php-redis 4.1.1 to mwmaint1002 - T215376
  • 14:51 marostegui: Reload haproxy on dbproxy1011 to depool labsdb1009 - https://phabricator.wikimedia.org/T212308
  • 13:56 moritzm: updated firmware-enriched buster netboot image to 20190208 daily build, the alpha5 image no longer works as Linux 4.19.16-1 bumped the ABI and migrated to testing yesterday
  • 13:45 jynus: racadm serveraction powercycle db1114
  • 13:39 onimisionipe: starting osm-initial-import for maps2004, the master newly migrated to stretch - T198622
  • 13:37 elukey: roll restart of aqs on aqs1* to pick up new druid backend changes
  • 13:05 arturo: T209029 reimaging cloudelastic1004
  • 12:54 ejegg: updated fundraising CiviCRM from 3a1bb82373 to a541a83cb2
  • 12:51 jynus: disabling notifications on db1114
  • 12:44 elukey@deploy1001: Synchronized wmf-config/db-eqiad.php: depooling db1114, host down (duration: 00m 47s)
  • 11:36 moritzm: reimage graphite2002 to buster
  • 11:08 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 fully (duration: 00m 47s)
  • 10:50 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099 (duration: 00m 47s)
  • 10:27 jijiki: Restarting memcached on mc1026 to apply '-R 200' - T208844
  • 10:23 godog: swift codfw-prod: more weight to ms-be2047 - T209395 T209921
  • 10:15 jynus: stop and upgrade db1099
  • 10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3317 (duration: 00m 47s)
  • 09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3317 (duration: 00m 46s)
  • 09:28 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 (duration: 00m 46s)
  • 09:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 46s)
  • 09:16 moritzm: installing rssh security updates
  • 09:06 moritzm: installing libarchive security updates
  • 09:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1086 (duration: 00m 47s)
  • 08:53 moritzm: reimage graphite2002 to buster
  • 08:50 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 with low load (duration: 00m 46s)
  • 08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 (duration: 00m 47s)
  • 08:24 jynus: stop and upgrade db1083
  • 08:23 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 (duration: 00m 47s)
  • 08:15 marostegui: Upgrade MySQL on db1086
  • 08:05 marostegui: Upgrade MySQL on db1086 and deploy schema change
  • 08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 46s)
  • 07:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Full repool db1094 (duration: 00m 47s)
  • 07:45 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1299.eqiad.wmnet
  • 07:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1094 (duration: 02m 55s)
  • 07:27 marostegui@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1299.eqiad.wmnet
  • 07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 (duration: 02m 56s)
  • 07:12 marostegui: Upgrade mysql and kernel on db1094
  • 06:58 marostegui: Deploy schema change on db1094 T210713
  • 06:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 46s)
  • 06:54 marostegui: Take a mysqldump from staging on dbstore1003 from dbstore1002 - T210478
  • 06:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098:3317 (duration: 00m 49s)
  • 06:29 marostegui: powercycle mw1299 - T215569
  • 06:21 marostegui: Deploy schema change on db1098:3317
  • 06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098:3317 (duration: 02m 58s)
  • 06:07 marostegui: Drop staging.mep_word_persistence from dbstore1002 T215450 T213706
  • 02:34 ejegg: updated fundraising CiviCRM from 08be00e87f to 3a1bb82373
  • 01:37 dzahn@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1299.eqiad.wmnet
  • 01:10 mutante: mw1299 has been down about 8 hours, does it need deployment.. depooling
  • 01:08 mutante: powercycle crashed mw1299 via mgmt (garbled console output) (T215569)
  • 00:22 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT gerrit:488588 phab:T214515 Turn off wikidata wbsearchentities ab test in de, fr, es (duration: 02m 55s)
  • 00:16 ebernhardson: scap sync timed out on mw1299.eqiad.wmnet
  • 00:15 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT gerrit:483044 T209873 Give protect right to centralnoticeadmin on Meta (duration: 02m 56s)

2019-02-07

  • 23:29 XioNoX: restart ps1-22-ulsfo
  • 23:23 reedy@deploy1001: Synchronized tests/dblistTest.php: Sync test (duration: 02m 55s)
  • 23:18 reedy@deploy1001: Synchronized README: must be up to date (duration: 02m 54s)
  • 22:48 reedy@deploy1001: Synchronized dblists/: alphasort dblists (duration: 02m 56s)
  • 21:43 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.33.0-wmf.16 refs T206670
  • 21:38 robh: updating firmware on ps1-23-ulsfo via T209101 ps1-22-ulsfo update completed
  • 21:22 robh: updating firmware on ps1-22-ulsfo via T209101
  • 20:55 twentyafterfour: train status: deploying 1.33.0-wmf.16 to group2
  • 20:19 sbisson@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/WikibaseLexeme/src/DataAccess/Search/LexemeFulltextResult.php: SWAT: Fix fatal error - EmptySet does not exist anymore (duration: 03m 03s)
  • 19:45 sbisson@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/GrowthExperiments/: SWAT: Help Panel: Fix iOS scroll bug (duration: 03m 02s)
  • 19:28 sbisson@deploy1001: sync-file aborted: SWAT: GrowthExperiments: Enable search for help panel on testwiki (duration: 02m 22s)
  • 19:25 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable search for help panel on testwiki (duration: 03m 04s)
  • 18:32 mutante: LDAP - adding raz-shuty to group nda (T214488)
  • 17:06 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1085 (duration: 03m 03s)
  • 16:03 jynus: restart db1085, temporary s6 lag on wikireplicas
  • 15:55 gehel: starting reimage of maps2004 - T198622
  • 15:51 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1085 (duration: 00m 58s)
  • 15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on wikitech for T215464. This may cause lag in codfw.
  • 15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 8 wikis for T215464. This may cause lag in codfw.
  • 15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 7 wikis for T215464. This may cause lag in codfw.
  • 15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 6 wikis for T215464. This may cause lag in codfw.
  • 15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 5 wikis for T215464. This may cause lag in codfw.
  • 15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 4 wikis for T215464. This may cause lag in codfw.
  • 15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on remaining section 3 wikis for T215464. This may cause lag in codfw.
  • 15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 2 wikis for T215464. This may cause lag in codfw.
  • 15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 1 wikis for T215464. This may cause lag in codfw.
  • 15:07 anomie@mwmaint1002: Fixing log_search after migrateActors.php on test wikis and mediawikiwiki for T215464. This may cause lag in codfw.
  • 15:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1101 (duration: 00m 55s)
  • 14:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1101 after alter and mysql upgrade (duration: 00m 55s)
  • 14:34 jbond42: deploying security updates for libgd3
  • 12:42 Amir1: EU SWAT is done
  • 12:42 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Set EntityUsageTable addUsage batch size to 300, Part II (duration: 00m 54s)
  • 12:42 marostegui: Set dbstore1002 as IDEMPOTENT - T213670
  • 12:39 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set EntityUsageTable addUsage batch size to 300 (T215146), Part I (duration: 00m 55s)
  • 12:34 marostegui: Powercycle mw1299 as it is down and not responding
  • 12:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1101 after alter and mysql upgrade (duration: 03m 02s)
  • 12:26 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: SWAT: Update interwiki cache to have yuewiktionary instead of zh-yue (T214400) (duration: 03m 04s)
  • 12:06 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4026.ulsfo.wmnet
  • 12:03 arturo: T214448 reimaging again cloudvirt200[1-3]-dev.codfw.wmnet
  • 11:55 marostegui: Stop MySQL on db1101:3317 and db1101:3318 for mysql upgrade
  • 11:37 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2055 (duration: 03m 02s)
  • 11:17 fsero: upgrade helm to 2.12.2 on deploy{1001,2001} and contint{1001,2001} T215244
  • 11:16 fsero: upgrade helm to 2.12.2 on deploy{1001,2001} and contint{1001,2001}
  • 10:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101 for alter and mysql upgrade (duration: 00m 56s)
  • 10:43 marostegui: Run mysqldump from dbstore1003 to dump dbstore1002:staging.mep_word_persistence - T215450
  • 09:49 marostegui: Deploy schema change on db1116 - T210713
  • 09:41 akosiaris: reboot mwdebug1001, mwdebug1002, mwdebug2001, mwdebug2002 for VCPU upgrade. T212955
  • 09:23 jynus: running alter table on db2055 for performance testing T212092
  • 09:15 fsero: uploading helm and tiller 2.12.2 deb package to stretch and jessie
  • 08:53 marostegui: Deploy schema change on s7 codfw master (db2047), this will generate lag on s7 codfw - T210713
  • 08:34 godog: swift codfw-prod: more weight to ms-be2047 - T209395 T209921
  • 08:14 marostegui: Deploy schema change on s4 primary master (db1068) - T210713
  • 08:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 (duration: 00m 54s)
  • 07:50 marostegui: Deploy schema change on db1081
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 53s)
  • 07:48 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 20s)
  • 07:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 (duration: 00m 53s)
  • 07:42 reedy@deploy1001: Synchronized dblists/: Wikimania T215486 (duration: 00m 54s)
  • 07:03 marostegui: Deploy schema change on db1084 - T210713
  • 07:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 55s)
  • 06:48 marostegui: Restore consistency options on db2051
  • 06:14 marostegui: Ease consistency options on db2051 (s4 master) to let it catch up on replication
  • 04:35 tstarling@deploy1001: Synchronized wmf-config/set-time-limit.php: (no justification provided) (duration: 00m 54s)
  • 04:00 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable EP namespaces on wikis with no EP pages (duration: 00m 57s)
  • 01:31 eileen: civicrm revision changed from c5aec3ae76 to 08be00e87f, config revision is 306b4de48f
  • 01:24 eileen: civicrm revision changed from 6161a021c0 to c5aec3ae76, config revision is 306b4de48f
  • 01:05 twentyafterfour: US Evening SWAT is complete
  • 01:04 twentyafterfour: no phabricator deployment tonight
  • 01:04 eileen: civicrm revision changed from 613b388916 to 6161a021c0, config revision is 306b4de48f
  • 00:57 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT config change for Bug: T214003 (duration: 00m 53s)
  • 00:53 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/VisualEditor/: SWAT f89e12f to fix bug: T209610 (duration: 00m 55s)
  • 00:48 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/MobileFrontend/: SWAT dd8654a (duration: 01m 00s)
  • 00:47 twentyafterfour: syncing commit dd8654a for Bug: T209052
  • 00:24 twentyafterfour: running `mwscript migrateUserGroup.php commonswiki extended-uploader autopatrolled` on deploy1001

2019-02-06

  • 23:58 mutante: restarting icinga on icinga1001 to pick up new check command?
  • 22:22 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.16 refs T206670 (duration: 00m 53s)
  • 22:22 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.16 refs T206670
  • 21:45 mutante: LDAP - adding brennen to wmf, releng, ciadmin - Welcome Brennen Bearnes, Software Engineer in Release Engineering (T215365 T214556)
  • 21:05 arlolra@deploy1001: Finished deploy [parsoid/deploy@a4acfa6]: Updating Parsoid to fb67a71 (duration: 03m 43s)
  • 21:04 Krinkle: krinkle@webperf1002 Kill xenon-log (pid 449). It seems its Redis TCP socket to mwlog1001 has been stuck since Dec 13, causing the process to indefinitely hang on listen()/socket.recv()
  • 21:01 arlolra@deploy1001: Started deploy [parsoid/deploy@a4acfa6]: Updating Parsoid to fb67a71
  • 20:49 mutante: LDAP - adding h78na to wmf - welcome Hana Worku, developer on the multimedia team (T215352)
  • 20:40 mutante: LDAP - adding egardner to wmf - welcome Eric Gardner, software engineer in Audiences (T214654)
  • 20:35 twentyafterfour: 1.33.0-wmf.16 has a significantly higher rate of "entire web request took longer than 60 seconds and timed out"
  • 20:03 twentyafterfour: Resuming the MediaWiki train for version 1.33.0-wmf.16. Will deploy Group0 wikis first and then catch up to group1 after a few minutes monitoring logs for stability.
  • 19:50 robh: updated firmware on cp4026 and re-seated (already well seated) dimm b3. errors have cleared for now T214516
  • 19:24 milimetric@deploy1001: Finished deploy [analytics/refinery@cd413dd]: Small bug fix for history checker (duration: 12m 45s)
  • 19:13 robh: taking cp4026 offline to flash firmware and reseat dimm for testing on T214516
  • 19:12 milimetric@deploy1001: Started deploy [analytics/refinery@cd413dd]: Small bug fix for history checker
  • 19:11 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@3272a46]: Add healthcheck plugin (no restart) cobalt T214326 (duration: 00m 09s)
  • 19:11 thcipriani@deploy1001: Started deploy [gerrit/gerrit@3272a46]: Add healthcheck plugin (no restart) cobalt T214326
  • 19:09 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@3272a46]: Add healthcheck plugin (no restart) gerrit2001 first (duration: 00m 10s)
  • 19:09 mutante: LDAP - adding afandian2 and toddleroux to nda (T214727)
  • 19:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@3272a46]: Add healthcheck plugin (no restart) gerrit2001 first
  • 19:04 jforrester@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/Flow/includes/Conversion/Utils.php: I405dd193 Update Parsoid Accept header to 2.0.0 so service can deploy (duration: 00m 54s)
  • 19:03 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/Flow/includes/Conversion/Utils.php: I405dd193 Update Parsoid Accept header to 2.0.0 so service can deploy (duration: 00m 56s)
  • 18:03 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Removed namespace Коментар, added namespace Портал on srwikinews T214561 T214563 (duration: 00m 53s)
  • 18:01 mutante: LDAP - adding alaasarhan to wmde (T215066)
  • 17:57 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Changed wgImportSources for srwikinews T214562 (duration: 00m 53s)
  • 17:53 thcipriani@deploy1001: Synchronized dblists/s3.dblist: SWAT: dblists/s3.dblist: Fix sorting of list of wikis per alphabetical order (duration: 00m 54s)
  • 17:49 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/MobileFrontend: SWAT: VE: Load HTML in parallel with modules T209052 (duration: 00m 57s)
  • 17:40 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/MobileFrontend: SWAT: EditorOverlay: Pass constructor of itself to VisualEditorOverlay, not instance T215408 (duration: 00m 57s)
  • 17:10 jynus: setting db1111 in read-write mode
  • 16:24 moritzm: reimaging graphite2002 to buster
  • 16:19 jynus: running alter table on db2055 T93564
  • 16:14 gehel@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 15:44 papaul: powering down thumbor2002 for disk replacement
  • 15:42 moritzm: installing spice security updates
  • 15:41 andrewbogott: rebooting cloudvirt1015 to make sure that nothing drastic changes once libguestfs is installed T215423
  • 15:11 moritzm: installing libav security updates
  • 15:08 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2055 for performance testing T93564 (duration: 00m 55s)
  • 14:50 moritzm: draining restbase1018 for eventual reboot for kernel security update (bundled with Java update)
  • 14:36 moritzm: draining restbase1017 for eventual reboot for kernel security update (bundled with Java update)
  • 14:29 elukey: add term mysql-dbstore to analytics-in4/6 on cr1/2-eqiad to allow tcp connections to dbstore100[3-5] - T210478
  • 12:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3032.esams.wmnet
  • 12:29 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3032.esams.wmnet
  • 12:28 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3033.esams.wmnet
  • 12:26 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3033.esams.wmnet
  • 12:25 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3040.esams.wmnet
  • 12:24 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3040.esams.wmnet
  • 12:22 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3041.esams.wmnet
  • 12:22 Amir1: EU SWAT is done
  • 12:21 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3041.esams.wmnet
  • 12:20 vgutierrez: restarting varnish-fe safely across esams/text cluster - T215389
  • 12:19 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Use separate DB connection for ID insertions on testwikidatawiki (T215147), Part II (duration: 00m 54s)
  • 12:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Use separate DB connection for ID insertions on testwikidatawiki (T215147), Part I (duration: 00m 55s)
  • 11:58 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3042.esams.wmnet
  • 11:57 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3042.esams.wmnet
  • 11:56 vgutierrez: restarting varnish-fe in cp3042 - T215389
  • 11:02 _joe_: restarting nginx safely across the appserver fleets in order to be able to run puppet without errors
  • 10:41 marostegui: Revoke access to testreduce from ruthenium on m5 - https://phabricator.wikimedia.org/T214740
  • 10:04 moritzm: reimaging graphite2002 to buster
  • 10:01 akosiaris: restart varnish-frontend on cp3030 T215389
  • 10:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1091 (duration: 00m 52s)
  • 09:33 marostegui: Remove wikiuser from dbstore1003-dbstore1005 T210478
  • 09:15 godog: swift codfw-prod: more weight for ms-be2047 - T209395 T209921
  • 09:00 marostegui: Create research_role on dbstore1003-1005 on all instances - T214469
  • 08:49 marostegui: Deploy schema change on db1091 - T210713
  • 08:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 53s)
  • 08:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 53s)
  • 07:51 marostegui: Deploy schema change on db1121 - this will generate lag on s4 labs - also upgrade MySQL on db1121 T210713
  • 07:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 00m 54s)
  • 07:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 (duration: 00m 54s)
  • 07:19 marostegui: Deploy schema change on wikitech T210713
  • 07:14 marostegui: Stop 's4' slave on dbstore1002
  • 07:13 marostegui: Deploy schema change on db1103:3314 (db1097:3314 was also done previously) - T210713
  • 07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 (duration: 00m 53s)
  • 07:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 (duration: 00m 56s)
  • 06:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097:3314 (duration: 01m 06s)
  • 04:39 mutante: reloaded icinga service, can't find new check command definition
  • 03:14 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.33.0-wmf.16 refs T206670 (duration: 04m 18s)
  • 03:09 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.33.0-wmf.16 refs T206670
  • 03:05 mutante: actinium - gzipping and rotating some access logs
  • 03:01 twentyafterfour@deploy1001: Synchronized scap/plugins/updateinterwikicache.py: (no justification provided) (duration: 00m 55s)
  • 02:47 mutante: actinium - blocking a bad domain and restarting squid3
  • 02:40 twentyafterfour@deploy1001: Finished scap: sync and update localization for 1.33.0-wmf.16 (duration: 15m 50s)
  • 02:32 XioNoX: push firewall rule to pfw3-eqiad - T215364
  • 02:27 mutante: actinium - apt-get clean for 8% more disk space after icinga alert
  • 02:25 twentyafterfour@deploy1001: Started scap: sync and update localization for 1.33.0-wmf.16
  • 02:16 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.14 refs T206670
  • 02:12 eileen: civicrm revision changed from 6042acb363 to 613b388916, config revision is 306b4de48f
  • 02:02 twentyafterfour@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 01:22 XioNoX: remove peering4/6 prefix-list from routers
  • 01:07 XioNoX: add maintenance and rollback to junos operations class
  • 00:47 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.33.0-wmf.16 refs T206670
  • 00:33 niharika29@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/MobileFrontend/: EditorOverlay: captcha/abusefilter weren't being shown correctly T215101, T202374 (duration: 00m 50s)
  • 00:24 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Demystify Logstash debug level behavior (duration: 00m 51s)
  • 00:23 niharika29@deploy1001: Synchronized wmf-config/logging.php: Demystify Logstash debug level behavior (duration: 00m 46s)
  • 00:18 niharika29@deploy1001: Synchronized wmf-config/logging.php: Add PHP version to MW logs T215350 (duration: 00m 46s)
  • 00:16 niharika29@deploy1001: Synchronized wmf-config/CommonSettings.php: Preserve Composer's include paths - T215126, T215224 (duration: 01m 40s)

2019-02-05

  • 18:56 arlolra@deploy1001: Finished deploy [parsoid/deploy@a4acfa6]: (no justification provided) (duration: 02m 06s)
  • 18:53 arlolra@deploy1001: Started deploy [parsoid/deploy@a4acfa6]: (no justification provided)
  • 18:39 arlolra@deploy1001: Finished deploy [parsoid/deploy@a4acfa6]: Updating Parsoid to fb67a71 (duration: 09m 54s)
  • 18:29 arlolra@deploy1001: Started deploy [parsoid/deploy@a4acfa6]: Updating Parsoid to fb67a71
  • 18:26 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@2959e12]: Update mobileapps to 107c1b1 (T214714) (duration: 04m 43s)
  • 18:21 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@2959e12]: Update mobileapps to 107c1b1 (T214714)
  • 18:17 mutante: contint1001/contint2001 - manually deleting crontab lines unpuppetized in gerrit:488019 (T209361)
  • 18:13 Jeff_Green: authdns-update to deploy 7fee817fd3
  • 17:22 mutante: scandium - restart parsoid-vd service
  • 17:21 mutante: scandium -- copy /srv/visualdiff/testrecude/testrun.ids from ruthenium to the same location
  • 15:15 godog: force curator action 'replicas' to set older logstash indices to 1 replica - T213078
  • 14:30 marostegui: Deploy schema change on s4 codfw master with replication, lag will be generated on s4 codfw - T210713
  • 14:26 Jeff_Green: authdns-update for payments dev/testing hostname
  • 14:12 marostegui: Deploy schema change on db1066 (s2 master) - T210713
  • 14:05 marostegui: Delete unused grants from dbstore1002: log, warehouse, project_illustration, cognate_wiktionary, datasets - T212487 T210478
  • 13:55 godog: swift codfw-prod: add ms-be2047 - T209395 T209921
  • 12:18 addshore: swat done
  • 12:18 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable confirmation prompt on rollback by default T215019 (duration: 00m 47s)
  • 11:35 moritzm: added firmware-enriched buster netboot image (T213546)
  • 11:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1074 (duration: 00m 46s)
  • 10:43 marostegui: Deploy schema change on db1074 with replication, lag will be generated on s2 - T210713
  • 10:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1074 (duration: 00m 47s)
  • 10:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3312 (duration: 00m 46s)
  • 09:42 hashar: contint1001: docker image prune -f
  • 09:34 marostegui: Deploy schema change on db1090:3312 - T210713
  • 09:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3312 (duration: 00m 45s)
  • 09:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1076 (duration: 00m 46s)
  • 09:11 marostegui: Start all slaves on dbstore1002 - T213670
  • 08:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 (duration: 00m 45s)
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1076 (duration: 00m 46s)
  • 08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 (duration: 00m 45s)
  • 07:56 marostegui: Upgrade MySQL and kernel on db1076
  • 07:44 marostegui: Deploy schema change on db1076 - T210713
  • 07:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 T210713 (duration: 00m 47s)
  • 07:13 marostegui: Taking mysqldump from dbstore1002.staging - T210478
  • 07:05 marostegui: Reboot mysql on db1117:3323 (this will make the dbproxies complain) T214248
  • 02:24 XioNoX: remove BGP session to as6412 on cr2-eqiad (gone from IX)
  • 02:21 XioNoX: delete 2nd as9121 router on cr2-esams
  • 00:47 XioNoX: add BGP sessions to AS64050 on cr1-eqsin
  • 00:24 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/486405/ (duration: 00m 46s)
  • 00:11 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 46s)

2019-02-04

  • 22:05 mutante: scandium - systemctl start parsoid-vd (T201366)
  • 20:01 herron: manually ran puppet on mc1023
  • 19:50 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Clean-up: Stop setting wgParsoidWikiPrefix, unused since the Parsoid extension (duration: 00m 45s)
  • 19:45 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Clean-up: Stop setting wgFlowEventLogging, unread (duration: 00m 45s)
  • 19:39 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean-up: Stop setting values for wgEcho*FooterNotice*, unread (duration: 00m 46s)
  • 19:32 James_F: Manually purged atjwiki*.png logos for T215122.
  • 19:28 jforrester@deploy1001: Synchronized static/images/project-logos/atjwiki.png: SWAT: Milestone logo for atjwiki T215122, 1x (duration: 00m 46s)
  • 19:27 jforrester@deploy1001: Synchronized static/images/project-logos/atjwiki-1.5x.png: SWAT: Milestone logo for atjwiki T215122, 1.5x (duration: 00m 45s)
  • 19:26 jforrester@deploy1001: Synchronized static/images/project-logos/atjwiki-2x.png: SWAT: Milestone logo for atjwiki T215122, 2x (duration: 00m 44s)
  • 19:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T191039 Enable wgAbuseFilterRuntimeProfile on all wikis (duration: 00m 47s)
  • 19:19 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@8b2f078]: Weekly GUI deploy (duration: 09m 47s)
  • 19:09 smalyshev@deploy1001: Started deploy [wdqs/wdqs@8b2f078]: Weekly GUI deploy
  • 18:31 XioNoX: adding Papaul to root@wiki
  • 18:22 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Clean-up: Drop reading for wgEcho*FooterNotice*, unread (duration: 00m 46s)
  • 18:18 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Clean-up: Stop setting wgEchoConfig, unused since 2016 (duration: 00m 48s)
  • 18:11 jforrester@deploy1001: Synchronized dblists/: T213504: Finally, drop the wikidatarepo dblist (duration: 00m 45s)
  • 18:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T213504: Stop telling CommonSettings about the wikidatarepo dblist (duration: 00m 45s)
  • 18:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T213504: Unconfigure the wikidatarepo dblist (duration: 00m 46s)
  • 18:05 XioNoX: manually rotate log file wtmp on csw2-esams
  • 18:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T213504: Configure wikibaserepo dblist just like the wikidatarepo one (duration: 00m 46s)
  • 17:58 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T213504: Tell CommonSettings about the new wikibaserepo dblist (duration: 00m 47s)
  • 17:56 jforrester@deploy1001: Synchronized dblists/wikibaserepo.dblist: T213504: Create the new wikibaserepo dblist (duration: 00m 47s)
  • 17:25 papaul: powering down thumbor2002 for disk replacement
  • 17:10 XioNoX: revert ospf metrics to normal values on esams-eqiad Level3 link
  • 16:50 Lucas_WMDE: deployed patch for T212118
  • 12:41 Lucas_WMDE: EU SWAT done
  • 12:40 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix Wikidata base URI in client config (T198946) (duration: 00m 46s)
  • 12:34 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Populate wmgWikibaseRepoSpecialSiteLinkGroups for commonswiki (T213975) (duration: 00m 51s)
  • 11:04 moritzm: installing ghostscript security updates
  • 08:48 jynus: fixing dbstore1002 x1 replication
  • 07:56 vgutierrez: uploaded certcentral 0.8 to apt.wikimedia.org (stretch) - T209980 T213820 T213301

2019-02-03

  • 20:25 elukey: powercycle mw1272 - no ssh, no tty available via com2 - DIMM correctable errors + OEM errors registered in getsel
  • 18:56 elukey: started a tmux session on dbstore1002 to migrate all the tokudb tables of mediawikiwiki to InnoDB - (s3 replication broken)
  • 17:53 elukey: start all slaves on dbstore1002 (After a crash + recovery) + moved mediawikiwiki.revision_actor_temp to Innodb to unblock s3 slave replication (still broken though)
  • 04:55 legoktm@deploy1001: Synchronized wmf-config/extension-list: Remove WikibaseQuality from extensions-list (T208499) (duration: 00m 51s)
  • 01:10 elukey: powercycle mw1299 - can't ssh nor get a tty via console - racadm getsel shows "An OEM diagnostic event occurred."

2019-02-02

  • 20:42 chaomodus: restarted pdfrender on scb1003
  • 20:41 chaomodus: restarted pdfrender on scb1004
  • 20:06 chaomodus: parsoid had failed on scandium and was alerting; the parsoid-vd service was restarted and appears to have come back
  • 05:44 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/VisualEditor/lib/ve/src/ui/dialogs/ve.ui.FindAndReplaceDialog.js: b/src/ui/dialogs/ve.ui.FindAndReplaceDialog.js T214963 Hot-deploy VE fix to stop hitting user pref writes without debounce (duration: 01m 02s)

2019-02-01

  • 23:16 vgutierrez: restart pdfrender on scb1004
  • 21:57 ejegg: updated payments-wiki-staging from 7767c7027e to 52a271e681
  • 21:25 ejegg: updated payments-wiki-staging to fundraising/REL1_31 branch
  • 07:13 bawolff_: reset 2FA on wikitech for User:Cicalese

2019-01-31

  • 17:44 jynus: running alter table on metawiki.revision_actor_temp, trying to fix TokuDB horrible bugs
  • 15:54 jynus: stop, upgrade and restart db1117
  • 13:34 mvolz@deploy1001: scap-helm zotero finished
  • 13:34 mvolz@deploy1001: scap-helm zotero cluster codfw completed
  • 13:34 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 13:31 mvolz@deploy1001: scap-helm zotero finished
  • 13:31 mvolz@deploy1001: scap-helm zotero cluster eqiad completed
  • 13:31 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 13:19 mvolz@deploy1001: scap-helm zotero finished
  • 13:19 mvolz@deploy1001: scap-helm zotero cluster staging completed
  • 13:19 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml --version=0.0.1 stable/zotero [namespace: zotero, clusters: staging]
  • 13:18 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml stable/zotero [namespace: zotero, clusters: staging]
  • 12:54 jynus: stop, upgrade and restart db2044
  • 12:12 jynus: apply new grants to m5-master with replication T214740
  • 11:30 arturo: T215012 icinga downtime cloudvirt1015 for 4h while investigating issues
  • 11:24 arturo: T215012 reboot cloudvirt1015
  • 11:24 jynus: restart eventstreams on scb1002,3,4
  • 11:22 jynus: restart eventstreams on scb1001
  • 10:22 jynus: resetting to defaults innodb consistency options for db2048 T188327
  • 10:00 jynus: restarting pdfrender on scb1002,3,4
  • 09:54 jynus: restarting pdfrender on scb1001
  • 02:01 gtirloni: T215004 restarted gerrit (using 1200% cpu, 71% mem)

2019-01-30

  • 20:28 bawolff_: reset 2FA@wikitech for User:deigo
  • 18:25 ladsgroup@deploy1001: Finished deploy [ores/deploy@ad160b0]: (no justification provided) (duration: 12m 46s)
  • 18:12 ladsgroup@deploy1001: Started deploy [ores/deploy@ad160b0]: (no justification provided)
  • 18:03 jynus: reducing innodb consistency options for db2048 T188327
  • 17:36 XioNoX: deactivate/activate cr2-esams:xe-0/1/3
  • 17:28 akosiaris: restart pdfrender on scb1003, scb1004
  • 16:19 akosiaris: restart proton on proton1002
  • 15:52 jynus: stop, upgrade and restart db2037
  • 15:24 jynus: stop, upgrade and restart db2042
  • 14:27 jynus: stop, upgrade and restart db2034, this will cause some lag on x1-codfw
  • 13:53 jynus: stop, upgrade and restart db2069
  • 11:20 jynus: stop, upgrade and restart db2045, this will cause some lag on s8-codfw
  • 10:54 jynus: stop, upgrade and restart db2079
  • 10:33 jynus: stop, upgrade and restart db2039, this will cause some lag on s6-codfw
  • 10:03 jynus: stop, upgrade and restart db2052, this will cause some lag on s5-codfw
  • 09:31 jynus: stop, upgrade and restart db2089 (s5/s6)
  • 08:58 jynus: stop, upgrade and restart db2051, this will cause some lag on s4-codfw
  • 08:44 jynus: stop, upgrade and restart db2090

2019-01-29

  • 21:52 jijiki: Depooling thumbor2002 due to disc failure - T214813
  • 16:51 arturo: T214499 update Netbox status for cloudvirt1023/1024/1025/1026/1027 from PLANNED to ACTIVE. These servers are actually providing services already.
  • 10:05 jynus: stop, upgrade and restart db2065
  • 09:28 jynus: stop, upgrade and restart db2058
  • 09:12 jynus: stopping, upgrading and restarting db2035, this will cause lag on codfw-s2
  • 08:58 jynus: stop, upgrade and restart db2041
  • 08:38 jynus: stop, upgrade and restart db2056
  • 08:17 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1114 after crash (duration: 00m 52s)
  • 03:32 XioNoX: bump cr2-esams-cr2-eqiad ospf cost to 2000 for level3 link flapping

2019-01-28

  • 23:51 vgutierrez: restarting cp2014 - T214872
  • 21:02 Zoranzoki21: Done wikitext export of content of database for education program on srwiki - T174802 (duration: 8 minutes)
  • 20:54 Zoranzoki21: Starting wikitext export of content of database for education program on srwiki - T174802 (21:54 UTC+1)
  • 19:55 brion: running final pass of requeueTranscodes.php on all wikis to make sure stray missing VP9 transcodes are cleaned up (on mwmaint1002 in a tmux session)
  • 16:41 hashar: contint1001: cleaning up disk space on / (docker images)
  • 16:36 jynus: remove backups dir at dbstore2001 T214831
  • 15:22 thcipriani: restarting jenkins for update
  • 14:16 jynus: stop, upgrade and reboot db2048, this will cause general lag/read only on enwiki/s1-codfw for some minutes
  • 13:52 jynus: stop, upgrade and reboot db2092
  • 12:55 jynus: stop, upgrade and reboot db2085
  • 12:45 jynus: powercycle ms-be1034
  • 12:42 onimisionipe: restarting all elasticsearch instances on relforge1002 to test spicerack command
  • 11:21 jynus: stop, upgrade and reboot db2062
  • 10:45 jynus: stop, upgrade and reboot db2055

2019-01-27

  • 16:22 godog: powercycle ms-be1020 - T214778
  • 03:28 marostegui: Fix x1 on dbstore1002 - T213670
  • 02:24 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: Hot-deploy Ic2b08cb27 in WBMI to fix Commons File page display (duration: 00m 49s)

2019-01-26

  • 11:06 volans: force rebooting icinga1001 (no ping, no ssh, stuck console)
  • 03:23 marostegui: Convert all tables on incubatorwiki to innodb to fix s3 thread - T213670
  • 00:03 XioNoX: split member-range ge-3/0/0 to ge-3/0/38 on asw-b-codfw

2019-01-25

  • 22:45 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@5e859c4]: Update mobileapps to a8834e8 (T214728) (duration: 03m 27s)
  • 22:42 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@5e859c4]: Update mobileapps to a8834e8 (T214728)
  • 21:56 krinkle@deploy1001: Synchronized wmf-config/flaggedrevs.php: I95c37d628557c (duration: 00m 46s)
  • 21:44 krinkle@deploy1001: Synchronized wmf-config/: Idb695dd033d42 (duration: 00m 46s)
  • 21:43 krinkle@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: Idb695dd033d42 (duration: 00m 47s)
  • 21:05 robh: cleared sel on db1068, it had a power redundancy loss event (old and resolved) that was triggering the icinga check
  • 20:04 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1106 as an extra api host (duration: 00m 46s)
  • 19:36 jynus: powercycle db1114 T214720
  • 19:21 jynus: disabling notifications on db1114
  • 19:21 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1114 (duration: 00m 46s)
  • 18:32 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@94b76f5]: Update mobileapps to 4c42e3d (T214714) (duration: 03m 33s)
  • 18:28 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@94b76f5]: Update mobileapps to 4c42e3d (T214714)
  • 17:17 chaomodus: notebook1003 restarted nagios-nrpe-server due to oom - T212824
  • 14:43 hashar: contint1001: stopping zuul-merger for cleanup duties
  • 09:48 marostegui: Add dbstore1005:3318 to tendril T210478
  • 08:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1105 (duration: 00m 45s)
  • 08:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1105:3312 (duration: 00m 45s)
  • 07:51 elukey: restart yarn/hdfs daemons on analytics1056 to pick up new disk settings - T214057
  • 07:40 elukey: drain + reboot analytics1054 after disk swap (verify reboot + restore correct fstab mountpoints) - T213038
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1105:3312 (duration: 00m 45s)
  • 07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1105 (duration: 00m 47s)
  • 06:53 marostegui: Stop MySQL on db1105 to upgrade MySQL
  • 06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully depool db1105 (duration: 00m 46s)
  • 06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1122 T210713 (duration: 00m 47s)
  • 06:13 marostegui: Deploy schema change on db1122 - T210713
  • 06:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 T210713 (duration: 00m 48s)
  • 06:04 marostegui: Compress dbstore1002: staging.mep_word_persistence from Aria to InnoDB - T213706
  • 05:42 kartik@deploy1001: Finished deploy [cxserver/deploy@a5d7181]: Update cxserver to 356f0a1 (T213257, T213275) (duration: 04m 09s)
  • 05:38 kartik@deploy1001: Started deploy [cxserver/deploy@a5d7181]: Update cxserver to 356f0a1 (T213257, T213275)
  • 03:12 mutante: scandium sudo chgrp -R wikidev /srv/deployment/parsoid/deploy/ ; sudo chmod -R g+w /srv/deployment/parsoid/deploy/ (T201366)
  • 03:03 mutante: scandium - apt-get -t stretch-backports install npm ; run puppet ; remove manually created /apt/preferences.d/npm.pref ; puppet created npm_stretch_backports.pref ; puppet run without errors again (T201366)
  • 01:33 crusnov@deploy1001: Finished deploy [netbox/deploy@7770453]: Cleanup deploy - T212524 (duration: 00m 11s)
  • 01:33 crusnov@deploy1001: Started deploy [netbox/deploy@7770453]: Cleanup deploy - T212524
  • 01:28 crusnov@deploy1001: Finished deploy [netbox/deploy@7770453]: Upgrade netbox to 2.5.3 - T212524 Try 2 (duration: 00m 31s)
  • 01:27 crusnov@deploy1001: Started deploy [netbox/deploy@7770453]: Upgrade netbox to 2.5.3 - T212524 Try 2
  • 01:26 crusnov@deploy1001: Finished deploy [netbox/deploy@7770453]: Upgrade netbox to 2.5.3 - T212524 (duration: 07m 43s)
  • 01:18 crusnov@deploy1001: Started deploy [netbox/deploy@7770453]: Upgrade netbox to 2.5.3 - T212524
  • 00:46 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T214515 gerrit:486154: Turn on wbsearchentities ab test in de, fr, es (duration: 00m 46s)
  • 00:37 ebernhardson@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: SWAT T214515 gerrit:484334: Add wbsearchentities profiles for de, fr, es (duration: 00m 45s)
  • 00:34 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/MobileFrontend/: SWAT T214606 gerrit:486392: MobileFrontend: if wikidata description exists, set it as tagline (duration: 00m 47s)
  • 00:29 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.14/includes/Title.php: SWAT T210739 gerrit:486369: Clone the Title object to prevent mutation (duration: 00m 47s)
  • 00:20 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: SWAT T212788 gerrit:485609: autocomplete subphrase matching on wikitech and mw.org 2 of 2 (duration: 00m 45s)
  • 00:14 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T212788 gerrit:485608: autocomplete subphrase matching on wikitech and mw.org (duration: 00m 46s)
  • 00:01 arlolra: Updated Parsoid to 4772f44 (T214649, T214648)

2019-01-24

  • 23:54 arlolra@deploy1001: Finished deploy [parsoid/deploy@f9ef630]: Updating Parsoid to 4772f44 (duration: 11m 58s)
  • 23:42 arlolra@deploy1001: Started deploy [parsoid/deploy@f9ef630]: Updating Parsoid to 4772f44
  • 22:21 mutante: wikitech-static splitting apache2 config files into one file per vhost to make it possible for certbot to detect them
  • 22:11 mutante: wikitech-static attempted to use certbot with --authenticator webroot and --installer apache to make it properly work with certbot renew in the future. it created account in /etc/letsencrypt/ made backup in /root/; challenge fails though because all domains need to serve out of a webroot and there is status.wikimedia.org here as well. (T21640)
  • 22:08 mutante: wikitech-static - certbot was already installed but it wasn't used to generate the existing certs so just running certbot renew did not work, attempted to use certbot to renew but apache plugin missing, installed python-certbot-apache (T214640)
  • 21:40 twentyafterfour: Finished MediaWiki train for 1.33.0-wmf.14 (T206668) - there is no train next week so I'll be back with wmf.16 (T206670) in two weeks.
  • 21:16 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/AbuseFilter/includes/Views/AbuseFilterView.php: sync I67ca47 refs T206668 (duration: 00m 47s)
  • 20:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.14 refs T206668
  • 20:11 jforrester@deploy1001: Finished scap: Post-SWAT full sync for new i18n for T208097 (duration: 33m 54s)
  • 19:59 mutante: temp disabled puppet on phab1001 , applying ferm change to allow deployment servers to http to phab servers
  • 19:37 jforrester@deploy1001: Started scap: Post-SWAT full sync for new i18n for T208097
  • 19:35 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T213356 Enable WelcomeSurvey experiment 2 on viwiki (duration: 00m 53s)
  • 19:33 akosiaris: delete 8505 tickets from OTRS with customerID Mailer-Daemon@wizengo.ds.planet-work.net T214604 - correction
  • 19:32 akosiaris: delete 5076 tickets from OTRS with customerID Mailer-Daemon@wizengo.ds.planet-work.net T214604
  • 19:32 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: SWAT T213885 Don't add mw:mediainfoView on File pages with no captions either (duration: 00m 51s)
  • 19:26 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikimediaMessages/i18n/wikimedia/en.json: SWAT T208097 WikimediaMessages: Add message for BlockAttacker password policy (duration: 00m 50s)
  • 19:25 arlolra: Updated Parsoid to f1d717f (T187958, T205337, T214103)
  • 19:23 akosiaris: delete 5076 tickets from OTRS with customerID MAILER-DAEMON@ubuntu.member.linode.com T214604
  • 19:23 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/AbuseFilter/includes/AbuseFilter.php: SWAT AbuseFilter Optionally pass the filter ID to checkConditions for error reporting I8510319c (duration: 00m 53s)
  • 19:19 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/GrowthExperiments/GrowthExperiments.alias.php: SWAT T213356 Add Special:WelcomeSurvey Vietnamese alias (duration: 00m 54s)
  • 19:12 marostegui: Convert dbstore1002 staging.organic_link from Aria to InnoDB - T213706
  • 19:03 arlolra@deploy1001: Finished deploy [parsoid/deploy@f2384f0]: Updating Parsoid to f1d717f (duration: 09m 41s)
  • 19:02 cdanis: T214529: cdanis@cp4026.ulsfo.wmnet ~ % sudo apt-get --purge remove edac-utils libsysfs2 libedac1
  • 18:53 arlolra@deploy1001: Started deploy [parsoid/deploy@f2384f0]: Updating Parsoid to f1d717f
  • 18:53 mutante: notebook1003 - restarted nagios-nrpe-server... T212824
  • 18:52 chaomodus: notebook1002: restarted nagios-nrpe-server due to oom
  • 18:49 cdanis: cp4026: T214529: apt-get install'ing edac-utils with new deps libedac1 libsysfs2
  • 18:37 onimisionipe: pooling maps1003 - stretch migration is complete. T198622
  • 18:22 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@26a8bbd] (stretch): Updating maps1001 to reflect latest changes (duration: 01m 24s)
  • 18:21 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@26a8bbd] (stretch): Updating maps1001 to reflect latest changes
  • 18:19 mutante: deploying polygerrit (new gerrit UI) theme change to roughly match MediaWiki timeless theme (gerrit:482379) (shoutouts: paladox, thcipriani)
  • 18:07 XioNoX: re-activate ping offload redirect for ping1001 restart
  • 18:03 moritzm: rebooting ping1001 to pick up SSBD-enabled qemu
  • 18:01 XioNoX: deactivate ping offload redirect for ping1001 restart
  • 17:58 moritzm: rebooting ping2001 to pick up SSBD-enabled qemu
  • 17:50 akosiaris: restart exim on mendelevium T214604
  • 17:44 akosiaris: block specific IPv4, IPv6 address on mx1001, mx2001 T214604
  • 17:35 akosiaris: freeze all current info@wikipedia.org emails on mx1001, mx2001 T214604
  • 17:31 moritzm: rebooting seaborgium to pick up SSBD-enabled qemu
  • 17:01 akosiaris: stop exim on mendelevium
  • 16:25 moritzm: rebooting serpens to pick up SSBD-enabled qemu
  • 15:45 reedy@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikimediaEvents/: Revive wgPoweredByHHVM (duration: 00m 55s)
  • 15:14 moritzm: rebooting pollux to pick up SSBD-enabled qemu
  • 14:50 godog: roll restart prometheus after https://gerrit.wikimedia.org/r/c/operations/puppet/+/486251 - T187987
  • 14:45 ariel@deploy1001: Finished deploy [dumps/dumps@25358e7]: fix up web links to multistream dump files (duration: 00m 03s)
  • 14:45 ariel@deploy1001: Started deploy [dumps/dumps@25358e7]: fix up web links to multistream dump files
  • 14:31 andrew@deploy1001: Finished deploy [horizon/deploy@94f3ec1]: Rolling out an upgraded proxy dashboard -- now use designate v2 API (duration: 03m 21s)
  • 14:28 andrew@deploy1001: Started deploy [horizon/deploy@94f3ec1]: Rolling out an upgraded proxy dashboard -- now use designate v2 API
  • 14:23 marostegui: Stop replication on all threads in dbstore1002 - T213706
  • 13:13 zeljkof: EU SWAT finished
  • 13:10 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure $wgSitename and $wgMetaNamespace for ur.wiktionary, ur.wikibooks and ur.wikiquote (T214290) (duration: 00m 53s)
  • 13:02 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Assign "suppressredirect" to rollbacker on newiki (T214012) (duration: 00m 53s)
  • 13:00 zeljkof: extending EU SWAT for 5-10 minutes
  • 12:53 reedy@deploy1001: Synchronized private/PrivateSettings.php: fix minor typo (duration: 00m 52s)
  • 12:46 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change $wgUploadNavigationUrl for the Persian (fa) Wikisource to Commons (T214048) (duration: 00m 53s)
  • 12:36 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add few domains at $wgCopyUploadsDomains and cleanup inline comments (T213961 T213632 T213649 T213924) (duration: 00m 53s)
  • 12:32 zfilipin@deploy1001: sync-file aborted: SWAT: Enable reference previews on beta (T213415) (duration: 00m 01s)
  • 12:28 jbond42: restarting pdns-recursor and ntp on dns1001 and dns1002 for a security update
  • 12:25 zfilipin@deploy1001: Synchronized wmf-config/: SWAT: Enable reference previews on beta (T213415) (duration: 00m 54s)
  • 12:17 zfilipin@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: Enable $wgAbuseFilterProfile on every wiki (T191039) (duration: 00m 54s)
  • 12:12 onimisionipe: initializing postgres replication for maps1001
  • 11:55 moritzm: installing memcached updates on dbmonitor*
  • 11:41 moritzm: installing polarssl security updates
  • 11:38 gehel: restart elasticsearch on elastic20205 to validate configuration change
  • 11:27 gehel: restarting blazegraph + updater on wdqs* for jvm upgrade
  • 11:26 moritzm: installing xen security updates (only some client libs are used)
  • 11:12 marostegui: Add dbstore1005:3318 to zarcillo - T210478
  • 11:08 moritzm: installing Java security updates on wdqs hosts
  • 10:59 arturo: T214299 additional reboot for cloudnet1004
  • 10:51 marostegui: Compress innodb tables on dbstore1005:3318 - T210478
  • 10:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 (duration: 00m 53s)
  • 10:37 moritzm: installing libsndfile security updates
  • 10:37 gehel: starting stretch upgrade on maps1001 - T198622
  • 10:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 52s)
  • 10:13 moritzm: installing libav security updates
  • 10:03 arturo: T214299 reimage cloudnet1004 to debian stretch
  • 09:58 moritzm: installing tiff security updates on trusty
  • 09:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3312 T210713 (duration: 00m 53s)
  • 09:43 marostegui: Deploy schema change on db1095:3312 - T210713
  • 09:30 marostegui: Deploy schema change on db1103:3312 - T210713
  • 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 T210713 (duration: 00m 53s)
  • 09:24 godog: temp stop prometheus@global on prometheus2003 to grab a snapshot
  • 08:51 dcausse: elasticsearch: deleting indices moved out of the search-chi@(eqiad|codfw) cluster (T214052)
  • 08:49 marostegui: Transfer s8 from db1116:3318 to dbstore1005:3318 T210478
  • 08:40 marostegui: Deploy schema change on s2 codfw master (db2035). this will generate lag on codfw - T210713
  • 08:30 marostegui: Deploy schema change on db1070 (s5 master) - T210713
  • 08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1110 T210713 (duration: 00m 52s)
  • 08:18 marostegui: Deploy schema change on db1110 - T210713
  • 08:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1110 T210713 (duration: 00m 53s)
  • 08:08 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Whitelist the php7 beta feature (duration: 00m 54s)
  • 07:58 marostegui: Compress InnoDB on dbstore1004 s2 and s3 - T210478
  • 07:53 marostegui: Deploy schema change on db1102:3315
  • 07:50 marostegui: Compress InnoDB tables on dbstore1005:3316 - T210478
  • 07:43 marostegui: Add dbstore1005:3316 to tendril and zarcillo - T210478
  • 07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 T210713 (duration: 00m 52s)
  • 07:18 marostegui: Transfer s6 from dbstore1001 to dbstore1005 using mariabackup - T210478
  • 07:09 marostegui: Compress Aria tables to InnoDB on dbstore1002 staging database - T213706
  • 07:07 marostegui: Deploy schema change on db1082, this will generate lag on labsdb s5 - T210713
  • 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 T210713 (duration: 00m 52s)
  • 07:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3315 T210713 (duration: 00m 53s)
  • 06:55 marostegui: Transfer x1 from dbstore1001 to dbstore1005 using mariabackup - T210478
  • 06:51 marostegui: Deploy schema change on db1113:3315 - T210713
  • 06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3315 T210713 (duration: 00m 53s)
  • 06:43 marostegui: Add dbstore1005:3320 to tendril and zarcillo - T210478
  • 06:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 T210713 (duration: 00m 52s)
  • 06:27 marostegui: Deploy schema change on db1100 - T210713
  • 06:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 T210713 (duration: 00m 53s)
  • 06:14 marostegui: Reboot dbstore1005 - T210478
  • 06:10 marostegui: Add dbstore1003:3311 to tendril - T210478
  • 05:03 tstarling@deploy1001: Synchronized wmf-config/profiler.php: gerrit 478137 (duration: 00m 53s)
  • 05:01 tstarling@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: gerrit 478137 (duration: 00m 53s)
  • 04:53 tstarling@deploy1001: Synchronized wmf-config/PhpAutoPrepend-labs.php: gerrit 477957 (duration: 00m 53s)
  • 04:52 tstarling@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: gerrit 477957 (duration: 00m 52s)
  • 04:51 tstarling@deploy1001: Synchronized wmf-config/LabsServices.php: gerrit 477957 (duration: 00m 52s)
  • 04:50 tstarling@deploy1001: Synchronized wmf-config/ProductionServices.php: gerrit 477957 (duration: 00m 56s)
  • 01:35 krinkle@deploy1001: Synchronized errorpages/: Ic093c3122f - rm php-fatal-error.html (duration: 00m 54s)
  • 01:01 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: 477956 and Aaron's 486134 (duration: 00m 52s)
  • 00:59 tstarling@deploy1001: Synchronized errorpages/hhvm-fatal-error.php: (no justification provided) (duration: 00m 53s)
  • 00:58 tstarling@deploy1001: Synchronized multiversion/MWRealm.php: (no justification provided) (duration: 00m 52s)
  • 00:57 tstarling@deploy1001: Synchronized src/ServiceConfig.php: gerrit 477956 (duration: 00m 53s)
  • 00:45 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.14/skins/MinervaNeue/includes/skins/minerva.mustache: SWAT: Restore banners to Wikivoyage project (duration: 00m 52s)
  • 00:42 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/MobileFrontend: SWAT: Explicitly pass in parseHTML T214451 (duration: 00m 55s)
  • 00:34 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/MobileFrontend: SWAT: Explicitly pass in parseHTML T214451 (duration: 00m 57s)

2019-01-23

  • 23:32 crusnov@deploy1001: Finished deploy [netbox/deploy@aa3c342]: Upgrade netbox to 2.5.3 - T212524 (duration: 04m 46s)
  • 23:28 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.14 refs T206668 (duration: 00m 52s)
  • 23:28 crusnov@deploy1001: Started deploy [netbox/deploy@aa3c342]: Upgrade netbox to 2.5.3 - T212524
  • 23:26 chaomodus: scap deploy netbox 2.5.3
  • 23:13 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/Translate/TranslateHooks.php: T214517 T214358 Hot-deploy Ic9d85fec1 to un-block train, hopefully (duration: 00m 53s)
  • 23:00 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikimediaEvents/includes/WikimediaEventsHooks.php: Hot-deploy I81165bf00 to use the right name and value for the cookie (duration: 00m 53s)
  • 22:08 chaomodus: proton1001 restarted nagios-nrpe-server which died from oom
  • 21:30 mutante: scandium - removing npm and nodejs*, testing puppetization to reinstall them
  • 20:50 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.13 refs T206668 (duration: 00m 52s)
  • 20:50 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.13 refs T206668
  • 20:43 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.14 refs T206668 (duration: 00m 52s)
  • 20:42 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.14 refs T206668
  • 20:33 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.14 refs T206668
  • 20:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.13 refs T206668
  • 20:21 twentyafterfour: rolling back because error rate increased significantly after promoting
  • 20:10 twentyafterfour: twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.14 refs T206668
  • 19:33 moritzm: rebooting dubnium to pick up SSBD-enabled qemu
  • 19:03 moritzm: rebooting puppetdb2001 to pick up SSBD-enabled qemu
  • 18:46 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Disable showing 'depicts' statements on Commons for now via I66d97031 (duration: 00m 52s)
  • 18:44 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikimediaEvents/includes/WikimediaEventsHooks.php: Hot-deploy Ief9c9155c to avoid auto-opting new accounts into PHP7 (duration: 00m 53s)
  • 18:35 anomie@deploy1001: Synchronized php-1.33.0-wmf.13/includes/page/WikiPage.php: Add even more temporary logging for T210739 (duration: 00m 54s)
  • 18:26 moritzm: rebooting mendelevium/ticket.wikimedia.org to pick up SSBD-enabled qemu
  • 18:10 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Reapply Enable the Welcome survey on viwiki (duration: 00m 53s)
  • 18:09 sbisson@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/GrowthExperiments/: SWAT: Help panel: ResourceLoaderHelpPanelModule handle help panel disabled (duration: 00m 54s)
  • 18:03 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5002.wikimedia.org
  • 17:57 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5002.wikimedia.org
  • 17:24 dcausse@deploy1001: Finished deploy [search/mjolnir/deploy@a141ad3]: fix retry_on_conflict (duration: 04m 21s)
  • 17:20 dcausse@deploy1001: Started deploy [search/mjolnir/deploy@a141ad3]: fix retry_on_conflict
  • 16:57 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5001.wikimedia.org
  • 16:53 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5001.wikimedia.org
  • 16:50 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4002.wikimedia.org
  • 16:44 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns4002.wikimedia.org
  • 16:43 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4001.wikimedia.org
  • 16:36 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns4001.wikimedia.org
  • 16:31 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
  • 16:14 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
  • 16:13 jbond42: rolling restarts of PDNS recursors/ntpd in codfw/esams/ulsfo/eqsin to pick up openssl security update
  • 16:02 jbond42: restarting ntpd on dns2001
  • 16:00 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
  • 15:57 jynus: adding dbstore1004:s2 to tendril
  • 15:43 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org
  • 15:20 marostegui: Truncate wmf_checksum table on dbstore1002 - T213670
  • 14:55 marostegui: Compress InnoDB on a few tables on dbstore1002 to gain some extra space - T213670
  • 14:18 marostegui: Convert tokudb tables into innodb on dbstore1002 - T213706
  • 13:47 marostegui: Convert a bunch of Aria tables to InnoDB on dbstore1002
  • 13:38 onimisionipe: repooling maps1002
  • 13:32 gehel: restarting kartotherian on maps100[234]
  • 13:30 gehel: restarting kartotherian on maps1003
  • 13:27 marostegui: Migrate some tokudb tables to innodb on dbstore1002 - T213706
  • 13:18 gehel: running cumin 'P{O:cache::upload} and A:eqiad' 'run-puppet-agent'
  • 13:10 zeljkof: EU SWAT finished
  • 12:36 zfilipin@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/AbuseFilter: SWAT: Re-fix the throttle script (T209565) (duration: 00m 55s)
  • 12:32 zfilipin@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/AbuseFilter/: SWAT: Re-fix the throttle script (T209565) (duration: 00m 54s)
  • 12:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add new namespace abbreviation for Swedish (sv) (T214329) (duration: 00m 53s)
  • 12:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix project talk namespace alias of Persian Wikipedia (T213733) (duration: 00m 53s)
  • 12:09 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Define ImportSources for nywiki (duration: 00m 54s)
  • 11:44 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: T214456 (duration: 00m 53s)
  • 11:04 arturo: T214299 reboot cloudnet2001-dev, cloudnet2002-dev and cloudnet1003 for new interface names
  • 11:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3315 T210713 (duration: 00m 52s)
  • 10:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097:3315 T210713 (duration: 00m 52s)
  • 10:39 arturo: updating puppet catalog compiler facts: `PUPPET_COMPILER=compiler1002.puppet-diffs.eqiad.wmflabs modules/puppet_compiler/files/compiler-update-facts`
  • 10:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1096:3315 T210713 (duration: 00m 52s)
  • 10:33 Amir1: Deployed patch for T207814 on wmf.14
  • 10:31 Amir1: Deployed patch for T207814 on wmf.13
  • 10:12 marostegui: Deploy schema change on db1096:3315 - T210713
  • 10:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096:3315 T210713 (duration: 00m 53s)
  • 09:39 akosiaris: upgrade mathoid in eqiad and codfw to latest chart version
  • 09:38 akosiaris@deploy1001: scap-helm mathoid finished
  • 09:38 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 09:38 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 09:38 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 09:30 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1003.eqiad.wmnet
  • 09:23 akosiaris@deploy1001: scap-helm mathoid finished
  • 09:23 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
  • 09:23 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml --set resources.replicas=1 staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 09:22 akosiaris@deploy1001: scap-helm mathoid finished
  • 09:22 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
  • 09:22 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 08:55 marostegui: Deploy schema change on s5 codfw master with replication, lag will be generated - T210713
  • 08:44 addshore: addshore@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognatePages.php --wiki yuewiktionary --batch-size 1000 // T214400
  • 08:28 marostegui: Deploy schema change on db1061 (s6 primary master) - T210713
  • 08:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1088 T210713 (duration: 00m 55s)
  • 08:19 marostegui: Add dbstore1004:3314 to tendril - T210478
  • 08:18 marostegui: Add dbstore1004:3314 to zarcillo - T210478
  • 08:12 marostegui: Deploy schema change on db1088 T210713
  • 08:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1088 T210713 (duration: 00m 52s)
  • 08:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1093 T210713 (duration: 00m 52s)
  • 07:51 marostegui: Compress tables on dbstore1004:3314 - T210478
  • 07:48 marostegui: Deploy schema change on db1093 - T210713
  • 07:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 T210713 (duration: 00m 54s)
  • 07:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1085 T210713 (duration: 00m 52s)
  • 07:13 marostegui: Deploy schema change on db1085, this will generate lag on s6 labs - T210713
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1085 T210713 (duration: 00m 53s)
  • 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 T210713 (duration: 00m 52s)
  • 06:53 marostegui: Deploy schema change on db1113:3316 - T210713
  • 06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3316 T210713 (duration: 00m 53s)
  • 06:25 marostegui: Stop s4 on db1102 to clone dbstore1004 - T210478
  • 06:16 marostegui@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Increase parsercache TTL keys from 22 to 24 days T210992 (duration: 01m 06s)
  • 04:05 tstarling@deploy1001: Finished scap: gerrit 480419 (duration: 19m 33s)
  • 03:45 tstarling@deploy1001: Started scap: gerrit 480419
  • 03:44 tstarling@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: gerrit 480419 (duration: 00m 52s)
  • 03:41 tstarling@deploy1001: Synchronized wmf-config/profiler.php: gerrit 480419 (duration: 00m 54s)
  • 03:40 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit 480419 (duration: 00m 54s)
  • 03:38 tstarling@deploy1001: scap failed: average error rate on 9/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 03:36 tstarling@deploy1001: Synchronized wmf-config/arclamp.php: gerrit 480419 (duration: 00m 54s)
  • 03:32 tstarling@deploy1001: Synchronized php-1.33.0-wmf.13/LocalSettings.php: gerrit 480419 (duration: 00m 54s)
  • 03:29 tstarling@deploy1001: Synchronized php-1.33.0-wmf.14/LocalSettings.php: gerrit 480419 (duration: 00m 52s)
  • 03:27 tstarling@deploy1001: Synchronized src/XWikimediaDebug.php: gerrit 480419 (duration: 00m 55s)
  • 03:22 TimStarling: manually edited LocalSettings.php in php-1.33.0-wmf.13 and php-1.33.0-wmf.14 to use a relative path, like in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/480695/
  • 03:09 tstarling@deploy1001: Scap failed!: Call to mwscript eval.php returned: None
  • 01:15 mutante: scandium - puppet run now without errors for the first time for the parsoid testing role on stretch instead of jessie. nodejs 10. - @subbu @arlolra you can start using it to replace ruthenium (T201366)
  • 01:12 mutante: scandium - git cloning parsoid from gerrit - mediawiki/services/parsoid/deploy to /srv/deployment/parsoid/deploy ; still needs https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/484602/ (T201366)
  • 01:05 mutante: scandium - deleting /etc/apt/preferences.d/stretch_backports.pref ; apt-get remove nodejs ; apt-get install -t stretch-backports npm ; now has nodejs 10 and npm from backports installed (T201366)
  • 00:58 mutante: scandium - deleting /etc/apt/preferences.d/stretch_backports.pref ; apt-get remove nodejs
  • 00:52 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/ContentTranslation/scripts/purge-unpublished-drafts.php: SWAT T203059 ContentTranslation: Remove waitForReplication for dry-run (duration: 00m 55s)
  • 00:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T213851 Cirrus: Setup archive index shard/replica counts (duration: 00m 54s)
  • 00:05 gtirloni: T209527 disabled notifications for cloudstore100{8,9}

2019-01-22

  • 23:09 cstone: Updated payments-wiki from 7d4cd165d9 to ca7c280f3e
  • 22:22 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.33.0-wmf.14 refs T206668 (duration: 43m 00s)
  • 21:39 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.33.0-wmf.14 refs T206668
  • 21:31 twentyafterfour@deploy1001: Synchronized wmf-config/CommonSettings.php: deploy I91e902 (duration: 01m 39s)
  • 20:26 gehel: resetting cassandra authentication on maps / eqiad
  • 20:25 milimetric@deploy1001: Finished deploy [analytics/refinery@d806b62]: Update jar versions on modified jobs (duration: 06m 48s)
  • 20:19 milimetric@deploy1001: Started deploy [analytics/refinery@d806b62]: Update jar versions on modified jobs
  • 20:07 onimisionipe@deploy1001: deploy aborted: Updating maps1002 to reflect latest changes (duration: 00m 01s)
  • 20:07 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@e847e7b] (stretch): Updating maps1002 to reflect latest changes
  • 20:06 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 20:06 volans: running cumin 'P{O:cache::upload} and A:eqiad' 'run-puppet-agent'
  • 20:03 gehel: running nodetool repair on system_auth for maps / eqiad servers
  • 19:30 arturo: T214299 additional reboot for cloudnet1003
  • 19:03 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@e847e7b] (stretch): Updating maps1002 to reflect latest changes (duration: 01m 02s)
  • 19:02 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@e847e7b] (stretch): Updating maps1002 to reflect latest changes
  • 18:56 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@0bcdd3f]: Update mobileapps to 0aac268 (fix pronunciation detection in mobile-sections T214338) (duration: 04m 00s)
  • 18:52 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@0bcdd3f]: Update mobileapps to 0aac268 (fix pronunciation detection in mobile-sections T214338)
  • 18:36 arturo: T214299 reimaging cloudnet1003 as debian stretch
  • 18:00 milimetric@deploy1001: Finished deploy [analytics/refinery@b07451e]: Denormalized job updates for actor/comment refactor (duration: 17m 24s)
  • 17:43 milimetric@deploy1001: Started deploy [analytics/refinery@b07451e]: Denormalized job updates for actor/comment refactor
  • 17:42 milimetric@deploy1001: Finished deploy [analytics/refinery@372c0b6]: Denormalized job updates for actor/comment refactor (duration: 02m 11s)
  • 17:40 milimetric@deploy1001: Started deploy [analytics/refinery@372c0b6]: Denormalized job updates for actor/comment refactor
  • 17:30 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@afca813]: Add the constraintsRunCheck job definition T204031 (duration: 00m 55s)
  • 17:29 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@afca813]: Add the constraintsRunCheck job definition T204031
  • 16:12 XioNoX: deactivate local pref for peering sessions in es/knams - T204281
  • 15:45 akosiaris: upgrade zotero to latest chart version
  • 15:44 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
  • 15:43 akosiaris@deploy1001: scap-helm zotero finished
  • 15:43 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 15:43 akosiaris@deploy1001: scap-helm zotero install -f zotero-values-eqiad.yaml -n production stable/zotero [namespace: zotero, clusters: eqiad]
  • 15:42 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
  • 15:34 addshore: addshore@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognatePages.php --wiki yuewiktionary // T214400 (1 row)
  • 15:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=zotero
  • 15:31 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=zotero
  • 15:30 addshore: addshore@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki yuewiktionary --site-group wiktionary // T214400
  • 15:30 akosiaris@deploy1001: scap-helm zotero finished
  • 15:30 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 15:30 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
  • 15:29 addshore: addshore@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki yuewiktionary --site-group wiktionary
  • 15:14 godog: turn on partitions.auto for rsyslog output to kafka - T214309
  • 15:14 marostegui: Add dbstore1003:3317 to tendril - T210478
  • 15:13 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@bb30697] (stretch): monkey patching geoshapes service for maps100[3-4] (duration: 01m 45s)
  • 15:11 mbsantos@deploy1001: Started deploy [kartotherian/deploy@bb30697] (stretch): monkey patching geoshapes service for maps100[3-4]
  • 15:11 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=zotero
  • 15:11 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=zotero
  • 15:08 akosiaris@deploy1001: scap-helm zotero finished
  • 15:08 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 15:08 akosiaris@deploy1001: scap-helm zotero install -n production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 15:05 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=zotero
  • 14:56 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@6cdece9]: Remove reviewers-by-blame from deployment cobalt no restart required (duration: 00m 11s)
  • 14:56 anomie@deploy1001: Synchronized php-1.33.0-wmf.13/includes/page/WikiPage.php: Add more temporary logging for T210739 (duration: 00m 47s)
  • 14:56 thcipriani@deploy1001: Started deploy [gerrit/gerrit@6cdece9]: Remove reviewers-by-blame from deployment cobalt no restart required
  • 14:54 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@6cdece9]: Remove reviewers-by-blame from deployment gerrit2001 no restart required (duration: 00m 10s)
  • 14:54 thcipriani@deploy1001: Started deploy [gerrit/gerrit@6cdece9]: Remove reviewers-by-blame from deployment gerrit2001 no restart required
  • 14:45 onimisionipe: starting init of postgres replication on maps1002 - T198622
  • 14:34 gehel: monkey patch kartotherian configuration to re-add proxy on maps100[34] - T214350
  • 14:18 akosiaris@deploy1001: scap-helm mathoid finished
  • 14:18 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 14:18 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 14:18 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 14:17 akosiaris: upgrade mathoid to the latest chart version (0.0.15)
  • 14:17 akosiaris: upgrade blubberoid to the latest chart version (0.0.5)
  • 14:17 akosiaris@deploy1001: scap-helm mathoid finished
  • 14:17 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
  • 14:17 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml --set resources.replicas=1 staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 14:15 akosiaris@deploy1001: scap-helm mathoid finished
  • 14:15 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
  • 14:15 akosiaris@deploy1001: scap-helm mathoid install -n staging -f mathoid-values.yaml --version=0.0.12 stable/mathoid [namespace: mathoid, clusters: staging]
  • 14:15 akosiaris@deploy1001: scap-helm mathoid install -n staging -f mathoid-values.yaml --version=0.0.12 stable/mathoid [namespace: mathoid, clusters: staging]
  • 14:14 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 14:10 akosiaris@deploy1001: scap-helm blubberoid finished
  • 14:10 akosiaris@deploy1001: scap-helm blubberoid cluster staging completed
  • 14:10 akosiaris@deploy1001: scap-helm blubberoid install -n staging -f blubberoid-values.yaml stable/blubberoid [namespace: blubberoid, clusters: staging]
  • 14:04 akosiaris@deploy1001: scap-helm blubberoid finished
  • 14:04 akosiaris@deploy1001: scap-helm blubberoid cluster codfw completed
  • 14:04 akosiaris@deploy1001: scap-helm blubberoid cluster eqiad completed
  • 14:04 akosiaris@deploy1001: scap-helm blubberoid install -n production -f blubberoid-values.yaml stable/blubberoid [namespace: blubberoid, clusters: eqiad,codfw]
  • 14:04 akosiaris@deploy1001: scap-helm blubberoid upgrade -f blubberoid-values.yaml production stable/blubberoid [namespace: blubberoid, clusters: eqiad,codfw]
  • 13:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098:3316 T210713 (duration: 00m 45s)
  • 13:55 godog: bump logstash kafka consumer threads - T214309
  • 13:41 marostegui: Stop replication in sync on dbstore1001:3316 and db1098:3316
  • 13:35 Amir1: running extensions/Wikibase/lib/maintenance/populateSitesTable.php on all.dblist (T211530)
  • 13:30 Amir1: EU SWAT is finished
  • 13:29 ladsgroup@deploy1001: Synchronized langlist: SWAT: Add yue to langlist (T211530) (duration: 00m 46s)
  • 13:26 moritzm: installing apt security updates for jessie
  • 13:19 Amir1: ladsgroup@mwmaint1002:~$ mwscript namespaceDupes.php fawiki --fix (T213733)
  • 13:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add new synonyms for namespaces in Persian (fa) (T213733) (duration: 00m 47s)
  • 13:13 moritzm: installing apt security updates for trusty
  • 13:07 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable page issues improvements on English Wikipedia (T210554) (duration: 00m 46s)
  • 12:52 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Use new logos in IS.php (T150618) (duration: 00m 47s)
  • 12:40 gehel: start stretch upgrade for maps1002 - T198622
  • 12:36 zfilipin@deploy1001: Synchronized static/images/project-logos/: SWAT: Upload HD logos for several projects (T150618) (duration: 00m 46s)
  • 12:29 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove ability for bureaucrats on outreachwiki to remove bureaucrat flag (T214133) (duration: 00m 46s)
  • 12:21 moritzm: installing apt security updates for stretch
  • 12:20 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create extra namespace in kawiktionary (T212956) (duration: 00m 46s)
  • 12:13 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable transwiki user group on ne.wikipedia (T214036) (duration: 00m 47s)
  • 12:09 jynus: running mariabackup on dbstore1001:s1
  • 12:02 Lucas_WMDE: tried and failed to deploy patch for T212118
  • 10:55 marostegui: Deploy schema change on db1098:3316 - T210713
  • 10:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098:3316 T210713 (duration: 00m 45s)
  • 10:20 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T204031 wikidata: post edit constraint jobs on 25% of edits (duration: 00m 45s)
  • 10:15 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209504 Decrease WBQualityConstraintsTypeCheckMaxEntities from 300 to 150 (duration: 00m 47s)
  • 10:08 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T204031 wikidata: post edit constraint jobs on 10% of edits (duration: 00m 47s)
  • 09:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1096:3316 T210713 (duration: 00m 47s)
  • 09:56 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,name=maps1003.eqiad.wmnet
  • 09:55 gehel: repooling maps1003 after upgrade to stretch - T198622
  • 09:40 marostegui: Deploy schema change on db1096:3316 - T210713
  • 09:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096:3316 T210713 (duration: 00m 48s)
  • 09:23 jynus: stop, upgrade and restart db1097
  • 08:55 dcausse: elasticsearch: closing indices in search-chi@(eqiad|codfw) moved to other elastic instances (T214052)
  • 08:53 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097 (duration: 00m 45s)
  • 08:42 moritzm: installing policykit-1 security updates on trusty
  • 08:26 marostegui: Deploy schema change on dbstore1001:3316 - T210713
  • 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3317 T210478 (duration: 00m 48s)
  • 08:14 marostegui: Compress s7 on dbstore1003 - T210478
  • 06:42 marostegui: Deploy schema change on db1078 (s3 master) - T85757
  • 06:36 marostegui: Stop MySQL on db1090:3317 to clone dbstore1003 - T210478
  • 06:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3317 T210478 (duration: 00m 49s)
  • 05:45 kartik@deploy1001: Finished deploy [cxserver/deploy@e0ca16b]: Update cxserver to c5ff0bf (duration: 04m 15s)
  • 05:40 kartik@deploy1001: Started deploy [cxserver/deploy@e0ca16b]: Update cxserver to c5ff0bf
  • 02:17 onimisionipe: restarting tilerator on maps100[1-2]
  • 00:38 chaomodus: stat1007 nagios-nrpe-server was off and alerted, restarting fixed it

2019-01-21

  • 22:33 krinkle@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/TemplateData/includes/api/ApiTemplateData.php: I7647ddfc47 - T213953 (duration: 00m 47s)
  • 19:35 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2040 (duration: 00m 45s)
  • 19:23 jynus: mysql.py -h db1115 zarcillo -e "UPDATE masters SET instance = 'db2047' WHERE section = 's7' and dc = 'codfw'" T214264
  • 18:55 jynus: stop and upgrade db2040 T214264
  • 18:52 onimisionipe: pool maps1003 - postgresql replication lag issues have been fixed
  • 18:24 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2040, promote db2047 to s7 master (duration: 00m 46s)
  • 17:51 jynus: stop and apply puppet changes to db2047 T214264
  • 17:44 jynus: stop replication on db2040 for master switch T214264
  • 17:16 jynus: stop and upgrade db2054
  • 16:03 arturo: T214303 reimaging/renaming labtestneutron2002.codfw.wmnet (jessie) to cloudnet2002-dev.codfw.wmnet (stretch)
  • 15:58 onimisionipe: reinitializing slave replication(postgres) on maps1003
  • 15:52 jynus: stop and upgrade db2061
  • 15:19 dcausse: closing frwikiquote_* indices on elasticsearch search-chi@codfw (T214052)
  • 15:11 dcausse: closing frwikiquote_* indices on elasticsearch search-chi@eqiad (T214052)
  • 13:58 marostegui: Compress enwiki on dbstore1003:3311 - T210478
  • 12:36 jijiki: Restarting memcached on mc1025 to apply '-R 200' - T208844
  • 11:25 onimisionipe: depool maps1003 to fix replication lag issues
  • 10:51 elukey: disable puppet fleetwide to ease the merge/deploy of a puppet admin module change - T212949
  • 10:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 - T85757 (duration: 00m 44s)
  • 10:33 jynus: upgrade and restart db2047 T214264
  • 10:26 addshore@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/ArticlePlaceholder/includes/AboutTopicRenderer.php: T213739 Pass a usageAccumulator to SidebarGenerator (duration: 00m 47s)
  • 10:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1089 (duration: 00m 45s)
  • 09:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1089 (duration: 00m 45s)
  • 09:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly Repool db1089 T210478 (duration: 00m 45s)
  • 09:30 marostegui: Compress a few tables on dbstore1003:3315 - T210478
  • 08:35 marostegui: Stop replication db1077 to deploy schema change - T85757
  • 08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 - T85757 (duration: 00m 46s)
  • 08:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 - T85757 (duration: 00m 48s)
  • 08:10 moritzm: installing OpenSSL security updates
  • 07:39 marostegui: Stop replication on db1124:3313 to fix triggers - T85757
  • 07:00 marostegui: Stop MySQL on db1089 to clone dbstore1003 - T210478
  • 07:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 T210478 (duration: 00m 47s)
  • 06:54 marostegui: Deploy schema change on db1123 - T85757
  • 06:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 - T85757 (duration: 00m 50s)
  • 06:47 marostegui: Drop tag_summary table from db1023, db1077, db1075 and db1078 T212255
  • 06:45 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5010.eqsin.wmnet
  • 06:32 marostegui: Drop tag_summary table from db1095:3313 - T212255
  • 06:27 marostegui: Drop tag_summary table from dbstore1002:s3 - T212255
  • 06:12 marostegui: Drop tag_summary table from s3 codfw - T212255
  • 06:09 marostegui: Drop tag_summary table from s8 - T212255

2019-01-20

  • 15:13 marostegui: Force WriteBack on db2040 - T214264
  • 01:07 cdanis: cdanis@wdqs1004.eqiad.wmnet /var/log/wdqs % sudo service wdqs-blazegraph restart

2019-01-19

  • 22:12 ariel@deploy1001: Finished deploy [dumps/dumps@ab79bbb]: multistream dumps in parallel, recombine gz and multistream without decompression (duration: 00m 03s)
  • 22:12 ariel@deploy1001: Started deploy [dumps/dumps@ab79bbb]: multistream dumps in parallel, recombine gz and multistream without decompression
  • 20:34 gtirloni: upgraded and rebooted labstore200{3,4}
  • 12:34 onimisionipe: pool maps1003 - stretch migration is complete T198622
  • 12:08 elukey: run 'start all slaves' on dbstore1002 after crash
  • 08:42 marostegui: Fixing dbstore1002 x1 replication T213670
  • 07:36 elukey: restart pdfrender on scb1004
  • 05:55 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step (duration: 00m 14s)
  • 05:55 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step
  • 05:55 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step (duration: 00m 15s)
  • 05:55 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step
  • 05:46 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step (duration: 00m 13s)
  • 05:46 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step
  • 05:25 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@af21320]: bump discovery analytics to latest (duration: 00m 17s)
  • 05:25 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@af21320]: bump discovery analytics to latest
  • 05:18 legoktm@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/JsonConfig/includes/JCCache.php: Revert "JCCache: Explicit load the main slot to avoid API warnings" - T214179 (duration: 00m 58s)

2019-01-18

  • 23:57 mobrovac@deploy1001: Finished deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation), take #3 (duration: 01m 01s)
  • 23:56 mobrovac@deploy1001: Started deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation), take #3
  • 23:55 mobrovac@deploy1001: Finished deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation), take #2 (duration: 00m 18s)
  • 23:54 mobrovac@deploy1001: Started deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation), take #2
  • 23:53 mobrovac@deploy1001: Finished deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation) - T212418 (duration: 00m 34s)
  • 23:53 mobrovac@deploy1001: Started deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation) - T212418
  • 20:47 mobrovac: restbase/cassandra bootstrap restbase1016-c - T212418
  • 20:47 mobrovac: restbase/cassandra bootstrap restbase1016-c
  • 17:24 godog: bootstrap cassandra-b on restbase1016 - T212418
  • 17:06 marostegui: Reload haproxy on dbproxy1009 after rack a2 maintenance
  • 16:14 arturo: T214167 reimage+rename labtestneutron2001.codfw.wmnet (jessie) to cloudnet2001-dev.codfw.wmnet (stretch)
  • 15:36 moritzm: rebooting mwdebug servers in codfw to pick up SSBD-enabled qemu
  • 15:27 moritzm: rebooting elnath to pick up SSBD-enabled qemu
  • 13:41 marostegui: reload haproxy on dbproxy1004
  • 13:18 godog: start cassandra-a on restbase1016 - T212418
  • 13:07 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0d11a2b] (stretch): Updating stretch instance with latest code, maps1003 have wrong dependencies installed (duration: 00m 45s)
  • 13:06 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0d11a2b] (stretch): Updating stretch instance with latest code, maps1003 have wrong dependencies installed
  • 12:50 moritzm: uploaded ferm 2.4-1+wmf1 to buster-wikimedia (T213527)
  • 11:46 moritzm: copied prometheus-rsyslog-exporter from stretch-wikimedia to buster-wikimedia
  • 11:09 marostegui: Deploy schema change on db2039 (s6 codfw master) - T210713
  • 10:54 marostegui: Deploy schema change on dbstore2001:3316 - T210713
  • 10:42 jynus: killing and removing data from db1118
  • 10:41 marostegui: Deploy schema change on db2076 - T210713
  • 10:29 vgutierrez: restarting pybal in lvs2002 - T214072
  • 10:23 marostegui: Deploy schema change on db2087:3316 - T210713
  • 10:23 vgutierrez: restarting pybal in lvs2005 - T214072
  • 10:02 marostegui: Add dbstore1003:3315 to zarcillo - T210478
  • 09:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3315 - T210478 (duration: 00m 45s)
  • 09:57 marostegui: Add dbstore1003:3315 to tendril - T210478
  • 09:53 marostegui: Deploy schema change on db2089 - T210713
  • 09:35 marostegui: Deploy schema change on db2067 - T210713
  • 09:29 _joe_: uploading python{,3}-pygerrit2 to stretch-wikimedia, T214149
  • 09:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add migrated wikis from s3 to s5 to codfw config T184805 (duration: 00m 45s)
  • 09:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 after mysql upgrade (duration: 00m 46s)
  • 08:12 godog: depool and take snapshots of prometheus data on prometheus2003 to test v2 conversion - T187987
  • 07:31 moritzm: rolling restart of AQS to pick up OpenSSL security updates for nodejs
  • 07:30 marostegui: Stop MySQL on db1113:3315 and db1113:3316 to clone dbstore1003 and for mysql and kernel upgrade
  • 07:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3316 for mysql upgrade (duration: 00m 45s)
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3315 - T210478 (duration: 00m 46s)
  • 07:16 moritzm: installing OpenSSL security updates
  • 06:54 marostegui: Drop table tag_summary from s7 - T212255
  • 06:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 and db1103 after DC hw maintenance (duration: 00m 44s)
  • 06:46 marostegui: Deploy schema change on dbstore1002:s3 - T85757
  • 06:29 marostegui: Deploy schema change on db1075 - T85757
  • 06:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool DBs on A2 rack T213748 (duration: 00m 47s)
  • 00:00 ejegg: updated payments-wiki from c455bbc6bb to 7d4cd165d9

2019-01-17

  • 23:02 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@6b344ca]: Update mobileapps to 258d76b page summary changes, 2nd try (duration: 02m 03s)
  • 23:00 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@6b344ca]: Update mobileapps to 258d76b page summary changes, 2nd try
  • 19:29 catrope@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/GrowthExperiments/: Make welcome survey C unescapable (T213958) (duration: 00m 52s)
  • 19:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update groupOverrides for Serbian wikis (T213055, T213059, T213063, T213065, T213679, T213680, T213681, T213682, T213684, T213685, T213686, T213687, T213824, T213825, T213826, T213827, T213828, T213829, T213830, T213832) (duration: 00m 53s)
  • 19:02 ppchelko@deploy1001: Finished deploy [restbase/deploy@f24d681]: Update recommendation api endpoints (duration: 20m 26s)
  • 18:42 ppchelko@deploy1001: Started deploy [restbase/deploy@f24d681]: Update recommendation api endpoints
  • 18:22 vgutierrez: running ipvsadm -D -t 10.2.1.29:1968 in lvs2003 - T214041
  • 18:19 vgutierrez: running ipvsadm -D -t 10.2.1.29:1968 in lvs2006 - T214041
  • 18:18 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5ba7582]: Update to I25c97e (duration: 05m 36s)
  • 18:12 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5ba7582]: Update to I25c97e
  • 17:52 elukey: re-enable eventlogging mysql clients and db1108's el replication after db1107 maintenance
  • 17:38 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ConstraintsCheckJobs on wikidatawiki (5% of edits) T204031 (duration: 00m 52s)
  • 17:25 dcausse: restarting mjolnir services on all elastic* nodes
  • 17:19 dcausse@deploy1001: Finished deploy [search/mjolnir/deploy@85aec7a]: fix multi-instances support (duration: 03m 42s)
  • 17:15 dcausse@deploy1001: Started deploy [search/mjolnir/deploy@85aec7a]: fix multi-instances support
  • 16:57 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/VisualEditor/modules/ve-mw/: T213922: Revert 48db45df7602 for wmf.12 (duration: 00m 52s)
  • 16:56 jforrester@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/VisualEditor/modules/ve-mw/: T213922: Revert 48db45df7602 for wmf.13 (duration: 00m 51s)
  • 16:46 dcausse@deploy1001: Finished deploy [search/mjolnir/deploy@42414ca]: add support for multi-instances setup (duration: 04m 59s)
  • 16:45 paravoid: updating ps1-a3-eqiad's SNMP communities to the new ones
  • 16:41 dcausse@deploy1001: Started deploy [search/mjolnir/deploy@42414ca]: add support for multi-instances setup
  • 16:28 fsero: uncordoned kubernetes1001
  • 16:27 fsero@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes1001.eqiad.wmnet
  • 16:19 moritzm: rebooting roentgenium (failoid node in eqiad) to enable SSBD-enabled qemu
  • 16:18 cmjohnson1: ps1-a2-eqiad removing redundant power from side A to replace blown fuse
  • 16:16 moritzm: rebooting tureis (failoid node in codfw) to enable SSBD-enabled qemu
  • 15:12 moritzm: rebooting archiva1001 (archiva.wikimedia.org) to enable SSBD-enabled qemu
  • 14:49 moritzm: rebooting darmstadtium (docker registry) to enable SSBD-enabled qemu
  • 14:36 jbond42: rolling out update for debdeploy 0.0.99.6-1 -> 0.0.99.7-1 T207845
  • 14:24 anomie: Restarting migrateActors.php on s3
  • 14:19 marostegui: Drop empty frimpressions database from m2 - T213973
  • 14:04 vgutierrez: running ipvsadm -D -t 10.2.2.29:1968 in lvs1016 - T214041
  • 14:03 vgutierrez: running ipvsadm -D -t 10.2.2.29:1968 in lvs1006 - T214041
  • 14:01 gehel: pooling maps1004 (first time after stretch upgrade) - T198622
  • 13:46 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: dc=.*,service=.*,cluster=kubernetes,name=kubernetes1001.eqiad.wmnet
  • 13:38 gehel: starting upgrade to stretch for maps1003 - T198622
  • 12:59 addshore: swat done!
  • 12:58 fsero@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet
  • 12:58 addshore@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/Wikibase/view/resources/jquery/wikibase/jquery.wikibase.badgeselector.js: T213998 Fix js type error when adding badges to items (duration: 00m 53s)
  • 12:53 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T210381: [cirrus] Enable CirrusSearchCrossClusterSearch (duration: 00m 51s)
  • 12:46 dcausse@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/UploadWizard/: T214007: Don't reuse existing input object (duration: 00m 53s)
  • 12:41 gtirloni: imported nfsd-ldap_1.2+deb9u1 in stretch-wikimedia (T209527)
  • 12:41 fsero: poweroff kubernetes1001 - T213859
  • 12:40 dcausse@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/CirrusSearch/: Hack around cross cluster search bug (duration: 00m 59s)
  • 12:34 gehel: shutting down relforge1001 for PDU swap - T213859
  • 12:33 akosiaris@deploy1001: Finished deploy [citoid/deploy@269c9c7]: (no justification provided) (duration: 00m 48s)
  • 12:32 akosiaris@deploy1001: Started deploy [citoid/deploy@269c9c7]: (no justification provided)
  • 12:29 dcausse@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/CirrusSearch/: Hack around cross cluster search bug (duration: 01m 00s)
  • 12:25 godog: poweroff restbase1010 / restbase1011 before A3 maint - T213859
  • 12:19 jynus: killing migrateActors.php --wiki=ptwiki on mwmaint, was using outdated db config T188327
  • 12:17 jijiki: poweroff rdb1005.eqiad.wmnet before A3 maint - T213859
  • 12:11 godog: poweroff ms-be1019 / ms-be1044 / ms-be1045 before A2 maint - T213748
  • 12:09 mvolz@deploy1001: scap-helm zotero finished
  • 12:09 mvolz@deploy1001: scap-helm zotero cluster codfw completed
  • 12:09 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 12:08 elukey: stop mariadb and shutdown db1107 to ease rack a2 maintenance
  • 12:04 mvolz@deploy1001: scap-helm zotero finished
  • 12:04 mvolz@deploy1001: scap-helm zotero cluster eqiad completed
  • 12:04 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 11:56 mvolz@deploy1001: scap-helm zotero finished
  • 11:56 mvolz@deploy1001: scap-helm zotero cluster staging completed
  • 11:56 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml --version=0.0.1 stable/zotero [namespace: zotero, clusters: staging]
  • 11:55 arturo: T209527 copy nfsd-ldap between jessie-wikimedia and stretch-wikimedia in reprepro. It will require a rebuild though bc updated build-deps/deps
  • 11:55 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml stable/zotero [namespace: zotero, clusters: staging]
  • 11:43 marostegui: Poweroff db1082 db1081 db1080 db1079 db1075 db1074 es1012 es1011 - T213748
  • 11:36 mvolz@deploy1001: scap-helm zotero finished
  • 11:36 mvolz@deploy1001: scap-helm zotero cluster codfw completed
  • 11:36 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 11:16 onimisionipe: shutdown elastic103[0-5] to prepare for T213859
  • 11:09 elukey: stop eventlogging on eventlog1002 and eventlogging replication on db1108 as prep step for db1107 maintenance
  • 10:55 marostegui: Lag will be generated on labs due to maintenance on sanitarium db masters
  • 10:54 marostegui: Stop MySQL on db1082 db1081 db1080 db1079 db1075 db1074 es1012 es1011 - T213748
  • 10:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool DBs on A2 rack T213748 (duration: 00m 54s)
  • 10:39 moritzm: installing libcaca security updates
  • 10:30 arturo: T213859 icinga downtime cloudservices1004 for 1 day
  • 10:29 moritzm: installing ruby-loofah security updates
  • 10:09 marostegui: Stop MySQL on db1103:3312 and db1103:3314, also poweroff the server - T213859
  • 10:08 moritzm: installing krb5 security updates on trusty
  • 10:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 - T213859 (duration: 00m 53s)
  • 09:59 marostegui: Poweroff dbproxy1001 dbproxy1002 dbproxy1003 for a3 maintenance - T213859
  • 09:25 marostegui: Poweroff dbstore1003 for hw maintenance T213859
  • 09:24 moritzm: power off graphite1003 for later hw maintenance (T213859)
  • 09:18 marostegui: Deploy schema change on db1095:3313 - T85757
  • 09:02 vgutierrez: rolling NIC firmware upgrade cp[1081-1090] - T203194
  • 08:42 jijiki: Enabling puppet on rdb1005 and switch redis::misc::master to rdb1006 - T213859
  • 08:37 moritzm: installing remaining systemd security updates on stretch
  • 08:32 jijiki: Restarting nutcracker on scb100* for 484572 - T213859
  • 08:32 jynus: stop, upgrade and restart db1075
  • 08:31 marostegui: Deploy schema change on s3 codfw, lag will be generated - T85757
  • 08:28 marostegui: Drop table tag_summary from enwiki - T212255
  • 08:24 jijiki: Disabling puppet on rdb1005 and switch redis::misc::master to rdb1006 - T213859
  • 07:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase weight for db1123 (duration: 00m 53s)
  • 07:20 marostegui: Change thread_pool_stall_limit on db1075 and db1078 - T213858
  • 07:18 marostegui: Enable GTID on db1075 - T213858
  • 07:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s3 read only T213858 (duration: 00m 30s)
  • 07:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s3master eqiad from db1075 to db1078 T213858 (duration: 00m 30s)
  • 07:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s3 on read-only T213858 (duration: 00m 31s)
  • 07:00 marostegui: Start s3 failover T213858
  • 06:30 marostegui: Disable puppet on db1075 and db1078 - T213858
  • 06:26 marostegui: Enable GTID back on all hosts but db1075 db1078 - T213858
  • 06:19 marostegui: Change s3 topology to get ready for s3 failover - T213858
  • 06:14 marostegui: Disable gtid on s3 hosts - T213858
  • 06:10 marostegui: Downtime s3 hosts for 2 hours - T213858
  • 04:12 ppchelko@deploy1001: Finished deploy [mobileapps/deploy@89c4d8d]: revert new summary (duration: 01m 55s)
  • 04:10 ppchelko@deploy1001: Started deploy [mobileapps/deploy@89c4d8d]: revert new summary
  • 04:02 cdanis@deploy1001: Started restart [parsoid/deploy@4b82683]: (no justification provided)

2019-01-16

  • 23:25 ppchelko@deploy1001: Finished deploy [recommendation-api/deploy@0ff39e2]: Deployment attempt with decreased worker count (duration: 04m 08s)
  • 23:21 ppchelko@deploy1001: Started deploy [recommendation-api/deploy@0ff39e2]: Deployment attempt with decreased worker count
  • 23:10 Krinkle: krinkle@tungsten:/srv/: rm -rf xhprof; for T196406
  • 21:35 ppchelko@deploy1001: Finished deploy [recommendation-api/deploy@c1b6b32]: Rollback update to 1a1f824 (duration: 01m 59s)
  • 21:33 ppchelko@deploy1001: Started deploy [recommendation-api/deploy@c1b6b32]: Rollback update to 1a1f824
  • 21:29 ppchelko@deploy1001: deploy aborted: log (duration: 00m 02s)
  • 21:29 ppchelko@deploy1001: Started deploy [recommendation-api/deploy@da83637]: log
  • 21:28 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@da83637]: Update to 1a1f824 (duration: 06m 14s)
  • 21:22 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@da83637]: Update to 1a1f824
  • 21:17 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@6b344ca]: Update mobileapps to 258d76b page summary changes (duration: 06m 31s)
  • 21:10 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@6b344ca]: Update mobileapps to 258d76b page summary changes
  • 20:20 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.13 (duration: 00m 51s)
  • 20:19 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.13
  • 19:48 gehel: switching wdqs categories traffic to new second instance, puppet will be disabled during the operation on all wdqs nodes - T213212
  • 19:29 thcipriani: restarting ci jenkins for upgrade
  • 19:13 thcipriani: restarting gerrit on cobalt for 2.15.8 upgrade
  • 19:12 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@cec7995]: Gerrit to 2.15.8 on cobalt (duration: 00m 10s)
  • 19:12 thcipriani@deploy1001: Started deploy [gerrit/gerrit@cec7995]: Gerrit to 2.15.8 on cobalt
  • 19:09 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@cec7995]: Gerrit to 2.15.8 on gerrit2001 only (duration: 00m 11s)
  • 19:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@cec7995]: Gerrit to 2.15.8 on gerrit2001 only
  • 19:04 thcipriani: starting gerrit upgrade to 2.15.8
  • 18:56 mutante: upgraded jenkins version for jessie and stretch in apt.wikimedia.org to latest LTS
  • 18:16 addshore: deploy slot done
  • 18:13 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ConstraintsCheckJobs enabled on wikidatawiki (1% of edits) T204031 (duration: 00m 51s)
  • 18:07 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@0aa107a]: Re-deploy for fixing vars.sh (duration: 11m 49s)
  • 18:03 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ConstraintsCheckJobs enabled on testwikidatawiki T204031 (duration: 00m 52s)
  • 17:55 smalyshev@deploy1001: Started deploy [wdqs/wdqs@0aa107a]: Re-deploy for fixing vars.sh
  • 17:53 jynus: stop, upgrade and restart db1111
  • 17:36 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: [cirrus] Start using replica group settings (take 2) (T210381) (duration: 00m 51s)
  • 17:35 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] Start using replica group settings (take 2) (T210381) (duration: 00m 51s)
  • 17:22 vgutierrez: rolling NIC firmware upgrade cp[1077-1080] - T203194
  • 17:18 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EditorJourney: Enable data collection for viwiki T213348 (duration: 00m 52s)
  • 17:07 anomie@deploy1001: Synchronized php-1.33.0-wmf.12/includes/page/WikiPage.php: Add temporary logging for T210739 (duration: 00m 53s)
  • 17:05 vgutierrez: upgrading NIC firmware in cp1076 - T203194
  • 17:01 gehel@deploy1001: Finished deploy [wdqs/wdqs@6685dc0]: multi instance fixes (duration: 00m 27s)
  • 17:01 gehel@deploy1001: Started deploy [wdqs/wdqs@6685dc0]: multi instance fixes
  • 16:58 gehel@deploy1001: Finished deploy [wdqs/wdqs@6685dc0]: multi instance fixes (duration: 10m 29s)
  • 16:53 jynus: stop, upgrade and restart db1112
  • 16:47 gehel@deploy1001: Started deploy [wdqs/wdqs@6685dc0]: multi instance fixes
  • 16:45 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 (duration: 00m 52s)
  • 16:45 vgutierrez: upgrading NIC firmware on cp1075 - T203194
  • 16:08 jynus: upgrade and stop db1123
  • 16:02 jbond42: Import new debdeploy 0.0.99.7 packages for trusty T207845
  • 15:59 jbond42: Import new debdeploy 0.0.99.7 packages for buster T207845
  • 15:59 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 (duration: 00m 52s)
  • 15:58 otto@deploy1001: Finished deploy [analytics/superset/deploy@f73b897]: bump to 0.26.3-wikimedia2 with chart format string fix (duration: 00m 36s)
  • 15:57 otto@deploy1001: Started deploy [analytics/superset/deploy@f73b897]: bump to 0.26.3-wikimedia2 with chart format string fix
  • 15:56 jbond42: Import new debdeploy 0.0.99.7 packages for jessie T207845
  • 15:41 jbond42: Import new debdeploy 0.0.99.7 packages for stretch T207845
  • 15:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 T209815 (duration: 00m 52s)
  • 15:12 addshore: addshore@mwmaint1002:~$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=labswiki Matthias_Geisler // T213928
  • 14:56 jynus: stop and upgrade db1125 (this may cause temp. lag on labsdb hosts for s7, s6, s4, s2)
  • 14:35 otto@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: attempt to deploy 0.26.3-wikimedia1
  • 14:29 jynus: stop and upgrade db1124 (this may cause temp. lag on labsdb hosts for s1, s3, s5, s8)
  • 14:20 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es1019 fully (duration: 00m 52s)
  • 14:05 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es1019 with low load (duration: 00m 52s)
  • 13:15 marostegui: Stop MySQL on db1078 and power it off for firmware update - T209815
  • 13:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 T209815 (duration: 00m 52s)
  • 13:12 dcausse: eu SWAT done
  • 13:06 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 fully (duration: 00m 52s)
  • 12:41 addshore@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/WikibaseQualityConstraints: gerrit:484654 T204031 T204022 Fix constraintsRunCheck Job class & test (duration: 00m 54s)
  • 12:40 addshore@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/WikibaseQualityConstraints: gerrit:484654 T204031 T204022 Fix constraintsRunCheck Job class & test (duration: 00m 57s)
  • 12:25 reedy@deploy1001: Synchronized wmf-config/throttle.php: T213848 (duration: 00m 53s)
  • 12:21 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Deploy the FileExporter as a beta feature on all Wikimedia wikis (T213425) (duration: 00m 53s)
  • 12:12 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Partial Blocks on itwiki (T210444) (duration: 00m 53s)
  • 12:12 jynus: upgrade and restart db1095
  • 11:02 fsero: draining kubernetes1001 for maintenance T213859
  • 10:59 addshore: slot done
  • 10:59 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgWBQualityConstraintsEnableConstraintsCheckJobs false (duration: 00m 51s)
  • 10:53 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgWBQualityConstraintsEnableConstraintsCheckJobs true wd (duration: 00m 52s)
  • 10:48 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgWBQualityConstraintsEnableConstraintsCheckJobs true testwd (duration: 00m 52s)
  • 10:38 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 1% T204031 gerrit:484621 (duration: 00m 52s)
  • 10:28 godog: restart rsyslog on wezen, tls listener stuck
  • 10:25 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 with low load (duration: 00m 51s)
  • 10:19 elukey: executed kafka preferred-replica-election on the logging Kafka cluster as an attempt to spread load more uniformly
  • 10:19 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 100 T204031 gerrit:484621 (duration: 00m 52s)
  • 10:18 addshore@deploy1001: sync-file aborted: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 100 T204031 gerrit:484621 (duration: 00m 02s)
  • 10:14 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 50 T204031 gerrit:484621 (duration: 00m 52s)
  • 10:13 addshore@deploy1001: sync-file aborted: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 50 T204031 gerrit:484621 (duration: 00m 00s)
  • 10:03 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY, gerrit:484621 (duration: 00m 52s)
  • 09:53 godog: upgrade controller firmware on ms-be1016 - T213856
  • 09:47 jynus: upgrade and restart db1077
  • 09:42 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 (duration: 00m 52s)
  • 09:29 marostegui: Stop s3 actor-migration script in order to allow s3 to catch up and to avoid lag during the failover - T188327 T213858
  • 09:17 godog: powercycle ms-be1016 - T213856
  • 09:16 marostegui: Stop replication in sync on dbstore1002:x1 and db2034 - T213670
  • 09:10 dcausse: T210381: elasticsearch search cluster, creating completion suggester indices on psi&omega elastic instances in eqiad&codfw
  • 09:00 godog: test roll-restart rsyslog on mw hosts in eqiad - T211124
  • 08:58 akosiaris@deploy1001: scap-helm zotero finished
  • 08:58 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 08:58 akosiaris@deploy1001: scap-helm zotero install -n production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 08:57 marostegui: Re-point m3-master from dbproxy1003 to dbproxy1008 - T213865
  • 08:53 moritzm: installing systemd security updates for stretch
  • 08:53 akosiaris: depool zotero eqiad for helm release cleanup
  • 08:47 akosiaris: repool zotero in codfw
  • 08:42 filippo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Default to new logging infrastructure - T211124 (duration: 01m 05s)
  • 08:40 akosiaris@deploy1001: scap-helm zotero finished
  • 08:40 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 08:40 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 08:30 akosiaris@deploy1001: scap-helm zotero finished
  • 08:30 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 08:30 akosiaris@deploy1001: scap-helm zotero install -n production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 08:25 akosiaris@deploy1001: scap-helm zotero finished
  • 08:25 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 08:25 akosiaris@deploy1001: scap-helm zotero install -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 08:24 marostegui: Drop table tag_summary from s4 - T212255
  • 08:19 elukey: convert aria tables to innodb on dbstore1002 - T213706
  • 08:18 akosiaris: depool codfw zotero for helm release cleanups
  • 08:15 marostegui: Upgrade MySQL on db2043 (s3 codfw master)
  • 08:11 elukey: drop unneeded tables from the staging db on dbstore1002 according to T212493#4883535
  • 07:36 vgutierrez: powercycling cp1088 - T203194
  • 07:27 marostegui: Drop table tag_summary from s2 - T212255
  • 07:14 marostegui: Upgrade MySQL on db2050 and db2036
  • 06:07 SMalyshev: started transfer wdqs2005->2006
  • 06:06 marostegui: Deploy schema change on db1067 (s1 primary master) - T85757
  • 06:01 SMalyshev: depooling wdqs2005 and wdqs2006 for T213854
  • 01:02 SMalyshev: repooled wdqs200[45] for now, 2006 still not done, will get to it later today
  • 00:15 mobrovac@deploy1001: Finished deploy [restbase/deploy@a04ebdd]: Restart RESTBase to pick up the fact that restbase1016 is not there - T212418 (duration: 21m 34s)

2019-01-15

  • 23:54 mobrovac@deploy1001: Started deploy [restbase/deploy@a04ebdd]: Restart RESTBase to pick up the fact that restbase1016 is not there - T212418
  • 22:53 tzatziki: removing one file for legal compliance
  • 22:50 jforrester@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/WikibaseMediaInfo/resources/filepage/CaptionsPanel.js: Hot-deploy Ibb1f763f to unbreak setting captions on WikibaseMediaInfo (duration: 00m 51s)
  • 22:39 SMalyshev: repooled wdqs1008
  • 21:49 XioNoX: re-activate BGP to Zayo on cr1-eqiad - T212791
  • 21:39 SMalyshev: depooling wdqs2005 for T213854
  • 21:23 mutante: contint1001 rmdir /srv/org/wikimedia/integration/coverage ; rmdir /srv/org/wikimedia/integration/logs (T137890)
  • 21:21 mutante: doc.wikimedia.org httpd config has been removed from contint1001, is now on doc1001
  • 21:13 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.13
  • 21:09 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.13 and rebuild l10n cache (duration: 32m 42s)
  • 20:36 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.13 and rebuild l10n cache
  • 20:33 dduvall@deploy1001: Pruned MediaWiki: 1.33.0-wmf.8 (duration: 03m 04s)
  • 20:30 dduvall@deploy1001: Pruned MediaWiki: 1.33.0-wmf.6 (duration: 09m 15s)
  • 19:36 SMalyshev: started copying wdqs1008->wdqs2004 for T213854
  • 19:28 SMalyshev: depooling wdqs1008 and wdqs2004 for DB copying for T213854
  • 18:52 bblack: authdns-update for https://gerrit.wikimedia.org/r/c/operations/dns/+/484546 (make normal git stuff match manual changes already in place)
  • 18:44 hashar: [2019-01-15 18:44:06,959] [main] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 2.15.6-5-g4b9c845200 ready
  • 18:43 hashar: Restarting Gerrit to catch up with a DNS change with the database
  • 18:43 volans: restarted debmonitor on debmonitor1001
  • 18:40 bblack: DNS manually updated for m1-master -> dbproxy1006 and m2-master -> dbproxy1007
  • 17:26 godog: roll-restart logstash in eqiad - T213081
  • 17:21 godog: depool logstash1007 before restarting logstash - T213081
  • 17:13 godog: set partitions to 3 for existing kafka-logging topics - T213081
  • 17:06 XioNoX: move back cr1-eqiad:xe-4/1/3 to xe-3/3/1 - T212791
  • 16:57 XioNoX: move cr1-eqiad:xe-3/3/1 to xe-4/1/3 - T212791
  • 16:52 jynus: stop db1115 for hw maintenance
  • 16:50 godog: roll-restart kafka-logging in eqiad to apply new topic defaults - T213081
  • 16:00 jynus: stop es1019 for hw maintenance T213422
  • 15:53 dcausse: T210381: elastic search clusters, catching up updates since first import on new psi&omega clusters in eqiad&codfw (from mwmaint1002)
  • 15:10 fdans@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: reverting deploy of 0.26.3-wikimedia1 (duration: 00m 32s)
  • 15:10 fdans@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: reverting deploy of 0.26.3-wikimedia1
  • 15:02 fdans@deploy1001: Finished deploy [analytics/superset/deploy@9d6156a]: reverting deploy of 0.26.3-wikimedia1 (duration: 06m 06s)
  • 15:01 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103 (duration: 00m 48s)
  • 14:56 fdans@deploy1001: Started deploy [analytics/superset/deploy@9d6156a]: reverting deploy of 0.26.3-wikimedia1
  • 14:41 fdans@deploy1001: Finished deploy [analytics/superset/deploy@408a30e]: deploying 0.26.3-wikimedia1 (duration: 00m 36s)
  • 14:40 fdans@deploy1001: Started deploy [analytics/superset/deploy@408a30e]: deploying 0.26.3-wikimedia1
  • 14:14 moritzm: rebooting acamar
  • 13:53 marostegui: Downtime db1115 and es1019 for 4 hours - T196726 T213422
  • 13:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1119 T85757 (duration: 00m 46s)
  • 13:15 marostegui: Deploy schema change on db1119 - T85757
  • 13:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 T85757 (duration: 00m 46s)
  • 13:00 elukey: restart memcached on mc1024 to pick up new settings (-R 200) - T208844
  • 12:47 dcausse: EU SWAT done
  • 12:36 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T210381: [cirrus] Start writing to psi & omega (take 2) (2/2) (duration: 00m 45s)
  • 12:33 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T210381: [cirrus] Start writing to psi & omega (take 2) (1/2) (duration: 00m 45s)
  • 12:15 onimisionipe: starting upgrade of prometheus-elasticsearch-exporter for eqiad T210592
  • 12:14 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change links of wgGEHelpPanelLinks for kowiki T209467 (duration: 00m 46s)
  • 12:09 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: [cirrus] Add cirrussearch-big-indices tag T210381 (duration: 00m 46s)
  • 12:06 jynus: upgrade and restart db1103
  • 12:03 onimisionipe: starting upgrade of prometheus-elasticsearch-exporter for codfw T210592
  • 11:50 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 (duration: 00m 45s)
  • 11:44 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1091 fully (duration: 00m 45s)
  • 11:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 T85757 (duration: 00m 45s)
  • 11:02 jynus: dropping database test on db1124:s5 with replication
  • 11:01 elukey: run 'apt-get purge tmpreaper' on mw1297,1298,2150,2151,2244,2245 (all role spare) to avoid daily cronspam
  • 10:58 END: (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) (volans@cumin2001)
  • 10:57 marostegui: Deploy schema change on db1083 - T85757
  • 10:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 T85757 (duration: 00m 46s)
  • 10:53 START: - Cookbook sre.hosts.upgrade-and-reboot (volans@cumin2001)
  • 10:49 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1091 with low load (duration: 00m 45s)
  • 10:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 T85757 (duration: 00m 45s)
  • 10:20 marostegui: Deploy schema change on db1080 - T85757
  • 10:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 T85757 (duration: 00m 45s)
  • 10:19 jynus: upgrade and restart db1091
  • 10:16 moritzm: installing zeromq3 security updates on stretch (jessie/trusty not affected)
  • 10:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1114 T85757 (duration: 00m 45s)
  • 09:51 marostegui: Deploy schema change on db1114 - T85757
  • 09:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1114 T85757 (duration: 00m 45s)
  • 09:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 T85757 (duration: 00m 46s)
  • 09:25 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 46s)
  • 09:20 addshore: deploy slot done
  • 09:18 jynus: upgrade and restart db2078
  • 09:10 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgWBQualityConstraintsTypeCheckMaxEntities 300, T209504 (duration: 00m 46s)
  • 09:06 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209922 Add WikibaseQualityConstraints configs in testwikidatawiki (duration: 00m 47s)
  • 08:38 marostegui: Stop replication on s1 on all labs hosts - T85757
  • 08:28 marostegui: Deploy schema change on db1106 - T85757
  • 08:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 T85757 (duration: 00m 45s)
  • 08:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 T85757 (duration: 00m 46s)
  • 08:02 marostegui: Deploy schema change on db1089 - T85757
  • 08:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 T85757 (duration: 00m 45s)
  • 07:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3311 T85757 (duration: 00m 46s)
  • 07:28 marostegui: Drop tag_summary from wikitech - T212255
  • 07:20 marostegui: Drop tag_summary from s5 - T212255
  • 07:07 marostegui: Deploy schema change on db1099:3311 - T85757
  • 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3311 T85757 (duration: 00m 45s)
  • 06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool pc1007 in pc1 - T208383 (duration: 00m 49s)
  • 02:12 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@c920aec]: Re-deploy namespace script (duration: 08m 42s)
  • 02:04 smalyshev@deploy1001: Started deploy [wdqs/wdqs@c920aec]: Re-deploy namespace script
  • 01:54 mutante: wdqs1009 - icinga alerts about Blazegraph process for wdqs categories. starting wdqs blazegraph... already running
  • 01:12 mutante: cp1078 - bnxt_en - TX timeout detected - Host cp1078 is DOWN - powercycled via mgmt (T203194)
  • 00:44 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Welcome survey experiment 2: 50% variation A, 50% variation C (duration: 00m 46s)
  • 00:37 catrope@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/GrowthExperiments/: Make welcome survey config use array_plus_2d (duration: 00m 46s)
  • 00:34 catrope@deploy1001: Synchronized php-1.33.0-wmf.12/resources/lib/ooui/oojs-ui-core.js: OOUI backport (T213544) (duration: 00m 46s)
  • 00:08 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Improve list of privileged groups (duration: 00m 46s)

2019-01-14

  • 23:49 gehel@deploy1001: Finished deploy [wdqs/wdqs@59d5f40]: New wdqs startup script for multi-instance (duration: 09m 53s)
  • 23:39 gehel@deploy1001: Started deploy [wdqs/wdqs@59d5f40]: New wdqs startup script for multi-instance
  • 23:30 mutante: doc1001 - disabling puppet, testing apache config change 483775
  • 23:12 ejegg: updated fundraising CiviCRM from 5580f0b11c to 6042acb363
  • 22:39 andrewbogott: upgraded packages and MW version on wikitech-static
  • 21:30 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@89c4d8d]: Update mobileapps to f2658de (fix ITN explore feed for dawiki) (duration: 03m 51s)
  • 21:26 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@89c4d8d]: Update mobileapps to f2658de (fix ITN explore feed for dawiki)
  • 20:37 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/resources/Resources.php: Hot-deploy I18193b19 to add missing message for OOUI v0.30.0 (duration: 00m 47s)
  • 20:27 gehel@deploy1001: Finished deploy [wdqs/wdqs@f71131e]: upgrading wdqs1010 to latest version (duration: 00m 24s)
  • 20:27 gehel@deploy1001: Started deploy [wdqs/wdqs@f71131e]: upgrading wdqs1010 to latest version
  • 20:08 gehel: disabling puppet on all wdqs servers to deploy T213234
  • 19:58 dcausse: Morning SWAT done
  • 19:37 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean-up: Explain why WBMI wikis don't need wmgWikibaseRepoEntityNamespaces set (duration: 00m 46s)
  • 19:32 XioNoX: re-deactivate BGP to Zayo on cr1-eqiad - T212791
  • 19:29 dcausse@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/GrowthExperiments/includes/WelcomeSurvey.php: Welcome survey: ignore check confirmed email (duration: 00m 45s)
  • 19:28 XioNoX: re-activate BGP to Zayo on cr1-eqiad - T212791
  • 19:19 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 with low load (duration: 00m 47s)
  • 19:09 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T204016: Remove old ArticleCreationWorkflows config (duration: 00m 46s)
  • 18:48 jynus: stop upgrade and restart db1081
  • 18:45 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1081 (duration: 00m 46s)
  • 18:18 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@f71131e]: Category script and GUI updates, blazegraph launcher updates and moved RWStore from scap to puppet (duration: 10m 56s)
  • 18:07 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@f71131e]: Category script and GUI updates, blazegraph launcher updates and moved RWStore from scap to puppet
  • 17:25 addshore: deploy slot done
  • 17:22 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T201831 T201838 wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter fully on (duration: 00m 46s)
  • 17:13 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T201831 T201838 wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter 3000 (duration: 00m 46s)
  • 17:11 addshore@deploy1001: sync-file aborted: T201831 T201838 wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter 3000 (duration: 00m 01s)
  • 17:09 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: T201831 T201838 Introduce wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter PT 2/2 (duration: 00m 45s)
  • 17:08 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T201831 T201838 Introduce wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter PT 1/2 (duration: 00m 47s)
  • 16:56 ejegg: re-enabled fundraising scheduled jobs
  • 16:43 mobrovac@deploy1001: scap-helm -h finished
  • 16:43 mobrovac@deploy1001: scap-helm -h cluster codfw completed
  • 16:43 mobrovac@deploy1001: scap-helm -h cluster eqiad completed
  • 16:43 mobrovac@deploy1001: scap-helm -h [namespace: -h, clusters: eqiad,codfw]
  • 16:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 T85757 (duration: 00m 45s)
  • 16:02 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db1105:3311 T85757 (duration: 00m 46s)
  • 15:57 akosiaris@deploy1001: scap-helm zotero finished
  • 15:57 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 15:57 akosiaris@deploy1001: scap-helm zotero [namespace: zotero, clusters: eqiad]
  • 15:45 anomie: Running cleanupUsersWithNoIds.php on labswiki and labtestwiki, apparently they were left out when that was done for all other wikis (and so caused issues with the migrateActors.php run).
  • 15:44 fsero: downscaling old zotero-production-645dccfb64 replicaset on eqiad
  • 15:33 vgutierrez: rolling restart of cp1076-cp1090 to upgrade to kernel 4.9.144 - T203194
  • 15:17 ejegg: disabled fundraising scheduled jobs
  • 15:16 marostegui: Deploy schema change on db1105:3311 - T85757
  • 15:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db1105:3311 T85757 (duration: 00m 46s)
  • 15:08 volans: testing switchdc cookbooks in DRY-RUN mode w/ latest spicerack T205884 (no real changes expected)
  • 15:04 akosiaris: upgrade zotero pods to 2019-01-14-115905-candidate in eqiad T213693
  • 15:04 akosiaris@deploy1001: scap-helm zotero finished
  • 15:04 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 15:04 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 15:02 moritzm: imported debdeploy 0.0.99.6-1+deb10u1 for buster-wikimedia (T213527)
  • 15:02 vgutierrez: upgrading kernel in cp1075 to 4.9.144-1 - T203194
  • 15:00 moritzm: ran systemctl reset-failed on relforge1001
  • 14:57 marostegui: Drop table tag_summary from s6 - T212255
  • 14:52 akosiaris: upgrade zotero pods to 2019-01-14-115905-candidate in codfw T213693
  • 14:51 akosiaris@deploy1001: scap-helm zotero finished
  • 14:51 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 14:51 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 14:42 anomie@mwmaint1002: Running migrateActors.php on wikitech for T188327. This may cause lag in codfw.
  • 14:42 anomie@mwmaint1002: Running migrateActors.php on section 8 wikis for T188327. This may cause lag in codfw.
  • 14:42 anomie@mwmaint1002: Running migrateActors.php on section 7 wikis for T188327. This may cause lag in codfw.
  • 14:42 anomie@mwmaint1002: Running migrateActors.php on section 6 wikis for T188327. This may cause lag in codfw.
  • 14:42 anomie@mwmaint1002: Running migrateActors.php on section 5 wikis for T188327. This may cause lag in codfw.
  • 14:42 anomie@mwmaint1002: Running migrateActors.php on section 4 wikis for T188327. This may cause lag in codfw.
  • 14:42 anomie@mwmaint1002: Running migrateActors.php on section 2 wikis for T188327. This may cause lag in codfw.
  • 14:41 anomie@mwmaint1002: Running migrateActors.php on section 1 wikis for T188327. This may cause lag in codfw.
  • 14:41 anomie@mwmaint1002: Running migrateActors.php on remaining section 3 wikis for T188327. This may cause lag in codfw.
  • 14:39 volans: updated python3-phabricator on cumin[12]001 T205884
  • 14:36 volans: uploaded python{,3}-phabricator 0.7.0-2~wmf1 to apt.w.o T205884 (upstream removes egg files)
  • 14:18 dcausse: elasticsearch (search cluster): pre-populating omega & psi clusters in eqiad & codfw (from mwmaint1002 and mwmaint2001 respectively) (T210381)
  • 14:13 akosiaris@deploy1001: scap-helm zotero finished
  • 14:13 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 14:13 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 14:11 akosiaris@deploy1001: scap-helm zotero upgrade production --debug -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 14:10 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 14:04 marostegui: Add pc1007 to tendril and zarcillo - T208383
  • 13:51 akosiaris@deploy1001: scap-helm zotero finished
  • 13:51 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 13:51 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 13:49 Jeff_Green: authdns update for T210445
  • 13:48 dcausse: creating testcommonswiki index in the omega search-elastic cluster (eqiad & codfw)
  • 13:42 akosiaris@deploy1001: scap-helm zotero finished
  • 13:42 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 13:42 akosiaris@deploy1001: scap-helm zotero upgrade production --dry-run --debug -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 13:41 akosiaris: rollback zotero codfw deployment
  • 13:37 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 13:37 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 13:10 jijiki: Restarted nrpe on proton1002
  • 13:03 zeljkof: eu swat finished
  • 13:03 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add http://mbc.cyfrowemazowsze.pl to $wgCopyUploadsDomains (T212469) (duration: 00m 46s)
  • 12:56 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Localisation of Babel categories on nap.wikipedia.org (T123188) (duration: 00m 44s)
  • 12:48 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure $wgImportSources for ne.wiktionary (T213023) (duration: 00m 45s)
  • 12:44 zfilipin@deploy1001: sync-file aborted: SWAT: Configure $wgNamespaceAliases for yue.wiktionary (T212678) (duration: 00m 01s)
  • 12:37 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure $wgNamespaceAliases for yue.wiktionary (T212678) (duration: 00m 45s)
  • 12:27 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure $wgAddGroups, $wgRemoveGroups and $wgImportSources for ur.wiki (T212612) (duration: 00m 46s)
  • 12:19 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add suppressredirect user right to patroller user group at zh.wikivoyage (T212272) (duration: 00m 46s)
  • 12:10 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create Portal namespace on shn.wikipedia (T212992) (duration: 00m 46s)
  • 12:05 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for Berklee College of Music library (T213311) (duration: 00m 52s)
  • 11:20 volans: installed spicerack 0.0.13 on cumin1001 - T205884
  • 10:39 moritzm: start installing systemd security updates for stretch
  • 10:13 volans: installed spicerack 0.0.13 on cumin2001 for final testing - T205884
  • 10:11 volans: uploaded spicerack_0.0.13-1_amd64.deb to apt.wikimedia.org stretch-wikimedia T205884
  • 10:07 moritzm: install tmpreaper security updates on remaining hosts
  • 09:51 marostegui: Running aria_chk for all myisam tables on dbstore1002 T213670
  • 09:37 marostegui: Running aria_chk for all linter tables on dbstore1002 - T213670
  • 08:44 marostegui: Stop mysql on dbstore1002 - T213670
  • 08:38 marostegui: Stop MySQL on pc2010 to clone pc1007 - T208383
  • 07:48 elukey: executed bmc-device --debug --cold-reset on dbstore1002 - "No more sessions available" for mgmt

2019-01-13

  • 16:33 hoo: Updated operations/dumps/dcat (559dee37452..a86285f4e7) on snapshot1008

2019-01-12

  • 21:46 akosiaris: restart all zotero pods in eqiad
  • 16:12 moritzm: rebooting mw2167 for a test
  • 02:16 legoktm@deploy1001: Synchronized docroot/mediawiki.org/keys: Add Mukunda's new subkey that was used for the 1.32 release - T213521 (duration: 00m 47s)

2019-01-11

  • 21:56 jforrester@deploy1001: Finished scap: Full scap sync to update wmf.12 i18n for the weekend Idf2a67860f (duration: 19m 12s)
  • 21:37 jforrester@deploy1001: Started scap: Full scap sync to update wmf.12 i18n for the weekend Idf2a67860f
  • 18:43 legoktm@deploy1001: Synchronized wmf-config/CommonSettings.php: Update ExtensionDistributor for 1.32 release - https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/483735 (duration: 00m 46s)
  • 18:07 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2060 T210713 (duration: 00m 46s)
  • 17:10 marostegui: Deploy schema change on db2060 - T210713
  • 16:55 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2060 T210713 (duration: 00m 46s)
  • 16:53 marostegui: Defragment change_tag table on db2060 - T210713
  • 14:37 jynus: upgrade and restart db2091 (s2, s4)
  • 14:12 jynus: updating mariadb client packages on cumin* hosts
  • 11:36 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1018 fully (duration: 00m 46s)
  • 11:21 jynus: stop, upgrade and reboot es2017
  • 11:04 jynus: stop, upgrade and reboot es2016
  • 10:51 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1018 with low load (duration: 00m 46s)
  • 10:31 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: repool es2013 (duration: 00m 45s)
  • 10:30 jynus: upgrade and restart es1018
  • 09:58 jynus: upgrade and reboot es2013
  • 09:53 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: depool es2013 (duration: 00m 45s)
  • 09:49 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: depool es2013 (duration: 00m 47s)
  • 09:32 jynus: reset iLo on db2053
  • 08:49 moritzm: installing tmpreaper security updates
  • 02:40 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Ib87407165382 (duration: 00m 46s)
  • 01:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T211993 Enable GrowthExperiments help panel for 50% of new users on cswiki and kowiki (duration: 00m 46s)
  • 01:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T211993 Enable GrowthExperiments help panel on cswiki and kowiki (duration: 00m 45s)
  • 01:03 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/WikimediaEvents/includes/PageViews.php: SWAT: T213186 GrowthExperiments: Support templates for help desk title (duration: 00m 46s)
  • 00:50 XioNoX: bump prefix limit for AS6939 in eqsin
  • 00:18 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/AbuseFilter/includes/AbuseFilterHooks.php: T213453: Use slot in onEditFilterMergedContent and newVariableHolderForEdit in AbuseFilter (duration: 00m 47s)
  • 00:12 James_F: 482373 is live on mwdebug1002 for extensive checks.
  • 00:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Help panel: Set help desk page correctly on kowiki Ia94cfc571 (duration: 00m 46s)

2019-01-10

  • 23:45 Krinkle: krinkle@tungsten: upgrade xhgui to include upstream f039fb9f99f - T213218
  • 23:45 Krinkle: upgraded xhgui to upstream 2965240c91e52 (current upstream master) - T213218
  • 23:36 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T213497 [Commons, TestCommons] Don't use Wikibase entity search (duration: 00m 46s)
  • 22:57 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/includes/EditEntity/MediawikiEditFilterHookRunner.php: T213453: Pass slotrole into EditFilterMergedContent hook in Wikibase repo (duration: 00m 47s)
  • 20:47 marxarelli: both mediawiki error rates and 500 response rates have subsided back to pre-deploy levels
  • 20:19 marxarelli: seeing increase in "60 second timed out" error rate and rise in 503 rate, as was the case with group1 deployment. continuing to monitor
  • 20:11 gehel: restart blazegraph on wdqs1009 to validate new config
  • 20:02 tgr@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/WikimediaEvents/modules/ve-wme/campaigns.js: SWAT: Remove unnecessary addPlugin wrapper (T213338) (duration: 00m 53s)
  • 19:50 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove AICaptcha settings (T186244) (duration: 00m 52s)
  • 19:47 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Whitelist *.*.archive.org in wgCopyUploadsDomains (T207581) (duration: 00m 53s)
  • 19:41 tgr: ran mwscript namespaceDupes.php bnwikibooks --fix (238 links fixed)
  • 19:41 volans: installed spicerack 0.0.12-1 on cumin2001 T205884
  • 19:39 volans: uploaded spicerack_0.0.12-1_amd64.deb to apt.wikimedia.org stretch-wikimedia T205884
  • 19:39 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Note that namespaceDupes.php maintenance script run will be needed after the deployment. (T203534) (duration: 00m 53s)
  • 19:14 marostegui: Deploy schema change on dbstore1001 - T85757
  • 19:13 marostegui: Deploy schema change on dbstore1002 - T85757
  • 18:57 tzatziki: deleting three files for legal compliance
  • 18:52 anomie@mwmaint1002: Running migrateActors.php on test wikis and mediawikiwiki for T188327. This may cause lag in codfw.
  • 18:47 marostegui: Deploy schema change on s1 codfw master (db2048) with replication, this will generate lag on s1 codfw - T85757
  • 18:46 marostegui: Stop replication on s1 codfw master for a schema change - T85757
  • 18:37 marostegui: Stop replication on s8 codfw master for a schema change - T85757
  • 18:30 marostegui: Upgrade mysql and kernel on db2060
  • 18:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2053, db2060 for kernel and mysql upgrade (duration: 00m 51s)
  • 18:13 marostegui: Stop MySQL on db2046 for kernel upgrade
  • 18:12 marostegui: The above change was db2053 and not db2060
  • 18:11 marostegui: Stop MySQL on db2053 and db2060 for mysql and kernel upgrade
  • 18:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2053, db2060 for kernel and mysql upgrade (duration: 00m 53s)
  • 17:50 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: repool es2015 (duration: 00m 53s)
  • 17:49 marostegui: Deploy schema change on db2053 - T210713
  • 17:33 marostegui: Deploy schema change on db2046 - T210713
  • 16:59 jynus: stop and upgrade es2015
  • 16:52 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: depool es2015 (duration: 00m 52s)
  • 16:41 onimisionipe: data transfer from wdqs1004 -> wdqs1006 completed! - T213361
  • 16:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T159708 Enable Structured Data on Commons, captions-only (duration: 00m 53s)
  • 16:17 James_F: T180981 Placed patch to enable WBMI on Commons on mwdebug1002
  • 16:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T180981 Add Commons to wikis with WikibaseMediaInfo installed (duration: 00m 52s)
  • 16:11 jforrester@deploy1001: Synchronized dblists/wikidatarepo.dblist: T180981 Add Commons to wikis with WikibaseRepo installed (duration: 00m 54s)
  • 16:04 James_F: T180981 Placed patch to install but not enable WBMI on Commons on mwdebug1002
  • 15:56 marostegui: Deploy schema change on db1068 (s4 master) - T86338
  • 15:31 fsero: rolling back last zotero codfw deployment
  • 15:27 marostegui: Deploy schema change on db1067 (s1 master) - T86338 T202167
  • 15:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 T86338 T202167 (duration: 00m 49s)
  • 15:24 addshore: T208330, MariaDB [testcommonswiki]> TRUNCATE TABLE wb_terms; # Was https://phabricator.wikimedia.org/P7973
  • 15:22 fsero@deploy1001: scap-helm zotero upgrade production -f /srv/scap-helm/zotero/zotero-values-codfw.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: codfw]
  • 15:21 fsero@deploy1001: scap-helm zotero upgrade -f /srv/scap-helm/zotero/zotero-values-codfw.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: codfw]
  • 15:20 addshore@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/includes/Content: T208330 dont write to wb_terms for mediainfo (duration: 00m 54s)
  • 15:12 addshore@deploy1001: Synchronized php-1.33.0-wmf.9/extensions/Wikibase/repo/includes/Content: T208330 dont write to wb_terms for mediainfo (duration: 00m 55s)
  • 14:59 marostegui: Deploy schema change on db1080 - T86338 T202167
  • 14:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 T86338 T202167 (duration: 00m 52s)
  • 14:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1114 T86338 T202167 (duration: 00m 52s)
  • 14:42 fsero@deploy1001: scap-helm zotero finished
  • 14:42 fsero@deploy1001: scap-helm zotero cluster staging completed
  • 14:42 fsero@deploy1001: scap-helm zotero upgrade staging -f /srv/scap-helm/zotero/zotero-values-staging.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: staging]
  • 14:36 fsero@deploy1001: scap-helm zotero finished
  • 14:36 fsero@deploy1001: scap-helm zotero cluster staging completed
  • 14:36 fsero@deploy1001: scap-helm zotero upgrade staging -f /srv/scap-helm/zotero/zotero-values-staging.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: staging]
  • 14:35 fsero@deploy1001: scap-helm zotero upgrade staging -f /srv/scap-helm/zotero/zotero-values-staging.yaml [namespace: zotero, clusters: staging]
  • 14:33 fsero@deploy1001: scap-helm -h finished
  • 14:33 fsero@deploy1001: scap-helm -h cluster staging completed
  • 14:33 fsero@deploy1001: scap-helm -h [namespace: -h, clusters: staging]
  • 14:33 marostegui: Deploy schema change on db1114 - T86338 T202167
  • 14:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1114 T86338 T202167 (duration: 00m 53s)
  • 14:14 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1019 (duration: 00m 53s)
  • 13:51 arturo: T212302 icinga downtime for 2h cloudvirt[1013,1024,1026-1030].eqiad.wmnet because of wrong puppet code
  • 13:24 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1018 (duration: 00m 52s)
  • 13:10 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool es2012 (duration: 00m 52s)
  • 13:01 zeljkof: EU SWAT finished
  • 13:01 zfilipin@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: Remove main page special casing from ruwikibooks and ruwikiquote (T212849) (duration: 00m 52s)
  • 12:58 zfilipin@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: Remove main page special casing from eswiki (T212849) (duration: 00m 53s)
  • 12:53 zfilipin@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: Turn off main page special casing for svwiki (T213018) (duration: 00m 52s)
  • 12:46 zfilipin@deploy1001: Synchronized dblists/flow.dblist: SWAT: Disable unused Flow extension on ur.wikibooks (T207627) (duration: 00m 55s)
  • 12:42 onimisionipe: starting data transfer from wdqs1004 -> wdqs1006 - T213361
  • 12:34 onimisionipe: starting data transfer from wdqs1003 -> wdqs1006 - T213361 - aborted (nodes are in different cluster)
  • 12:28 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Re-enable QuickSurveys extension on enwiki (T209882) (duration: 00m 52s)
  • 12:20 jynus: stop and upgrade es2012
  • 12:12 zfilipin@deploy1001: Synchronized dblists/flow.dblist: SWAT: Reverted "Revert "Disable unused Flow extension on de.wikiversity"" (T207626) (duration: 00m 53s)
  • 12:01 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool es2012 (duration: 00m 52s)
  • 11:54 onimisionipe: starting data transfer from wdqs1003 -> wdqs1006 - T213361
  • 10:59 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209857 Increase CPU benchmark sampling rate (duration: 00m 53s)
  • 10:58 fsero: uploaded docker-registry_2.7.0~rc0~wmf1-1 debian package to reprepro for stretch-wikimedia (done yesterday at 17:21 UTC; forgot to log it)
  • 10:26 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209857 Run CPU benchmark for a portion of navtiming pageloads (duration: 00m 52s)
  • 10:10 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209857 Run CPU benchmark for a portion of navtiming pageloads (duration: 00m 53s)
  • 09:52 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T187299 Decrease ruwiki navtiming rate (duration: 00m 52s)
  • 09:45 gilles@deploy1001: Synchronized tests/InitialiseSettingsTest.php: T211395 T211529 tests: Assert that extra namespaces have correspondent talk namespaces (duration: 00m 56s)
  • 09:34 moritzm: updated thirdparty/php72 component for stretch-wikimedia to 7.2.13
  • 01:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make GrowthExperiments config wmf.12-proof (duration: 00m 52s)
  • 01:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert latest config patch (caused fatal errors on kowiki) (duration: 00m 52s)
  • 00:58 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure help desk page for help panel correctly on kowiki (T213186) (duration: 00m 53s)
  • 00:56 cstone: updated fundraising tools from 5f44d9dd43 to da82ed111d
  • 00:34 catrope@deploy1001: Synchronized php-1.33.0-wmf.12/includes/MovePage.php: Fix missing ATOMIC_CANCELABLE in MovePage::move() (T213168) (duration: 00m 53s)
  • 00:20 catrope@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/GrowthExperiments/: Help panel fixes (T212973, T212890, T213186) (duration: 00m 54s)
  • 00:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EventLogging for GrowthExperiments help panel (T211991) (duration: 00m 54s)

2019-01-09

  • 23:51 mutante: thumb1004 - still needs broken RAM replaced, expired downtime, re-ACKed (T207721)
  • 23:39 mutante: mw2151 - change netbox status from active to staged - it's not actually active, it's role(spare) and was jessie (T192457)
  • 23:34 mutante: reinstalling mw2151.codfw.wmnet because it was the very last mw* host on jessie
  • 21:20 bblack: multatuli (ns2) - upgrade gdnsd to 9949 beta release
  • 21:04 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@bfa9241]: Increase concurrency for categoryMembershipJob T192691 (duration: 00m 45s)
  • 21:04 James_F: Creating Wikibase repo tables on Commons for T68108
  • 21:03 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@bfa9241]: Increase concurrency for categoryMembershipJob T192691
  • 21:00 James_F: Running rebuildall on TestCommons
  • 20:53 bblack: authdns1001 (ns0) - upgrade gdnsd to 9949 beta release
  • 20:45 James_F: Created Wikibase repo tables on TestCommons
  • 20:11 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.12 (duration: 00m 53s)
  • 20:10 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.12
  • 19:28 crusnov@deploy1001: Finished deploy [netbox/deploy@7fe39e1]: Deploy Django security upgrade (duration: 04m 33s)
  • 19:23 crusnov@deploy1001: Started deploy [netbox/deploy@7fe39e1]: Deploy Django security upgrade
  • 19:01 ejegg: updated standalone SmashPig deploy from 25713ca232 to 78b92b7fef
  • 18:43 bblack: authdns2001 (ns1) - upgrade gdnsd to 9949 beta release
  • 18:26 XioNoX: add bgp sessions to AS31800 on cr1-eqsin
  • 18:19 marostegui: Rename table tag_summary on enwiki on db1089 - T212255
  • 18:18 XioNoX: add bgp sessions to AS38895 on cr1-eqsin
  • 18:04 marostegui: Drop valid_tag from s3 master (db1075) - T212254
  • 17:39 tarrow: That last one was SWAT: T209504 Increase PHP constraint check entities to 150
  • 17:36 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 53s)
  • 17:28 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011 - T86338
  • 17:18 James_F: Ran `namespaceDupes.php --wiki=bewikibooks` on mwmaint1002, no change
  • 17:16 bblack: uploaded gdnsd-2.99.9949-beta-1+wmf1 to reprepro for stretch-wikimedia
  • 17:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 T86338 T202167 (duration: 00m 52s)
  • 16:29 marostegui: Deploy schema change on db1083 - T86338 T202167
  • 16:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 T86338 T202167 (duration: 00m 53s)
  • 16:17 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 with full weight (duration: 00m 53s)
  • 16:11 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/RepoHooks.php: T213227 RepoHooks::onApiCheckCanExecute: Only fail if the edit is for our entity's slot (duration: 00m 54s)
  • 15:50 marostegui: Drop valid_tag tables from db1095 (s3) - T212254
  • 15:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 T86338 T202167 (duration: 00m 51s)
  • 15:23 jijiki: restarting scb* pdfrender
  • 15:10 marostegui: Deploy schema change on db1106 (sanitarium s1 master) with replication, lag will be generated on s1 labs - T86338 T202167
  • 15:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 T86338 T202167 (duration: 00m 52s)
  • 14:39 elukey: restart Hadoop HDFS namenodes on an-master100[1,2] to complete decom of analytics1028->41
  • 14:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 T212254 (duration: 00m 53s)
  • 14:36 volans@deploy1001: Finished deploy [debmonitor/deploy@0f096de]: Deploy Django security upgrade (duration: 01m 50s)
  • 14:34 volans@deploy1001: Started deploy [debmonitor/deploy@0f096de]: Deploy Django security upgrade
  • 14:28 marostegui: Drop valid_tag table on db1077 with replication (lag will be generated on labs s3) - T212254
  • 14:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 T212254 (duration: 00m 52s)
  • 13:32 urandom: forcing removal of restbase1016-c (host down way too long to salvage) -- T212418
  • 13:29 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 with low weight (duration: 00m 52s)
  • 13:26 zeljkof: EU SWAT finished
  • 13:22 zfilipin@deploy1001: Synchronized php-1.33.0-wmf.9/: SWAT: Fix order of arguments in ChangeTags::getPrevTags ([T212703]) (duration: 05m 50s)
  • 13:08 zfilipin@deploy1001: Synchronized php-1.33.0-wmf.12/: SWAT: Fix order of arguments in ChangeTags::getPrevTags ([T212703]) (duration: 06m 54s)
  • 13:00 zeljkof: extending eu swat for 5-10 minutes
  • 12:51 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable signature button in toolbar for the "Arbitration" namespace in ruwiki (T213049) (duration: 00m 52s)
  • 12:44 moritzm: installing OpenSSL 1.0.2 security updates for stretch
  • 12:40 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable reader trust survey (T209882) (duration: 01m 07s)
  • 12:02 gehel: repool wdqs100[78] - data import complete - T213210
  • 11:55 jynus: enabling gtid on db1124:s5
  • 11:54 jynus: enabling gtid on db1082
  • 11:23 jynus: stopping db1082 and db2052 s5 replication in sync to migrate db1124:s5 master
  • 10:30 moritzm: fixed package installation status on db2062
  • 10:01 volans: upgraded spicerack to 0.0.11 on cumin2001 T205884
  • 10:00 volans: uploaded spicerack_0.0.11 to apt.wikimedia.org stretch-wikimedia T205884
  • 09:44 hashar: Some CI npm jobs get broken due to a faulty node module. https://phabricator.wikimedia.org/T213249
  • 09:38 banyek: repooling labsdb1010 - T210693
  • 09:26 banyek: dropping materialized views on labsdb1010 - T210693
  • 09:26 banyek: depooled labsdb1010
  • 08:28 moritzm: installing openssl security updates on stretch-based DB servers
  • 07:55 moritzm: installing libseccomp updates from stretch point release
  • 07:43 hashar: contint1001: restarted Zuul to take into account SMTP configuration | https://gerrit.wikimedia.org/r/376739 | T93414
  • 06:03 kartik@deploy1001: Finished deploy [cxserver/deploy@1098942]: Update cxserver to 656c468 (duration: 04m 08s)
  • 05:59 kartik@deploy1001: Started deploy [cxserver/deploy@1098942]: Update cxserver to 656c468
  • 01:15 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/RepoHooks.php: T213227 Don't have onApiCheckCanExecute die for inactive entity types (duration: 00m 53s)
  • 01:04 jforrester@deploy1001: Synchronized docroot/: T187716 Remove mobilelanding.php, no longer pointed to by Apache (duration: 00m 52s)
  • 00:58 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [Wikimania] Add 2019 content to default search (duration: 00m 53s)
  • 00:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T202683 [Wikimania] Create year namespaces for each Wikimania, 2005–2019 (duration: 00m 53s)
  • 00:34 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Make password policy and logging code saner (duration: 00m 52s)
  • 00:33 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Make password policy and logging code saner (duration: 00m 55s)

2019-01-08

  • 23:44 SMalyshev: repooled wdqs1004
  • 23:35 eileen: process-control config revision is 9dc6e63fcd
  • 23:00 XioNoX: Update pfw3-codfw/eqiad security policies - T213100
  • 22:39 XioNoX: deactivate policy-statement BGP_fundraising_aggregates term nat on pfw3-eqiad/codfw - T211028
  • 22:29 gehel: starting data copy from wdqs1007 to wdqs1008 (both will be depooled) - T213217
  • 22:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TestCommons: Add default search NSes (duration: 00m 51s)
  • 22:22 James_F: Ran /docroot/noc/createTxtFileSymlinks.sh for new dblist
  • 22:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use new wikidatarepo dblist where appropriate (duration: 00m 52s)
  • 22:20 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: dblists: Load wikibaserepo (duration: 00m 52s)
  • 22:15 jforrester@deploy1001: scap failed: average error rate on 9/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 22:14 jforrester@deploy1001: Synchronized dblists/wikidata.dblist: dblists: Remove testcommons from wikidata list (duration: 00m 52s)
  • 22:13 jforrester@deploy1001: Synchronized dblists/wikidatarepo.dblist: dblists: Add wikidatarepo list (duration: 00m 53s)
  • 22:12 urandom: forcing removal of restbase1016-b (host down way too long to salvage) -- T212418
  • 22:08 marostegui: Drop valid_tag table from db2043 with replication (s3 codfw master - lag will be generated) - T212254
  • 22:03 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: cleanup - Idfa129a65a41 (duration: 00m 53s)
  • 21:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 T212254 (duration: 00m 52s)
  • 21:49 marostegui: Drop valid_tag table from db1078 (s3) - T212254
  • 21:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 T212254 (duration: 00m 53s)
  • 21:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 T212254 (duration: 00m 53s)
  • 21:38 marostegui: Drop valid_tag table from db1123 (s3) - T212254
  • 21:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 T212254 (duration: 00m 53s)
  • 21:31 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.12
  • 21:03 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.12 and rebuild l10n cache (duration: 39m 22s)
  • 20:42 ejegg: updated payments-wiki from b8acb95a2a to c455bbc6bb
  • 20:24 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.12 and rebuild l10n cache
  • 20:24 gehel: starting data copy from wdqs1004 to wdqs1007 (both will be depooled) - T213217
  • 20:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TestCommons: Don't enable entities, we're not Wikidata.org (duration: 01m 44s)
  • 20:11 XioNoX: change BGP_fundraising_aggregates term nat from static to aggregate on pfw3-eqiad - T211028
  • 19:51 ejegg: updated fundraising CiviCRM from b8e3a71845 to 5580f0b11c
  • 19:48 krinkle@deploy1001: Finished deploy [performance/navtiming@68fd54d]: (no justification provided) (duration: 00m 05s)
  • 19:48 krinkle@deploy1001: Started deploy [performance/navtiming@68fd54d]: (no justification provided)
  • 19:48 dduvall@deploy1001: Pruned MediaWiki: 1.33.0-wmf.12 (duration: 06m 26s)
  • 19:11 arlolra: Updated Parsoid to 2c5dc7b (T197616, T205491, T209772, T199926, T209194, T204622)
  • 19:06 marostegui: Drop valid_tag table from s1 - T212254
  • 19:00 arlolra@deploy1001: Finished deploy [parsoid/deploy@4b82683]: Updating Parsoid to 2c5dc7b (duration: 10m 40s)
  • 18:54 XioNoX: make pfw3-codfw source NAT similar to pfw3-eqiad - T211028
  • 18:54 ejegg: updated SmashPig standalone install from fb3268897b to 25713ca232
  • 18:50 marostegui: Drop valid_tag table from s4 - T212254
  • 18:50 XioNoX: add NAT workaround to pfw3-eqiad - T211028
  • 18:49 arlolra@deploy1001: Started deploy [parsoid/deploy@4b82683]: Updating Parsoid to 2c5dc7b
  • 18:38 XioNoX: temporarily permit ssh from frpm1001 to pfw3-eqiad on pfw3-eqiad
  • 18:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 T86338 T202167 (duration: 00m 45s)
  • 18:27 jynus: restarting s5 replication on labsdb1009/10/11
  • 17:41 moritzm: installing libseccomp updates from stretch point release
  • 17:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource, take #2 (duration: 02m 29s)
  • 17:38 mobrovac@deploy1001: Started deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource, take #2
  • 17:37 mobrovac@deploy1001: Finished deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource - T210752 T197616 (duration: 96m 50s)
  • 17:33 _joe_: applying the new apache configuration to jobrunners in eqiad
  • 17:24 elukey: roll restart of aqs on aqs100* to pick up new Druid settings
  • 17:20 _joe_: depooling mw1299 for testing of the apache change
  • 17:16 SMalyshev: restarted Blazegraph wdqs1006 due to unresponsiveness (caused by load?)
  • 16:56 urandom: forcing removal of restbase1016-a (host down way too long to salvage) -- T212418
  • 16:56 jynus: changing db1124:s5 replication to db2066
  • 16:55 marostegui: Deploy schema change on db1105:3311 T86338 T202167
  • 16:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 T86338 T202167 (duration: 00m 44s)
  • 16:54 jynus: stopping s5 replication on labsdb1009/10/11 to prevent undoable mistakes
  • 16:34 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool es2019 - T212833 (duration: 02m 51s)
  • 16:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 T86338 T202167 (duration: 00m 45s)
  • 16:12 XioNoX: add BGP sessions to AS64050 in AMS-IX
  • 16:04 marostegui: Drop valid_tag table from s7 - T212254
  • 16:00 mobrovac@deploy1001: Started deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource - T210752 T197616
  • 15:59 marostegui: Deploy schema change on db1089 T86338 T202167
  • 15:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 T86338 T202167 (duration: 00m 45s)
  • 15:45 marostegui: Drop valid_tag table from s2 - T212254
  • 15:32 marostegui: Stop MySQL on es2019 for upgrade - T212833
  • 15:23 godog: briefly stop carbon daemons on graphite1004 to move /srv/whisper -> /srv/carbon/whisper
  • 15:17 marostegui: Increase connections from 10 to 50 for recommendationapiservice on m2 - T212154
  • 15:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool es2019 - T212833 (duration: 00m 44s)
  • 15:04 hashar: Restarted CI Jenkins
  • 13:02 zeljkof: EU SWAT finished
  • 12:59 jynus: transferring db1102:s5 mariadb datadir to db1082
  • 12:57 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Give all users (including IPs) the pagequality right in plwikisource (T212478) (duration: 00m 45s)
  • 12:45 akosiaris@deploy1001: scap-helm zotero finished
  • 12:45 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 12:45 akosiaris@deploy1001: scap-helm zotero install --name production2 -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 12:44 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Allow ptwikis bureaucrats to grant/revoke rollbacker user group (T212735) (duration: 00m 45s)
  • 12:39 akosiaris@deploy1001: scap-helm zotero upgrade production2 -f zoterov2-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 12:29 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Use localized wgMetaNamespace and wgMetaNamespaceTalk in satwiki (T211294) (duration: 00m 45s)
  • 12:23 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: New throttle rule for students writing Wikipedia program (T212226) (duration: 00m 44s)
  • 12:14 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: New throttle rule for University of Southern California editathon (T212917) (duration: 00m 45s)
  • 12:07 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T212768 [cirrus] re-enable HHVM connection pooling (duration: 00m 45s)
  • 12:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@503b29c] (dev-cluster): Add test-commons and nap.wikisource (duration: 12m 38s)
  • 11:49 mobrovac@deploy1001: Started deploy [restbase/deploy@503b29c] (dev-cluster): Add test-commons and nap.wikisource
  • 11:46 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Increase time out on the MW side to 60s - T204183 (duration: 00m 51s)
  • 11:36 akosiaris@deploy1001: scap-helm zotero finished
  • 11:36 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 11:36 akosiaris@deploy1001: scap-helm zotero upgrade production -f zoterov2-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 11:35 akosiaris@deploy1001: scap-helm zotero finished
  • 11:35 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 11:35 akosiaris@deploy1001: scap-helm zotero upgrade production -f zoterov2-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 11:33 mobrovac@deploy1001: Started restart [electron-render/deploy@94d27d7]: Electron struggling, restart - T213154
  • 11:29 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=codfw
  • 11:24 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=zotero,name=codfw
  • 11:07 jynus: stopping and restarting db1102 (s5, s4) for upgrade
  • 11:04 moritzm: rebooting mw1261
  • 10:48 moritzm: installing libseccomp updates from stretch point release
  • 10:34 dcausse: elastic@eqiad setting crosscluster conf on production search cluster (T213150)
  • 10:25 banyek: executing schema change on db1062 - T85757
  • 09:39 foks: reset user email for Zergiorubio
  • 09:26 akosiaris@deploy1001: scap-helm zotero finished
  • 09:26 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 09:26 akosiaris@deploy1001: scap-helm zotero install --name production2 -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 09:22 jynus: stop replication on db1124:s5 T213108
  • 09:21 akosiaris@deploy1001: scap-helm zotero finished
  • 09:21 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 09:21 akosiaris@deploy1001: scap-helm zotero install --name production2 -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 09:19 hashar: gerrit: resaved configuration for All-Projects by changing "Max Reviewers" from 3 to 4. Might enable adding reviewers automatically based on git blame. See task for config diff # T101131
  • 09:12 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@f91cf04]: Increase the concurrency of categoryMembershipJob - T192691 (duration: 00m 59s)
  • 09:12 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@f91cf04]: Increase the concurrency of categoryMembershipJob - T192691
  • 05:39 SMalyshev: restarted some Blazegraph servers as precaution against corruption issues
  • 04:26 onimisionipe: depooling wdqs1008 - T213134
  • 03:23 kartik@deploy1001: Finished deploy [cxserver/deploy@b669f95]: Update cxserver to d6b1d6f (duration: 05m 00s)
  • 03:18 kartik@deploy1001: Started deploy [cxserver/deploy@b669f95]: Update cxserver to d6b1d6f
  • 00:22 gehel: restarting tilerator on all maps servers
  • 00:06 gehel: depooling wdqs1007 (something looks like DB corruption)

2019-01-07

  • 23:56 eileen: update civicrm revision changed from bcb4b7a7d1 to b8e3a71845, config revision is 260be32d0a
  • 22:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TestCommons: Re-enable uploading of files, accidentally prevented (duration: 00m 44s)
  • 21:19 XioNoX: push NAT changes to pfw3-eqiad - T211028
  • 21:16 awight@deploy1001: Finished deploy [ores/deploy@9253beb]: T212530: new ORES models; revscoring 2.3.0 (duration: 15m 28s)
  • 21:13 mforns@deploy1001: Finished deploy [analytics/refinery@faac592]: deploying analytics/refinery to account with refinery-source v0.0.83 (duration: 06m 52s)
  • 21:06 mforns@deploy1001: Started deploy [analytics/refinery@faac592]: deploying analytics/refinery to account with refinery-source v0.0.83
  • 21:00 awight@deploy1001: Started deploy [ores/deploy@9253beb]: T212530: new ORES models; revscoring 2.3.0
  • 20:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TestCommons: Final go-switch for WBMI Ie52b8af006ba (duration: 00m 45s)
  • 19:52 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove redundant namespace talk definitions (T206952) (duration: 00m 44s)
  • 19:46 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set $wgMetaNamespace for bewikibooks (T212665) (duration: 00m 45s)
  • 19:43 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikibaseRepo and WikibaseMediaInfo on testcommonswiki (duration: 00m 44s)
  • 19:42 XioNoX: push firewall change to pfw3-codfw/eqiad - T211712
  • 19:40 catrope@deploy1001: Synchronized wmf-config/Wikibase.php: Set empty clientDbList for testcommonswiki (duration: 00m 44s)
  • 19:38 catrope@deploy1001: Synchronized dblists/wikidata.dblist: Enable Wikidata on testcommonswiki (duration: 00m 44s)
  • 19:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add importupload to sysops on testcommons (duration: 00m 45s)
  • 19:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on viwikisource (T212929) (duration: 00m 45s)
  • 19:13 catrope@deploy1001: Synchronized dblists/flow.dblist: Enable Flow on viwikisource (T212929) (duration: 00m 45s)
  • 19:11 RoanKattouw: Ran emptyUserGroup.php for autoreview, reviewer and editor groups on srwikinews (T212058)
  • 18:51 XioNoX: re-deactivate bgp sessions to Zayo on cr1-eqiad - T212791
  • 18:20 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@d8f911c]: new GUI, Updater & Blazegraph build (duration: 10m 13s)
  • 18:18 XioNoX: activate bgp sessions to Zayo on cr1-eqiad - T212791
  • 18:10 jynus: manually creating tables on es1015, es1017 with replication for testcommonswiki
  • 18:10 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@d8f911c]: new GUI, Updater & Blazegraph build
  • 18:07 onimisionipe@deploy1001: deploy aborted: (no justification provided) (duration: 00m 04s)
  • 18:06 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@d8f911c]: (no justification provided)
  • 18:05 XioNoX: deactivate bgp sessions to Zayo on cr1-eqiad T212791
  • 17:35 akosiaris: restart pdfrender on scb1004
  • 17:35 akosiaris: restart pdfrender
  • 17:23 kartik@deploy1001: Finished deploy [cxserver/deploy@594420b]: Update cxserver to 7632c43 (duration: 04m 06s)
  • 17:19 kartik@deploy1001: Started deploy [cxserver/deploy@594420b]: Update cxserver to 7632c43
  • 16:24 jynus: shutting down mariadb again and rebooting db1107
  • 16:15 jynus: starting mariadb on db1107
  • 16:12 onimisionipe: starting inplace reindexing for enwiki - T212224
  • 16:07 volans: powercycle db1107
  • 16:03 elukey: stop eventlogging mysql consumers on eventlog1002 and eventlogging replication on db1108 due to issues with db1107
  • 16:02 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 (duration: 00m 45s)
  • 15:46 cmjohnson1: replacing bad fuse on the PDU rack A2 eqiad
  • 14:19 moritzm: added jbond to WMF-LDAP group in Phabricator (T213079)
  • 13:56 ariel@deploy1001: Finished deploy [dumps/dumps@acd9bca]: logging and quiet mode for adds-changes and other dumps (duration: 00m 05s)
  • 13:56 ariel@deploy1001: Started deploy [dumps/dumps@acd9bca]: logging and quiet mode for adds-changes and other dumps
  • 13:02 zeljkof: EU SWAT finished
  • 13:01 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cirrus: increase number of shards (T212224) (duration: 00m 44s)
  • 12:48 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Restrict moving categories for users at srwiki (T213050) (duration: 00m 44s)
  • 12:40 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Cleanup old throttle rules (duration: 00m 44s)
  • 12:34 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: To lift a cap on account creation from IP for mrwiki community (T212921) (duration: 00m 43s)
  • 12:30 Zoranzoki21: tools.zoranzoki21wiki Archived https://www.mediawiki.org/w/index.php?title=Extension:Woopra (https://www.wikidata.org/wiki/Q21679347) - T212994
  • 12:29 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reader trust survey (T209882) (duration: 00m 45s)
  • 12:21 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Quiz extension on ru.wikibooks (T212622) (duration: 00m 45s)
  • 12:15 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add suppressredirect user right to editor user group at pl.wikisource (T212655) (duration: 00m 44s)
  • 12:11 gtirloni: disabled notifications for cloudvirt0124 (T212360)
  • 12:11 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable extendedmover user group at en.wiktionary (T212662) (duration: 00m 46s)
  • 12:07 kartik@deploy1001: Finished deploy [cxserver/deploy@2d54a64]: Deploy Google Translation (T90208) (duration: 05m 07s)
  • 12:02 kartik@deploy1001: Started deploy [cxserver/deploy@2d54a64]: Deploy Google Translation (T90208)
  • 10:36 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1079 after schema change - T85757 (duration: 00m 44s)
  • 10:31 filippo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move group1 to new logging infrastructure - T211124 (duration: 00m 45s)
  • 10:30 banyek: repooling db1079 after schema change - T85757
  • 10:27 banyek: restarting replication on db1079 - T85757
  • 09:55 banyek: executing schema change on db1079 with replication enabled - T85757
  • 09:53 banyek: stopping replication on db1079 - T85757
  • 09:47 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1079 for schema change - T85757 (duration: 01m 02s)
  • 09:36 banyek: depooling db1079 for schema change - T85757
  • 08:30 moritzm: rolling restart of swift backend servers to pick up OpenSSL security update
  • 07:24 elukey: restart pdfrender on scb1002

2019-01-06

  • 14:50 ariel@deploy1001: Finished deploy [dumps/dumps@cb30b6c]: check xml files for closing mediawiki tag (duration: 00m 06s)
  • 14:50 ariel@deploy1001: Started deploy [dumps/dumps@cb30b6c]: check xml files for closing mediawiki tag

2019-01-05

  • 20:23 elukey: manual cleanup of big logs under /var/log/.. on analytics-tool1002 because the root partition was almost full

2019-01-04

  • 23:07 mutante: scandium apt-get remove nodejs nodes-legacy ; puppet agent -tv - after merging gerrit:482150 this fixed "you have held broken packages" issue, now we are at a puppet dependency cycle with apt::pin T201366
  • 15:42 bawolff@deploy1001: Synchronized private/PrivateSettings.php: T212667 - More aggressive anti-spam measures for account creation on kowiki (duration: 00m 48s)
  • 14:08 moritzm: rebooting etcd1001-1003 to pick up SSBD-enabled qemu
  • 13:52 moritzm: rebooting etcd1004-1006 to pick up SSBD-enabled qemu
  • 13:33 moritzm: rebooting kubernetes staging etcd hosts to pick up SSBD-enabled qemu
  • 13:11 moritzm: rebooting kubernetes staging master to pick up SSBD-enabled qemu
  • 12:57 moritzm: rebooting kubernetes staging workers for kernel security update
  • 11:58 moritzm: installing libsndfile security updates
  • 11:33 moritzm: installing jasper security updates
  • 11:31 moritzm: installing libdatetime-timezone-perl updates for recent tz changes
  • 10:47 arturo: T212898 reimaging cloudvirt1024 as stretch
  • 10:46 moritzm: rolling restart of swift proxies to pick up OpenSSL update
  • 09:57 jijiki: restarting thumbor services to pick up 481141
  • 09:50 onimisionipe: restarting nginx on all wdqs hosts
  • 09:40 banyek: executing schema change on dbstore1002 - T85757
  • 09:13 moritzm: restarting nginx on puppetdb hosts to pick up new OpenSSL
  • 09:03 banyek: executing schema change on db1116 - T85757
  • 08:44 moritzm: restarting nginx on francium to pick up new OpenSSL
  • 08:16 elukey: restart eventlogging daemons on eventlog1002 to pick up openssl updates
  • 07:56 moritzm: installing OpenSSL security updates
  • 00:07 mutante: an-coord1001 - apt-get clean to free disk space, reacting to Icinga alert for running out of disk

2019-01-03

  • 23:08 volans: restarted pdfrender on scb1004
  • 22:29 volans: restarted all slaves on dbstore1002 (relayed from banyek)
  • 22:14 banyek: stopping all slaves on dbstore1002 (NOT labsdb)
  • 22:14 banyek: stopping all slaves on labsdb1002
  • 20:50 reedy@deploy1001: Synchronized multiversion/MWMultiVersion.php: Fix error for testcommons (duration: 00m 44s)
  • 20:46 reedy@deploy1001: Synchronized dblists/group0.dblist: Add testcommonswiki to group0 (duration: 00m 43s)
  • 20:43 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Updating interwiki cache (duration: 02m 05s)
  • 20:24 reedy@deploy1001: Synchronized wmf-config/db-codfw.php: T197616 (duration: 00m 44s)
  • 20:23 reedy@deploy1001: Synchronized wmf-config/db-eqiad.php: T197616 (duration: 00m 44s)
  • 20:13 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T197616 (duration: 00m 44s)
  • 20:12 reedy@deploy1001: Synchronized multiversion/MWMultiVersion.php: T197616 (duration: 00m 44s)
  • 20:11 reedy@deploy1001: rebuilt and synchronized wikiversions files: T197616
  • 20:09 reedy@deploy1001: Synchronized dblists/: T197616 (duration: 00m 45s)
  • 18:51 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@1182b3b]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests, part 2 (duration: 05m 27s)
  • 18:46 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@1182b3b]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests, part 2
  • 18:37 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c470ed2]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests (duration: 04m 11s)
  • 18:33 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c470ed2]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests
  • 18:21 volans: restart pdfrender on scb1003
  • 17:58 ariel@deploy1001: Finished deploy [dumps/dumps@10dc8ad]: return properly if commands failed (duration: 00m 08s)
  • 17:58 ariel@deploy1001: Started deploy [dumps/dumps@10dc8ad]: return properly if commands failed
  • 16:32 XioNoX: remove old 10.64.22.0/24 IPs from cloud-instance-transport1-b-eqiad - T207663
  • 16:22 moritzm: rebooting kubernetes workers in eqiad for kernel security update
  • 16:02 arturo: reimaging cloudvirt1013 cloudvirt1026-1028 to stretch
  • 15:48 moritzm: restart parsoid on wtp1025 to pick up OpenSSL update for nodejs
  • 15:43 jijiki: Enabled puppet on mw servers after merging 481796 - T197616
  • 15:31 jijiki: Disabling puppet on mw servers to test 481796 - T197616
  • 15:14 ejegg: updated Fundraising CiviCRM from b33dcd3c94 to bcb4b7a7d1
  • 14:37 moritzm: rebooting kubernetes workers in codfw for kernel security update
  • 14:37 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1101:3317 after schema change - T85757 (duration: 00m 44s)
  • 14:32 banyek: repooling db1101:3317 after schema change - T85757
  • 14:21 moritzm: rebooting kubernetes masters in eqiad to pick up SSBD-enabled qemu
  • 14:14 moritzm: rebooting kubernetes masters in codfw to pick up SSBD-enabled qemu
  • 14:05 arturo: T209616 reimage cloudvirt1029 as debian stretch
  • 13:43 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1101:3317 for schema change - T85757 (duration: 00m 44s)
  • 13:41 banyek: depooling db1101:3317 for schema change - T85757
  • 13:38 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1098:3317 after schema change - T85757 (duration: 00m 44s)
  • 13:34 banyek: repooling db1098:3317 after schema change - T85757
  • 13:24 kartik@deploy1001: Finished deploy [cxserver/deploy@3b2ede7]: Update cxserver to 2369a18 (duration: 04m 30s)
  • 13:20 kartik@deploy1001: Started deploy [cxserver/deploy@3b2ede7]: Update cxserver to 2369a18
  • 12:58 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1098:3317 for schema change - T85757 (duration: 00m 45s)
  • 12:55 banyek: depooling db1098:3317 for schema change - T85757
  • 12:54 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1094 after schema change - T85757 (duration: 00m 45s)
  • 12:49 banyek: repooling db1094 after schema change - T85757
  • 12:41 arturo: T212302 reimaging again cloudvirt1030 to test final puppet code
  • 12:33 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1094 for schema change - T85757 (duration: 00m 46s)
  • 12:28 banyek: depooling db1094 for schema change - T85757
  • 12:27 moritzm: restarting tor on torrelay1001 to pick up OpenSSL security update
  • 11:02 _joe_: manually reloading icinga to pick up changes to commands.cfg
  • 10:55 moritzm: installing apache updates on puppetmasters
  • 10:22 moritzm: installing ghostscript security updates on jessie
  • 09:51 elukey: restart memcached on mc1023 to apply -R 200 - T208844
  • 09:46 moritzm: remove imagemagick remnants from ATS hosts (obsoleted by upstream packaging change which dropped the webp plugin)
  • 09:39 moritzm: installing nginx updates on puppetdb*
  • 09:26 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: repool es2019 - T212833 (duration: 01m 33s)
  • 09:18 banyek: repooling es2019 - T212833
  • 08:46 moritzm: rolling restart of proton to pick up OpenSSL update
  • 08:35 banyek: depooled es2019 as host was unresponsive - T212833
  • 08:35 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: depool es2019, host is unresponsive - T212833 (duration: 00m 49s)
  • 08:11 moritzm: installing OpenSSL security updates
  • 00:21 mutante: notebook1004 - started nagios-nrpe-server one more time

2019-01-02

  • 23:59 mutante: notebook1004 still keeps running out of memory from some user actions and that kills nagios-nrpe-server and that causes a bunch of Icinga alerts
  • 23:39 mutante: notebook1004 - systemctl start nagios-nrpe-server
  • 23:39 mutante: notebook1004 - systemctl status nagios-nrpe-server
  • 20:59 herron@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,service=parsoid,name=wtp1028.eqiad.wmnet
  • 20:59 herron: repooling wtp1028 T212624
  • 20:52 herron: rebooting wtp1028 — looking for POST errors T212624
  • 20:05 Krinkle: mwmaint1002: foreachwikiindblist s5 deleteEqualMessages.php
  • 20:04 Krinkle: mwmaint1002: foreachwikiindblist s2 deleteEqualMessages.php
  • 18:35 volans: restarting icinga on icinga1001 T212669
  • 16:50 XioNoX: create BGP sessions to AS3214 in AMS-IX
  • 16:46 XioNoX: remove BGP sessions to AS42949 in AMS-IX (leaving the IX)
  • 16:43 XioNoX: remove BGP sessions to AS6866 in AMS-IX (leaving the IX)
  • 16:33 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1090:3317 after schema change - T85757 (duration: 00m 46s)
  • 16:30 arturo: reimaging cloudvirt1030 with stretch, server cleanup after puppet refactoring
  • 16:29 moritzm: restarting Superset to pick up openssl security update
  • 16:25 moritzm: restarting Hue to pick up openssl security update
  • 16:23 arturo: T212302 re-enable puppet in all {cloud,lab}virt* servers, all was fine
  • 16:22 banyek: repooling db1090:3317 after schema change (T85757)
  • 16:11 arturo: T212302 disable puppet in all {cloud,lab}virt* servers to merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/481194/
  • 15:39 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1090:3317 for schema change - T85757 (duration: 00m 44s)
  • 15:34 moritzm: installing OpenSSL security updates
  • 15:31 banyek: depooling db1090:3317 for schema change (T85757)
  • 15:13 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1086 after schema change - T85757 (duration: 00m 44s)
  • 15:07 banyek: repooling db1086 after schema change (T85757)
  • 14:49 banyek: executing schema change on db1086 - T85757
  • 14:48 moritzm: installing ghostscript security update for jessie
  • 14:47 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1086 for schema change - T85757 (duration: 00m 45s)
  • 14:38 banyek: depooling db1086 for schema change (T85757)
  • 14:15 ema: cp hosts: upgrade OpenSSL from 1.1.0f to 1.1.0j
  • 13:39 moritzm: installing ghostscript update for stretch
  • 13:33 moritzm: installing libav security updates
  • 13:30 marostegui@deploy1