Server Admin Log

From Wikitech

2019-04-18

  • 13:30 jbond42: rolling updates of ruby2.1 on jessie
  • 13:08 elukey: roll restart of cassandra on aqs* to pick up new openjdk upgrades
  • 13:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:58 reedy@deploy1001: rebuilt and synchronized wikiversions files: group1 back to .25
  • 12:36 anomie: Ran `php7adm /opcache-free` on mw1274 to test a theory related to T221347. The log entries related to that task stopped immediately.
  • 12:30 gehel: restarting blazegraph + updater on wdqs* for jvm upgrade
  • 12:22 moritzm: installing Java security updates on restbase-dev hosts (along with Cassandra restarts)
  • 12:21 gehel: restarting blazegraph + updater on wdqs1009 / wdqs1010 for jvm upgrade
  • 12:19 moritzm: installing Java security updates on WDQS autodeploy/test hosts
  • 10:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:35 moritzm: installing rails security updates on jessie hosts
  • 10:21 moritzm: installing jasper updates on jessie hosts
  • 09:44 akosiaris: update grafana service/ dashboard to have user, system, throttled CPU metrics under the CPU saturation row
  • 09:41 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216597 Run CPU benchmark for all samples on eswiki/ruwiki (duration: 01m 06s)
  • 09:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:53 elukey: reboot kafka10[12-23] (old Analytics cluster) for kernel + openjdk upgrades
  • 08:23 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 08:14 moritzm: installing libssh2 security updates on jessie
  • 08:01 moritzm: restarting mw1261-mw1265 to pick up new libssh2
  • 07:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:53 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2004.codfw.wmnet
  • 07:28 moritzm: installing libssh2 security updates
  • 07:19 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 06:58 moritzm: restarting icinga on icinga1001 (T196336)
  • 06:37 moritzm: rolling reboots of Swift backends in eqiad for combined kernel/glibc/OpenSSL update

2019-04-17

  • 22:46 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/includes/: I3a50508178159 (duration: 01m 21s)
  • 22:40 XioNoX: push firewall change to pfw3-codfw - T221278
  • 22:28 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Score/: Id58156cfca805 / T219342 (duration: 01m 03s)
  • 21:30 XioNoX: enable option-82 on asw2-b:cloud-hosts1-b-eqiad vlan
  • 21:10 thcipriani: gerrit back
  • 21:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4dcb851]: Gerrit update (cobalt -- restart incoming) (duration: 00m 10s)
  • 21:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4dcb851]: Gerrit update (cobalt -- restart incoming)
  • 21:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4dcb851]: Gerrit update (gerrit2001 only) (duration: 00m 11s)
  • 21:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4dcb851]: Gerrit update (gerrit2001 only)
  • 19:14 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.1 refs T220726 (duration: 01m 49s)
  • 19:13 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.1 refs T220726
  • 18:04 thcipriani: gerrit back
  • 18:01 thcipriani: gerrit restart for https://gerrit.wikimedia.org/r/504611/
  • 17:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Wikidata federation on Commons again T214075 (duration: 01m 00s)
  • 17:20 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventGate api-request logging on group1 wikis (duration: 01m 00s)
  • 17:18 mutante: LDAP - added 'brennen' to group 'gerritadmin' (T218858)
  • 17:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/OATHAuth/: UBN T221257 train un-blocker (duration: 01m 02s)
  • 17:09 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Echo/includes/formatters/: Notifications: Revert 7121b9c4 per I8f9a6a19ba (duration: 01m 01s)
  • 16:49 tzatziki: deleting three files for legal compliance
  • 16:47 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/WikibaseMediaInfo/: SDC: Various fixes T218922 T221071 T221110 T221123 (duration: 01m 02s)
  • 16:41 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/autoload.php: Update to point to new maintenance scripts (duration: 01m 00s)
  • 16:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/maintenance/language/generateUpperCharTable.php: Maintenance script for _joe_ (duration: 00m 59s)
  • 16:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/maintenance/language/generateUcfirstOverrides.php: Maintenance script for _joe_ (duration: 01m 00s)
  • 16:21 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/languages/Language.php: T219279 Ability to set wgOverrideUcfirstCharacters part 1 try two (duration: 01m 00s)
  • 16:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/includes/DefaultSettings.php: T219279 Ability to set wgOverrideUcfirstCharacters part 1b (duration: 01m 03s)
  • 16:13 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 16:11 XioNoX: set fasw-c-eqiad:ge-[0-1]/0/17 in admin vlan - T221232
  • 16:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T220434 Deploy Partial blocks to Chinese Wikipedia (duration: 01m 02s)
  • 14:37 ariel@deploy1001: Finished deploy [dumps/dumps@dcf04a0]: fix up paths for 1.34_wmf.1 for AbstractFilter (duration: 00m 04s)
  • 14:36 ariel@deploy1001: Started deploy [dumps/dumps@dcf04a0]: fix up paths for 1.34_wmf.1 for AbstractFilter
  • 14:35 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:35 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:35 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 14:34 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:34 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:34 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 14:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics finished
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 13:52 elukey: upgrading hadoop cdh distribution to 5.16.1 on all the Hadoop-related nodes - T218343
  • 13:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 godog: reimage prometheus2004 - T187987
  • 12:57 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1004.eqiad.wmnet
  • 12:44 godog: bounce prometheus instances on prometheus[12]003 after https://gerrit.wikimedia.org/r/c/operations/puppet/+/499742
  • 12:33 moritzm: running some ferm tests on graphite2002
  • 12:10 godog: briefly stop all prometheus on prometheus1003 to finish metrics rsync - T187987
  • 11:39 Lucas_WMDE: EU SWAT done
  • 11:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable suggestion constraint status on testwikidata (T221108, T204439) (duration: 01m 01s)
  • 10:58 volans@deploy1001: Finished deploy [debmonitor/deploy@f049b3b]: Deploy Debmonitor v0.1.9 (duration: 01m 00s)
  • 10:57 volans@deploy1001: Started deploy [debmonitor/deploy@f049b3b]: Deploy Debmonitor v0.1.9
  • 10:40 moritzm: installing Java security updates on kafka/analytics cluster
  • 09:17 godog: swift eqiad-prod continue ms-be1013 decom - T220590
  • 09:09 elukey: restart eventlogging on eventlog1002 due to errors in processors and consumer lag accumulated after the last Kafka Jumbo roll restart
  • 08:47 godog: reimage prometheus1004 - T187987
  • 08:38 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 fully (duration: 01m 00s)
  • 08:29 moritzm: installing ghostscript security updates
  • 07:51 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/NavigationTiming: T216597 Event timing support (duration: 01m 01s)
  • 07:45 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216597 Enable Event Timing origin trial on ruwiki and eswiki (duration: 01m 04s)
  • 07:21 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 with low load (duration: 01m 18s)
  • 07:07 moritzm: rolling reboots of Swift backends in codfw for combined kernel/glibc/OpenSSL update

2019-04-16

  • 23:42 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Return CirrusSearch to standard execution against eqiad cluster (duration: 01m 00s)
  • 23:37 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/CirrusSearch/includes/: Fix fatals on malformed search queries against overridden clusters (duration: 01m 06s)
  • 22:42 thcipriani: gerrit back
  • 22:39 thcipriani: restarting gerrit for configuration update https://gerrit.wikimedia.org/r/504448
  • 22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T165795 Give bureaucrats the usermerge right (duration: 00m 59s)
  • 22:20 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/NewUserMessage/includes/NewUserMessage.php: Disable onLocalUserCreated for known bot accounts (duration: 01m 01s)
  • 22:17 mobrovac@deploy1001: Finished deploy [restbase/deploy@f1c767d]: mobile-sections simplification: use the key/value bucket only - T215960 (duration: 20m 02s)
  • 22:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T165795 Enable the UserMerge extension for clean-up on wikitech (duration: 01m 00s)
  • 21:57 mobrovac@deploy1001: Started deploy [restbase/deploy@f1c767d]: mobile-sections simplification: use the key/value bucket only - T215960
  • 21:56 eileen: civicrm revision changed from 1bc1570967 to 31982324b8, config revision is e5a7908330
  • 21:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@f1c767d] (dev-cluster): mobile-sections simplification: use the key/value bucket only (duration: 05m 24s)
  • 21:50 mobrovac@deploy1001: Started deploy [restbase/deploy@f1c767d] (dev-cluster): mobile-sections simplification: use the key/value bucket only
  • 21:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.1 refs T220726
  • 21:24 andrewbogott: deleting 'eqiad' endpoint in keystone
  • 21:21 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.1 refs T220726 (duration: 36m 47s)
  • 21:09 XioNoX: add wpao to wmf/ops in LDAP - T221142
  • 21:02 cdanis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
  • 20:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:55 andrewbogott: removing keystone endpoints for the 'eqiad' region
  • 20:45 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.1 refs T220726
  • 20:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@dfca9e6]: Use the simplified key/value bucket - T215960 (duration: 19m 52s)
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:42 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:42 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:23 mobrovac@deploy1001: Started deploy [restbase/deploy@dfca9e6]: Use the simplified key/value bucket - T215960
  • 20:19 ariel@deploy1001: Finished deploy [dumps/dumps@796ccb5]: use safe_load yaml and getReplicaServer.php, cleanup symlinks once per job only (duration: 00m 04s)
  • 20:19 ariel@deploy1001: Started deploy [dumps/dumps@796ccb5]: use safe_load yaml and getReplicaServer.php, cleanup symlinks once per job only
  • 20:11 mobrovac@deploy1001: Finished deploy [restbase/deploy@dfca9e6] (dev-cluster): Use the simplified key/value bucket (duration: 05m 24s)
  • 20:05 mobrovac@deploy1001: Started deploy [restbase/deploy@dfca9e6] (dev-cluster): Use the simplified key/value bucket
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:59 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:59 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:56 gehel: restarting cassandra on maps* for config change - T221055
  • 19:49 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:49 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:49 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:48 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:48 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:48 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set main_app.debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:11 twentyafterfour: twentyafterfour@deploy1001:/srv/mediawiki-staging$ scap prep 1.34.0-wmf.1
  • 19:07 bblack: restarting varnish backend on cp1083
  • 19:04 bblack: restarting varnish backend on cp1085
  • 18:55 cdanis: cdanis@cp1085.eqiad.wmnet ~ % sudo -i depool
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set main_app.profiling_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:46 twentyafterfour: branching 1.34.0-wmf.1 refs T220726
  • 18:25 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:25 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set wmfdebug_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:14 cmjohnson1: powering off mw1280 to replace DIMM
  • 18:08 mutante: restbase2007, restbase2008 - re-enabled puppet which was disabled with reason 'decom'ed' but actually needed to run to decom after they had moved to role::spare::system (T208087)
  • 17:56 reedy@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/WikimediaIncubator/: T220623 (duration: 00m 53s)
  • 17:47 herron: beginning rolling ELK upgrade to 5.6.15
  • 17:46 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: no-op preparatory change (T221107) (duration: 00m 52s)
  • 17:36 arturo: toolforge k8s reallocation (from nova-network to neutron) is causing troubles with IRC bots, expect missing entries in the SAL
  • 17:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:28 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:27 andrewbogott: restarting rabbitmq on cloudcontrol1003
  • 17:26 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1280.eqiad.wmnet,cluster=api_appserver
  • 17:25 arturo: rebooted cloudnet1003
  • 17:24 gehel: force initialization of unassigned shards on elasticsearch eqiad
  • 17:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op preparatory change (T221108) (duration: 00m 52s)
  • 16:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikibaseQualityConstraints/maintenance/ImportConstraintEntities.php --wiki=testwikidatawiki --config-format=wgConf | tee T221108.php
  • 16:53 mutante: bast2001 - shutdown -h now - decom'ed (T219492)
  • 16:48 mutante: puppet node clean bast2001.wikimedia.org ; puppet node deactivate bast2001.wikimedia.org ; it showed up in Icinga again despite running decom cookbook (T219492)
  • 16:47 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:47 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:47 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set wmfdebug_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:44 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:44 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:44 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:43 jynus: upgrading and shutting down db1078 T219115
  • 16:41 jynus: disabling notifications on db1078 T219115
  • 16:37 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 (duration: 00m 52s)
  • 15:36 arturo: reimaging cloudnet2002-dev because of role name change
  • 15:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:20 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.28 -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:19 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:19 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:19 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:18 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:18 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:18 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:16 elukey: roll restart kafka on kafka-jumbo100[1-6] to pick up openjdk upgrades
  • 14:58 gehel: manual data transfer from wdqs1008 to wdqs1009 - T220830
  • 14:56 ema: swift-fe-eqiad: nginx reload for new TLS certificate T204245
  • 14:53 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 14:52 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:51 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1005.eqiad.wmnet
  • 14:45 ema: test https://gerrit.wikimedia.org/r/504340 on ms-fe1005 T204245
  • 14:30 ema: swift-fe-codfw: nginx reload for new TLS certificate T204245
  • 14:22 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 14:21 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:20 elukey: roll restart of all the druid daemons on druid100[1-6] to pick up new openjdk updates
  • 14:17 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2005.codfw.wmnet
  • 14:07 jijiki: Pooling thumbor1001
  • 14:04 ema: test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/504331/ on ms-fe2005 T204245
  • 14:01 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe2005.codfw.wmnet
  • 14:01 jijiki: Depooling thumbor1001
  • 13:58 jijiki: Disable puppet on thumbor1001 for ~24h to serve traffic via haproxy - T187765
  • 13:54 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 13:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:52 jijiki: Enable puppet on thumbor*
  • 13:42 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 13:41 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:39 gehel: resetting cookbooks repo on cumin1001 (local changes)
  • 13:34 jijiki: Disabling puppet on thumbor* to merge 504284
  • 13:13 ema: cp-ats: upgrade fifo-log-demux to 0.2 and restart services
  • 13:10 ema: fifo-log-demux 0.2 uploaded to stretch-wikimedia
  • 13:03 arturo: T220095 renaming/reimaging labtestcontrol2003 as cloudcontrol2003-dev
  • 12:58 moritzm: installing ghostscript update on thumbor1001
  • 12:54 gehel: cleanup redundant prometheus-elasticsearch units on elasticsearch servers
  • 12:52 godog: swift eqiad-prod continue ms-be1013 decom - T220590
  • 12:17 moritzm: installing OpenSSL 1.0.2 updates on cp* Varnish hosts
  • 12:07 arturo: rebooting cloudvirt200[123]-dev because of deep changes in config
  • 11:18 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgWikibaseMusicalNotationLineWidthInches to config (T218191) (duration: 00m 52s)
  • 11:10 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "WikibaseClient: Conditionally enable mapframe support" (T218051) (duration: 00m 51s)
  • 11:08 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable signatures in 2019: NS (ID 128) for wikimaniawiki (T221062) (duration: 00m 52s)
  • 10:49 gilles: T221065 eswiki purge finished
  • 10:45 moritzm: installing libjs-bootstrap updates from Stretch point release
  • 10:21 gilles: T221065 mwscript purgeList.php eswiki --all --verbose on mwmaint1002
  • 10:21 moritzm: installing xapian-core update from stretch point release
  • 10:18 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221065 Set up origin trials on Spanish Wikipedia mobile site (duration: 00m 52s)
  • 09:59 jijiki: Enabling puppet again on dbproxy* and thumbor*
  • 09:51 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Reduce db1078 load (duration: 00m 53s)
  • 09:37 jijiki: Disabling puppet on dbproxy* and thumbor* to merge 502972
  • 09:26 fsero: [late logging] swift container-to-container synchronization enabled between docker_registry_eqiad and docker_registry_codfw swift containers at 08:15:00 UTC
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 09:05 ema: cp1076: repool varnish-fe pointing to Varnish T213263
  • 08:57 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 08:57 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 08:57 ema: cp1076: depool varnish-fe in preparation of traffic switchback to Varnish T213263
  • 08:40 hoo: Updated the Wikidata property suggester with data from the 2019-04-08 JSON dump and applied the T132839 workarounds
  • 08:33 moritzm: rebooting ms-be1020 for combined kernel/glibc/OpenSSL update
  • 08:01 moritzm: rebooting Swift frontends in codfw for combined kernel/glibc/OpenSSL security updates
  • 07:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 07:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 07:50 ema: cp2002: repool varnish-fe pointing to Varnish T213263
  • 07:47 moritzm: rebooting Swift frontends in eqiad for combined kernel/glibc/OpenSSL security updates
  • 07:45 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 07:45 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 07:45 ema: cp2002: depool varnish-fe in preparation of traffic switchback to Varnish T213263
  • 07:36 marostegui: Upgrade db2093
  • 07:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 07:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
  • 07:32 ema: cp2005: repool varnish-fe pointing to Varnish T213263
  • 07:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 07:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
  • 07:25 ema: cp2005: depool varnish-fe in preparation of traffic switchback to Varnish T213263
  • 07:11 moritzm: upgrading Java on Hadoop/Kafka/Jumbo/Druid clusters
  • 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 31s)
  • 01:46 aaron@deploy1001: Synchronized php-1.33.0-wmf.25/includes/parser/Parser.php: 73529ae6c5ffb6 (duration: 00m 53s)
  • 00:34 onimisionipe: pooled maps2003 - postgres init complete!
  • 00:33 krinkle@deploy1001: Synchronized wmf-config/profiler.php: I7589aa153 (duration: 00m 52s)
  • 00:33 urandom: creating new restbase schema -- T221031

2019-04-15

  • 23:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 23:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 23:20 cdanis: cdanis@icinga1001.wikimedia.org ~ % sudo systemctl restart tcpircbot-logmsgbot.service
  • 23:17 bd808: scap: SWAT: wikitech: Use cn:caseExactMatch: as account search filter (T165795)
  • 20:59 thcipriani: gerrit back
  • 20:57 gehel: shutting down blazegraph and updater on wdqs1010, waiting for data reimport
  • 20:55 thcipriani: gerrit restart to pick up gc log changes incoming
  • 20:37 arlolra: Updated Parsoid to 83c17fc9
  • 20:23 Amir1: the ores deployment is over
  • 19:49 XioNoX: export BGP communities (prepend x3 outside asia) to AS3491 in eqsin
  • 19:46 mutante: bromine/vega: rm /etc/rsyncd.conf ; systemctl stop rsync (clean up old rsync config gerrit:503961)
  • 19:45 XioNoX: update (and add) AS3491 BGP communities in eqsin
  • 18:58 XioNoX: update mr1-* security policies - T219384
  • 18:41 onimisionipe: depooling maps2003 for postgres init
  • 18:40 onimisionipe: pooling maps2002 - postgres init complete
  • 18:39 Amir1: Morning SWAT is done
  • 18:35 shdubsh: logstash1009: disabling puppet and testing logstash config
  • 18:09 mutante: LDAP - adding legoktm and qchris to gerritadmin group (T219086)
  • 17:45 anomie: Backporting fix for T220991
  • 17:41 akosiaris: force puppet agent run on maps* after moving config-vars.yaml file for kartotherian, tilerator, tileratorui T220982
  • 17:33 mutante: LDAP - re-adding 'pbj' to 'nda' group, extended access until May 6th, transparency report contractor
  • 17:23 mutante: wikibugs - qdel'ed jobs and restarted another time, make it rejoin
  • 17:17 onimisionipe: wdqs deployment is complete! for some reason I don't know, scap did not log here
  • 17:17 herron: restarted logstash on logstash1007
  • 17:15 mutante: restarted wikibugs because it stopped talking
  • 16:08 onimisionipe: pooling maps2001 - postgres reinit is complete
  • 15:55 Reedy: changed /srv/mediawiki/docroot/wikimedia.org to a symlink to standard-docroot
  • 15:53 XioNoX: add cloud-in4 firewall filter to codfw - T211921
  • 15:31 onimisionipe: restarting prometheus-wmf-elasticsearch-exporter-9* on all elastic nodes
  • 15:30 onimisionipe: restarting prometheus-wmf-elasticsearch-exporter-9200 on all elastic nodes
  • 15:28 _joe_: systemctl reset-failed on ms-be1027, debmonitor session
  • 15:24 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T219871)
  • 14:55 gehel: deploying tilerator to maps1001 to validate deployment is working - T220982
  • 14:55 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T219871)
  • 14:43 _joe_: running apply-config-tilerator on maps1001
  • 14:40 _joe_: running apply-config-kartotherian on maps1001
  • 14:22 cdanis: T220982 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'
  • 14:21 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' "disable-puppet 'bad permissions - T220982 - cdanis'"
  • 14:18 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'
  • 14:18 gehel: resetting permissions on maps servers for /srv/deployment/kartotherian and /srv/deployment/tilerator
  • 14:04 moritzm: rebooting ms-fe1005 for combined kernel/glibc/OpenSSL update
  • 13:57 jbond42: upgrading puppet 4 -> 5 and facter 2 -> 3 on mediawiki::canary_appserver, mediawiki::appserver::canary_api and cache::cache roles
  • 13:56 gehel: restart tilerator / kartotherian on all maps servers for openssl update
  • 13:55 godog: start ms-be1013 decom - T220590
  • 13:42 godog: reboot ms-be1013
  • 13:09 moritzm: installing wget security updates on trusty hosts
  • 12:59 moritzm: restarting archiva on archiva1001 for OpenJDK security update
  • 12:50 moritzm: restarting Apache on matomo1001 to pick up OpenSSL update
  • 12:14 moritzm: rolling restart of HHVM/Apache on deployment servers to pick up OpenSSL update
  • 11:59 fsero: pointing boron docker builds to the new registry temporarily (docker builds on boron might fail)
  • 11:35 Amir1: EU swat is done
  • 11:26 moritzm: rolling restart of HHVM/Apache on labweb* to pick up OpenSSL update
  • 09:58 moritzm: installing openssl1.0 security updates
  • 09:18 gehel: unbanning elastic1029 from cluster
  • 08:58 moritzm: updating mediawiki servers in eqiad to version 1.8.1 of the PHP extension for wikidiff
  • 08:29 onimisionipe: increase wal_keep_segments on codfw maps master
  • 08:19 moritzm: updating mediawiki servers in codfw to version 1.8.1 of the PHP extension for wikidiff
  • 07:50 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/initSiteStats.php --wiki=hywwiki --active (T220936)
  • 05:31 marostegui: Upgrade db1100
  • 05:07 marostegui: powercycle mw1280 (crashed)

2019-04-14

  • 06:10 ebernhardson: unban elastic1027 from eqiad-psi
  • 05:36 ebernhardson: unbanning elastic1027 after about half the shards left and load dropped
  • 05:31 ebernhardson: ban elastic1027 from elasticsearch-psi in eqiad
  • 04:59 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1027 due to 100% cpu for the last 30+ minutes

2019-04-13

  • 18:46 godog: 3h downtime for cloudvirt1015
  • 15:58 ebernhardson: restart elasticsearch on elastic1027
  • 15:34 shdubsh: restart recommendation_api on scb1001
  • 15:33 shdubsh: restart recommendation_api on scb2001
  • 10:46 onimisionipe: depooling maps2001 for postgres init
  • 08:05 gehel: repooling wdqs1008 - data transfer completed - T220830
  • 00:32 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/includes/: Idc19cc29764a / T220854 - hot fix (duration: 05m 37s)

2019-04-12

  • 21:16 Krinkle: scap was unable to sync to 1 apache (connect to host cloudweb2001-dev.wikimedia.org port 22: Connection timed out)
  • 21:10 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/ImageMap/includes/ImageMap.php: I0ee84f059da / T217087 (duration: 05m 12s)
  • 19:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 19:27 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 19:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:17 onimisionipe: depooling maps2002 for postgres init
  • 17:16 onimisionipe: repooling maps2001 - postgres init is complete
  • 16:14 elukey: install ifstat on all the mc1* hosts for network bandwidth investigation
  • 15:56 gehel: starting data transfer from wdqs1008 to wdqs1009 - T220830
  • 15:32 thcipriani: gerrit back
  • 15:29 thcipriani: gerrit restart incoming
  • 14:29 onimisionipe: depool maps2001 for postgres initialization
  • 13:24 akosiaris: re-enable puppet across the fleet. Patch merged, recovery storm coming
  • 13:18 akosiaris: disable puppet across the fleet to avoid incoming puppet alert storm
  • 12:57 marostegui: Purge old rows and optimize tables on spare host pc1010 T210725
  • 12:53 urandom: decommissioning cassandra-c, restbase2008 -- T208087
  • 12:49 gehel: rolling restart of cassandra on maps* for jvm upgrade
  • 12:22 arturo: T220095 disable icinga checks for labtestcontrol2003
  • 12:16 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220807 Reduce cawiki survey sampling rate (duration: 05m 11s)
  • 11:56 moritzm: upgrading app server canaries to version 1.8.1 of the PHP wikidiff extension (HHVM already deployed) T203069
  • 11:46 moritzm: upgrading acmechief hosts to latest buster state
  • 11:44 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220807 Oversample navtiming on cawiki and commonswiki (duration: 05m 14s)
  • 11:37 Trey314159: reindexing Greek, Turkish, and Irish wikis on elastic@eqiad and elastic@codfw complete (T217806)
  • 11:19 moritzm: installed Java security updates on relforge* hosts
  • 11:10 moritzm: installing Java security updates on remaining maps hosts
  • 10:32 arturo: T219626 reimaging cloudcontrol2001-dev
  • 10:13 elukey: matomo updated to 3.9.1 on matomo1001 + deb upload to wikimedia-stretch - T218037
  • 09:53 moritzm: updated mwdebug1001 to php-wikidiff 1.8.1
  • 09:37 moritzm: updated mwdebug1002 to php-wikidiff 1.8.1
  • 09:30 volans: reset mgmt card on labtestcontrol2003 - T220783
  • 09:07 moritzm: added the wikimedia repository key to the stretch build chroot on boron, fixes builds using the PHP72/SPICERACK hooks
  • 09:05 arturo: T218021 disable icinga checks for labtestcontrol2001
  • 08:35 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/NavigationTiming/modules/ext.navigationTiming.js: T220788 Fix veaction === null case (duration: 00m 54s)
  • 08:02 moritzm: updated ssacli in thirdparty/hwraid component for stretch to 3.30-13.0 T220787
  • 07:12 marostegui: Manually install ssacli on db2[097|098|099|100|101|102] T220787 T220572
  • 07:04 moritzm: synced ssacli to thirdparty/hwraid components for jessie/stretch T220787
  • 01:00 mutante: puppet cert clean, puppet node clean, puppet node deactivate on cloudnet2001-dev.codfw.wmnet (T218025)
  • 00:25 tstarling@deploy1001: Synchronized wmf-config/profiler.php: increase excimer max depth (duration: 00m 53s)
  • 00:02 ejegg: updated fundraising CiviCRM from 24b968b1f9 to 1bc1570967

2019-04-11

  • 23:57 urandom: decommissioning cassandra-b, restbase2008 -- T208087
  • 22:15 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/WikibaseMediaInfo/resources/: Hot-deploy fix for WBMI variable cache miss T220665 (duration: 00m 55s)
  • 20:46 mutante: deleting job of wikibugs-phab-listener in an attempt to restart it
  • 19:47 cdanis: cdanis@mwdebug1001.eqiad.wmnet ~ % sudo systemctl stop hhvm && sudo rm /var/cache/hhvm/fcgi.hhbc.sq3 && sudo systemctl start hhvm
  • 19:39 twentyafterfour: mediawiki error rate seems to be back to normal after deploying 1.33.0-wmf.25, the new branch looks stable refs T206679
  • 18:55 mutante: disabling puppet on hosts using class 'confd' to safely deploy gerrit:456317
  • 18:55 Trey314159: reindexing Greek, Turkish, and Irish wikis on elastic@eqiad and elastic@codfw (T217806)
  • 18:01 onimisionipe: increase replication factor on maps codfw cluster
  • 17:45 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@5394b59] (stretch): Insert maps2001 into stretch environment (duration: 00m 22s)
  • 17:45 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@5394b59] (stretch): Insert maps2001 into stretch environment
  • 17:22 mbsantos@deploy1001: Finished deploy [proton/deploy@5cb8bbe]: Update chromium-renderer to 8988283 (T213362, T216191, T212322) (duration: 01m 33s)
  • 17:21 mbsantos@deploy1001: Started deploy [proton/deploy@5cb8bbe]: Update chromium-renderer to 8988283 (T213362, T216191, T212322)
  • 16:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:48 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:47 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:47 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:42 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:42 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:42 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:36 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@13d9ebb] (stretch): Update stretch instance with latest code (duration: 00m 22s)
  • 15:35 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@13d9ebb] (stretch): Update stretch instance with latest code
  • 15:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op comment update (duration: 01m 00s)
  • 15:06 cdanis@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 14:53 paravoid: rebooting labnet1002
  • 14:49 vgutierrez: uploaded acme-chief 0.16 to apt.wikimedia.org (buster) - T207461
  • 14:47 urandom: decommissioning cassandra-a, restbase2008 -- T208087
  • 14:46 akosiaris: cxserver: add garbage collection graphs under saturation. T205911
  • 14:18 Amir1: Deployment of Url shortener is done now
  • 14:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy UrlShortener to metawiki, let's get the party started (T108557, T44085) (duration: 01m 00s)
  • 12:49 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=maps,name=maps2001.codfw.wmnet
  • 12:20 kartik@deploy1001: scap-helm cxserver finished
  • 12:19 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 12:19 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 12:16 kartik@deploy1001: scap-helm cxserver finished
  • 12:16 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 12:15 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 12:12 kartik@deploy1001: scap-helm cxserver finished
  • 12:12 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 12:12 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 11:40 zeljkof: EU SWAT finished
  • 11:39 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Increase musical notation datatype string length limit (T218767) (duration: 01m 02s)
  • 11:37 akosiaris@deploy1001: scap-helm cxserver finished
  • 11:36 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 11:36 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 11:30 onimisionipe: removing maps2002 from cassandra cluster due to dead node error
  • 10:46 moritzm: upgrading remaining app servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 10:39 hashar: Upgrading CI Jenkins
  • 10:21 volans: forcing puppet run on A:cp-upload_codfw
  • 10:15 gehel: remove maps2001 from new cassandra cluster -T198622
  • 10:10 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 09:57 elukey: roll restart druid-coordinator/overlord on druid100[4-6] to pick up new jvm settings
  • 09:01 moritzm: upgrading deployment servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:20 moritzm: upgrading remaining job runners to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:19 elukey: roll restart of druid-broker/historical on druid100[4-6] to pick up new settings
  • 06:33 moritzm: uploaded jenkins 2.164.2 to apt.wikimedia.org (stretch-wikimedia / thirdparty/ci)
  • 06:32 moritzm: uploaded jenkins 2.164.2 to apt.wikimedia.org (jessie-wikimedia / thirdparty)
  • 06:24 moritzm: upgrading remaining API Servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s3 read-only T219115 (duration: 00m 36s)
  • 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s3 master eqiad from db1078 to db1075 T219115 (duration: 00m 36s)
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s3 on read-only T219115 (duration: 00m 37s)
  • 05:00 marostegui: Starting s3 failover from db1078 to db1075 - T219115
  • 04:32 marostegui: Disable puppet on db1078 and db1075 T219115
  • 04:18 marostegui: Start topology changes to move s3 slaves under db1075 T219115
  • 04:14 marostegui: Disable GTID on s3 hosts - https://phabricator.wikimedia.org/T219115
  • 00:45 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/PageTriage/: UBN Fix for pageTriage and ORES T220649 (duration: 01m 04s)
  • 00:12 twentyafterfour: deploying phabricator upgrade

2019-04-10

  • 20:43 urandom: decommissioning cassandra-c, restbase2007 -- T208087
  • 20:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert - Enabling api-request logging via eventgate-analytics for group1 wikis - T214080 (duration: 01m 00s)
  • 19:48 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging via eventgate-analytics for group1 wikis - T214080 (duration: 00m 59s)
  • 19:42 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.25 refs T206679 (duration: 01m 48s)
  • 19:40 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.25 refs T206679
  • 19:28 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.25 refs T206679
  • 19:26 XioNoX: enable sampling on cr2-eqiad external links, outbound
  • 19:17 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.20 [keeping static files] (duration: 02m 18s)
  • 19:14 ejegg: updated fundraising CiviCRM from d0e44a9e51 to 24b968b1f9
  • 19:08 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 [keeping static files] (duration: 02m 22s)
  • 17:44 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 [keeping static files] (duration: 02m 22s)
  • 16:58 chaomodus: restarted nagios-nrpe-server on proton1001 (it died due to OOM)
  • 16:51 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1004.eqiad.wmnet
  • 16:01 elukey: restart brokers on druid100[3-6] - locking after segments get deleted
  • 15:46 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/includes/parser/DateFormatter.php: Ib2b3fb / T220563 (duration: 01m 00s)
  • 15:28 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 59s)
  • 15:26 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 everywhere (duration: 00m 21s)
  • 15:26 oblivian@deploy1001: Started deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 everywhere
  • 15:24 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/Score/: UBN Revert Score changes that broke VE T220465 (duration: 01m 01s)
  • 15:19 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 (duration: 00m 13s)
  • 15:19 oblivian@deploy1001: Started deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0
  • 15:01 fsero: pooled back mwdebug200[1,2] T219989
  • 15:00 fsero: repooling mwdebug2002
  • 15:00 jijiki: Enable puppet on thumbor1001, switch back to nginx, pool thumbor1004 - T187765
  • 14:57 fsero: repooling mwdebug2001
  • 14:20 hashar: CI processing was a bit slower than usual over the past couple hours or so. It should be slightly faster now T220606
  • 14:13 joal@deploy1001: Finished deploy [analytics/aqs/deploy@fc1d232]: Deploying per-page limits for druid-endpoints (duration: 14m 41s)
  • 13:58 joal@deploy1001: Started deploy [analytics/aqs/deploy@fc1d232]: Deploying per-page limits for druid-endpoints
  • 13:47 fsero: resizing disk on mwdebug2002 T219989
  • 13:42 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on group0 (T188327) (duration: 01m 00s)
  • 13:19 marostegui: Deploy schema change on aawiki aawikibooks aawiktionary abwiki abwiktionary acewiki advisorswiki advisorywiki adywiki afwiki on x1 - T136427
  • 12:41 urandom: decommissioning cassandra-b, restbase2007 -- T208087
  • 12:40 hashar: contint2001: stopped puppet and zuul-merger for debugging
  • 12:17 jbond42: rolling security update of systemd on stretch systems
  • 12:07 Amir1: EU swat is done
  • 12:07 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Prep work for deploying UrlShortener extension (T108557), part II (duration: 01m 00s)
  • 12:05 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Prep work for deploying UrlShortener extension (T108557), part I (duration: 01m 00s)
  • 11:46 dcausse: elasticsearch search cluster: reindexing zh-min-nan wikis (T219533)
  • 10:55 moritzm: upgrading nodejs on analytics-tool1002 to latest node 10 version from component/node10
  • 10:46 gilles: T220265 setZoneAccess on all wikis finished
  • 10:40 akosiaris: upgrade kubernetes-node on kubestage1002 (staging cluster) to 1.12.7-1 T220405
  • 10:33 moritzm: upgrading nodejs on aqs* to latest node 10 version from component/node10
  • 10:25 fsero: resizing disk on mwdebug2001 T219989
  • 10:17 akosiaris: upload kubernetes_1.12.7-1 to apt.wikimedia.org/stretch-wikimedia component main T220405
  • 10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 T217453 (duration: 00m 59s)
  • 10:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 T217453 (duration: 01m 03s)
  • 09:59 moritzm: upgrading labweb hosts (wikitech) to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 09:51 akosiaris: upgrade kubernetes-node on kubestage1001 (staging cluster) to 1.12.7-1 T220405
  • 09:50 moritzm: upgrading snapshot hosts to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 09:40 akosiaris: upgrade kubernetes-master on neon (staging cluster) to 1.12.7-1 T220405
  • 09:40 akosiaris: upgrade kubernetes-master on neon (staging cluster) to 1.12.7-1
  • 09:05 moritzm: upgrading job runners mw1299-mw1311 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:56 elukey: restart druid-broker on druid100[4-6] - stuck after attempted datasource delete action
  • 08:46 godog: roll-restart swift frontends - T214289
  • 08:36 elukey: update thirdparty/cloudera packages to cdh 5.16.1 for jessie/stretch-wikimedia - T218343
  • 08:26 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@f7518bb] (stretch): Insert maps2003 into stretch environment (duration: 00m 22s)
  • 08:26 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@f7518bb] (stretch): Insert maps2003 into stretch environment
  • 08:12 gilles: T220265 foreachwiki extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --backend local-multiwrite
  • 07:22 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@efd5bd5]: Revert "Bifurcate imageinfo queries to improve performance" (T220574) (duration: 04m 05s)
  • 07:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@efd5bd5]: Revert "Bifurcate imageinfo queries to improve performance" (T220574)
  • 07:12 onimisionipe: depooling maps200[34] to increase cassandra replication factor - T198622
  • 07:09 jijiki: Rolling restart thumbor service
  • 07:08 jijiki: Upgrading python-thumbor-wikimedia on Thumbor servers to 2.4-1+deb9u1
  • 06:59 marostegui: Deploy schema change on x1 master, with replication, lag will happen on x1 T217453
  • 06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool x1 slaves T217453 (duration: 01m 13s)
  • 05:52 _joe_: setting both mwdebug200{1,2} to pooled = inactive to remove them from scap dsh list and allow deployments, T219989
  • 05:12 _joe_: same on mwdebug2001
  • 05:08 _joe_: removing hhvm cache on mwdebug2002
  • 00:37 Krinkle: last scap sync-file failed to mwdebug2002.codfw and mwdebug2001.codfw due to insufficient disk space
  • 00:20 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/resources/src/startup/: I3b9f1a13379a / Ie9db60e417cca (duration: 01m 01s)

2019-04-09

  • 23:14 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 [keeping static files] (duration: 06m 03s)
  • 22:31 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.33.0-wmf.25 refs T206679 (duration: 39m 59s)
  • 22:19 chaomodus: uploaded python-pynetbox to apt.wikimedia.org/stretch-wikimedia (T217072)
  • 22:13 mobrovac@deploy1001: Finished deploy [restbase/deploy@c0a2977]: Bring RB on restbase20(19|20) up to date - T208087 (duration: 02m 32s)
  • 22:11 mobrovac@deploy1001: Started deploy [restbase/deploy@c0a2977]: Bring RB on restbase20(19|20) up to date - T208087
  • 21:57 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.33.0-wmf.25 refs T206679
  • 21:48 urandom: decommissioning cassandra-a, restbase2007 -- T208087
  • 19:46 herron: added myself to ldap group cn=archiva-deployers,ou=groups,dc=wikimedia,dc=org
  • 19:10 twentyafterfour: branching 1.33.0-wmf.25
  • 18:53 crusnov@deploy1001: Finished deploy [netbox/deploy@018d83e]: Minor fix to Netbox-Ganeti sync script (duration: 00m 52s)
  • 18:52 crusnov@deploy1001: Started deploy [netbox/deploy@018d83e]: Minor fix to Netbox-Ganeti sync script
  • 18:50 thcipriani: gerrit back
  • 18:48 thcipriani: gerrit restart
  • 18:48 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@43d2d2e]: Gerrit update (cobalt) -- restart incoming (duration: 00m 10s)
  • 18:47 thcipriani@deploy1001: Started deploy [gerrit/gerrit@43d2d2e]: Gerrit update (cobalt) -- restart incoming
  • 18:46 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@43d2d2e]: Gerrit update (gerrit2001 only) (duration: 00m 10s)
  • 18:46 thcipriani@deploy1001: Started deploy [gerrit/gerrit@43d2d2e]: Gerrit update (gerrit2001 only)
  • 18:42 volans: restart icinga on icinga1001 - T196336
  • 18:38 cdanis: T196336 cdanis@icinga1001$ sudo systemctl restart nsca
  • 18:27 crusnov@deploy1001: Finished deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229 (duration: 00m 57s)
  • 18:26 crusnov@deploy1001: Started deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229
  • 18:11 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 03s)
  • 18:11 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 18:07 urandom: bootstrapping cassandra-c, restbase2020 -- T208087
  • 17:58 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 02s)
  • 17:58 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 17:56 elukey: restart keyholder-agent on deploy1001 to pick up new settings for analytics (+ arm all the keys)
  • 17:42 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 04s)
  • 17:42 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 17:42 elukey: restart keyholder-proxy.service on deploy1001 as an attempt to reload perms for the analytics_deploy key
  • 17:37 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 10s)
  • 17:37 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 17:19 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@b04c397]: Update mobileapps to 3edfcad (T220045 T219411 T219667) (duration: 03m 50s)
  • 17:15 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@b04c397]: Update mobileapps to 3edfcad (T220045 T219411 T219667)
  • 17:14 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.24/includes/export/WikiExporter.php: deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502537/1 (duration: 00m 51s)
  • 17:09 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.24/includes/export/XmlDumpWriter.php: deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502538/1 (duration: 00m 52s)
  • 17:04 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/includes/specials/SpecialUploadStash.php: T220265 Add support for X-Swift-Secret to upload stash (duration: 00m 53s)
  • 17:03 twentyafterfour: deploying https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502538/1 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502537/1
  • 17:01 arturo: T220426 reimaging+renaming labtestnet2002 to cloudweb2001-dev
  • 16:49 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:49 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 16:49 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 16:46 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:46 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 16:46 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 16:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:45 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:45 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:41 herron: performing rolling restart of kafka main brokers and eventbus instances in eqiad to pick up security updates
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:28 jijiki: Restarting thumbor service on thumbor1001
  • 16:26 jijiki: Upgrading thumbor1001 to python-thumbor-wikimedia_2.4-1+deb9u1
  • 16:18 jijiki: Uploading python-thumbor-wikimedia_2.4-1+deb9u1 to component/thumbor in stretch-wikimedia
  • 15:05 moritzm: uploaded jenkins 2.164.1 for stretch-wikimedia/thirdparty/ci
  • 15:04 moritzm: uploaded jenkins 2.164.1 for jessie-wikimedia/thirdparty
  • 14:42 ejegg: updated payments-wiki from 15bcb3d1a6 to aa8dad50e7
  • 14:10 ema: reboot lvs2010 with systemd 232 T209707
  • 14:09 godog: bootstrapping cassandra-b, restbase2020 -- T208087
  • 13:19 godog: bounce rsyslog on wezen
  • 13:11 fsero: building envoy docker image
  • 13:07 jbond42: rolling security updates of systemd on canary systems
  • 12:35 godog: bounce rsyslog on lithium
  • 12:13 elukey: powercycle logstash1012 - no ssh, no mgmt console available, seems completely stuck
  • 12:10 jbond42: remove facter2.4 from wikimedia-buster
  • 11:27 moritzm: upgrading API servers mw1276-mw1290 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 11:07 akosiaris: pool both DCs for newly created swift.recovery.wmnet RR
  • 11:07 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=.*,dnsdisc=swift
  • 11:00 ema: rebooting lvs2010 with systemd 241-1~bpo9+1 T209707
  • 10:57 moritzm: updated buster installer to daily build from 9th of April
  • 10:09 godog: bootstrapping cassandra-a, restbase2020 -- T208087
  • 10:07 moritzm: rebooting stat1005 for some tests again
  • 09:49 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/NavigationTiming: T220476 Add originCountry to paintTiming context (duration: 00m 54s)
  • 09:46 moritzm: rebooting stat1005 for some tests
  • 08:47 akosiaris: switch swift to be accessed from varnish+ats active/active rw
  • 08:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove old comment from db1089 (duration: 00m 51s)
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2069 (duration: 00m 50s)
  • 08:10 marostegui: Upgrade db2069
  • 08:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2069 (duration: 00m 51s)
  • 07:52 moritzm: upgrading app servers mw1319-mw1333 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Deploy parsercache key change everywhere T210725 (duration: 00m 53s)
  • 07:37 moritzm: installing samba security updates
  • 07:21 marostegui: Change parsercache keys on mw[1230-1235,1238-1239] - T210725
  • 07:10 jijiki: Depool thumbor1004 for testing - T187765
  • 07:09 marostegui: Change parsercache keys on mw[1221-1229] - T210725
  • 07:03 marostegui: Change parsercache keys on mw[1280-1289] - T210725
  • 06:51 dcausse: elasticsearch search cluster: reindex all spaceless languages in eqiad and codfw (T219533)
  • 06:47 moritzm: installing libav security updates
  • 06:39 marostegui: Change parsercache keys on mw[1260-1269] - T210725
  • 06:30 marostegui: Change parsercache keys on mw[1270-1279] - T210725
  • 06:01 marostegui: Deploy parsercache key change on canaries only - T210725
  • 03:23 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/ExternalGuidance/extension.json: Id04a3a / T219841 (duration: 00m 52s)
  • 03:16 onimisionipe: depooled maps2003 - T219849
  • 02:47 onimisionipe: restarting tilerator on maps2003 - T219849
  • 02:40 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/ExternalGuidance/extension.json: I8614f6 / T219841 (duration: 00m 53s)
  • 01:27 eileen: civicrm revision changed from dfe89516b3 to d0e44a9e51, config revision is 2bcbf44521
  • 00:45 urandom: bootstrapping cassandra-c, restbase2019 -- T208087
  • 00:07 ebernhardson@deploy1001: Synchronized wmf-config/: T218716: Migrate configs to WikibaseCirrusSearch (duration: 00m 51s)

2019-04-08

  • 23:57 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218954: Enable WBCS search on commons too (duration: 00m 50s)
  • 23:45 ebernhardson@deploy1001: Synchronized wmf-config: T218954: Disable wbcs dispatching query builder on commons (3/3) (duration: 00m 52s)
  • 23:41 ebernhardson@deploy1001: Synchronized wmf-config: T218954: Disable wbcs dispatching query builder on commons (3/3) (duration: 00m 51s)
  • 23:33 ebernhardson@deploy1001: Synchronized wmf-config/Wikibase.php: T218954: Disable wbcs dispatching query builder on commons (2/3) (duration: 00m 52s)
  • 23:10 ebernhardson@deploy1001: Synchronized wmf-config/: T218954: Disable wbcs dispatching query builder on commons (1/3) (duration: 00m 52s)
  • 22:45 XioNoX: rollback enable sampling on cr2-eqiad external links
  • 22:29 XioNoX: enable sampling on cr2-eqiad external links
  • 22:18 XioNoX: enable sampling on eqiad Telia transit link
  • 22:04 jforrester@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: WBMI T220277 (duration: 00m 57s)
  • 22:01 XioNoX: pfw firewall rules update - T217355
  • 20:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667) (duration: 07m 55s)
  • 20:41 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667)
  • 20:24 urandom: bootstrapping cassandra-b, restbase2019 -- T208087
  • 20:08 bearND: mobileapps deploy failed on canary (Check 'endpoints' failed). Rolled back canary.
  • 20:08 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667) (duration: 02m 10s)
  • 20:05 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667)
  • 19:59 marxarelli: promotion of 1.33.0-wmf.24 to all wikis completed. error rates nominal aside from usual timeouts. cc: T206678, T220037
  • 19:51 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.24
  • 19:48 marxarelli: promoting 1.33.0-wmf.24 to all wikis. cc: T220037, T206678
  • 19:41 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 46s)
  • 19:41 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:35 marxarelli: starting promotion of 1.33.0-wmf.24 to group1
  • 18:45 Lucas_WMDE: Morning SWAT done
  • 18:31 bblack: deploying wiktionary CNAME experiment - https://phabricator.wikimedia.org/T208263#5094712
  • 18:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@9cf5364]: Lower AQS rate limits and fix recommendation-api spec - T219910 T220221 (duration: 21m 14s)
  • 18:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable eventgate-analytics api-request logging for group0 wikis - T214080 (duration: 00m 56s)
  • 18:24 mobrovac: restart pdfrender on scb2001 - T174916
  • 18:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:13 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:10 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:09 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:09 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:09 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:06 mobrovac@deploy1001: Started deploy [restbase/deploy@9cf5364]: Lower AQS rate limits and fix recommendation-api spec - T219910 T220221
  • 17:50 arturo: T220129 renaming labtestmetal2001.codfw.wmnet to clouddb2001-dev.codfw.wmnet
  • 17:42 XioNoX: add swift term to cr1/2-eqiad - T220081
  • 17:14 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@c30a540]: GUI updates, Updater with redirect fix and Blazegraph with XSS fix (duration: 11m 17s)
  • 17:03 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@c30a540]: GUI updates, Updater with redirect fix and Blazegraph with XSS fix
  • 16:59 mobrovac@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: Force-deploy to scb1001 to test the config perms (duration: 00m 16s)
  • 16:59 mobrovac@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: Force-deploy to scb1001 to test the config perms
  • 16:55 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Replace needed WikimediaEditorTasks Beta Cluster config (T220153) (duration: 00m 58s)
  • 16:31 urandom: bootstrapping cassandra-a, restbase2019 -- T208087
  • 15:35 herron: aborting ores to logstash kafka logging pipeline switchover for now. puppet applied only to ores2009, reverting now
  • 15:19 herron: switching ores to logstash kafka logging pipeline (via temporary puppet disable and rolling puppet agent runs)
  • 15:09 jijiki: Pool mw2206 - T215415
  • 14:55 papaul: powering down mw2206 for DIMM replacement
  • 14:49 otto@deploy1001: Finished deploy [analytics/refinery@7fa6fb7]: deploying oozie article recommender for baho (duration: 18m 35s)
  • 14:45 papaul: powering down elastic2048 for disk replacement
  • 14:30 otto@deploy1001: Started deploy [analytics/refinery@7fa6fb7]: deploying oozie article recommender for baho
  • 14:17 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on test wikis and mediawikiwiki (T188327) (duration: 00m 59s)
  • 14:06 jijiki: Temporarily serve thumbor traffic on thumbor1001 via haproxy - T187765
  • 13:41 moritzm: upgrading job runners in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 12:31 hashar: contint2001: upgraded python-pbr 0.8.2-1 -> 1.10.0-1 # T218559
  • 12:25 moritzm: upgrading API servers in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 12:06 arturo: reboot cloudvirt1009 to clean some ACPI errors in dmesg
  • 12:03 arturo: T219776 puppet node deactivate labtestnet2003.codfw.wmnet
  • 12:00 hashar: contint1001 upgraded zuul to 2.5.1-wmf6 # T208426
  • 11:53 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: WikibaseClient: Conditionally enable mapframe support (T218051) (duration: 00m 58s)
  • 11:48 hashar: contint2001: stopping zuul-server , it is not meant to be running there
  • 11:41 hoo@deploy1001: Synchronized wmf-config/abusefilter.php: Enable blocking feature of AbuseFilter in zh.wikipedia (T210364) (duration: 00m 58s)
  • 11:25 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create uploader user group for thwiki (T216615) (duration: 00m 58s)
  • 11:12 jijiki: Restarted thumbor services after librsvg upgrade
  • 11:11 fsero: upgrading envoy to 1.9.1 T215810
  • 10:42 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
  • 10:34 moritzm: upgrading app servers in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 10:23 jijiki: Running debdeploy to upgrade librsvg
  • 09:43 gehel: force allocation of 3 unassigned shards on elasticsearch / cirrus / eqiad
  • 09:30 arturo: T219776 puppet node clean labtestnet2003.codfw.wmnet
  • 09:20 volans: restarting icinga on icinga1001 - T196336
  • 08:45 moritzm: upgrading API servers mw1221-mw1235 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:34 akosiaris@deploy1001: scap-helm zotero finished
  • 08:34 akosiaris@deploy1001: scap-helm zotero cluster staging completed
  • 08:34 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml --reset-values staging stable/zotero [namespace: zotero, clusters: staging]
  • 08:32 akosiaris@deploy1001: scap-helm zotero finished
  • 08:32 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 08:32 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
  • 08:32 akosiaris: lower CPU, memory limits for zotero pods. Set 1 cpu, 700Mi. This should help the pods to recover faster in some cases. The old memory leak issues we used to have seem to be no longer present
  • 08:31 akosiaris@deploy1001: scap-helm zotero finished
  • 08:31 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 08:31 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
  • 08:17 godog: delete fundraising folder from public grafana - T219825
  • 08:01 godog: bounce grafana after https://gerrit.wikimedia.org/r/c/operations/puppet/+/501519
  • 07:59 moritzm: upgrading mw1266-mw1275 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T217453 (duration: 00m 58s)
  • 07:19 marostegui: Deploy schema change on the first 10 wikis - T217453
  • 07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T217453 (duration: 00m 59s)
  • 07:02 moritzm: installing wget security updates
  • 07:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T143763 (duration: 00m 58s)
  • 06:34 _joe_: restarted netbox, SIGSEGV on HUP-induced reload
  • 05:20 marostegui: Deploy schema change on x1 master with replication, there will be lag on x1 slaves T143763
  • 05:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T219777 T143763 (duration: 01m 30s)

2019-04-07

  • off: restarted icinga on icinga2001
  • 06:34 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=codfw
  • 06:23 _joe_: deleting zotero pods with high memory watermark in codfw
  • 06:03 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=zotero,name=codfw

2019-04-06

  • 10:09 gilles: Purging ruwiki namespaces > 0

2019-04-05

  • 23:10 thcipriani: revert some recent problematic gerrit acl changes
  • 22:46 chaomodus: restarted pdfrender on scb1002 T174916
  • 21:45 hashar: thcipriani restarted Gerrit. CI works again # T220243
  • 21:37 thcipriani: restarting gerrit
  • 21:30 hashar: CI / Zuul is no longer processing events / T220243
  • 17:29 thcipriani: gerrit back on 2.15.11
  • 17:27 thcipriani: restart gerrit
  • 17:26 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 on cobalt (restart incoming) (duration: 00m 11s)
  • 17:26 thcipriani@deploy1001: Started deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 on cobalt (restart incoming)
  • 17:25 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 (on gerrit2001 only) (duration: 00m 10s)
  • 17:25 thcipriani@deploy1001: Started deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 (on gerrit2001 only)
  • 17:19 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/diff/TextSlotDiffRenderer.php: Ia326c6 / T220217 (duration: 01m 02s)
  • 17:12 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/includes/diff/TextSlotDiffRenderer.php: Ia326c6 / T220217 (duration: 01m 00s)
  • 16:02 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/includes/jobqueue/jobs/RefreshLinksJob.php: Ib1ac31365f9c / T220037 (duration: 00m 59s)
  • 15:58 ejegg: re-enabled recurring donations queue consumer
  • 15:57 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/NavigationTiming/: I6b23be / T220156 (duration: 01m 00s)
  • 15:51 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/GlobalBlocking/includes/specials/: I5843cd181ca7d (duration: 01m 02s)
  • 15:08 ejegg: upgraded fundraising CiviCRM from 3c55850631 to 83478013a8
  • 15:01 ejegg: disabled recurring donation queue consumer
  • 14:55 papaul: powering down restbase2019 and 2020 for relocation
  • 13:53 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 13:45 akosiaris: repool eqiad for all kubernetes services T217426
  • 13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=citoid
  • 13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=cxserver
  • 13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
  • 13:44 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=blubberoid
  • 13:44 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
  • 13:41 arturo: T220203 reimage labtestnet2002 as spare in stretch
  • 13:36 arturo: T220101 disable active icinga checks for cloudcontrol2002-dev
  • 13:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:50 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=99)
  • 12:49 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:48 jijiki: Restarting pybal on lvs1016 and lvs2003 for 496382
  • 12:43 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:43 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:43 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:43 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:33 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 12:32 akosiaris: depool eqiad for all kubernetes services T217426
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=zotero
  • 12:31 akosiaris: repool codfw for all kubernetes services T217426
  • 12:30 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:30 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:29 akosiaris: repool codfw for all kubernetes services
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=citoid
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=cxserver
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mathoid
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=blubberoid
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=zotero
  • 12:18 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:18 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 12:15 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:15 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:12 bblack: repool esams
  • 12:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 11:53 bblack: esams depooled in DNS
  • 11:37 jijiki: Restarting pybal on lvs1006 and lvs2006 for 496382
  • 11:27 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 10:57 arturo: updating puppet catalog compiler facts
  • 10:42 elukey: restart druid broker on druid100[5,6] - exceptions in the logs after old datasource removal
  • 10:41 elukey: restart druid broker on druid1004 - exceptions in the logs after old datasource removal
  • 10:10 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 10:10 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 09:27 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 09:27 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 09:26 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 09:26 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 08:57 akosiaris: depool codfw kubernetes apps from discovery in preparation for upgrade
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=citoid
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=cxserver
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mathoid
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=blubberoid
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=zotero
  • 08:55 arturo: T220101 reimaging+renaming labtestservices2002 to cloudservices2002-dev
  • 08:43 akosiaris: upgrade kubernetes staging cluster to 1.11.9
  • 08:32 elukey: roll restart of aqs on aqs100* to pick up new druid settings
  • 08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1075 (duration: 00m 59s)
  • 08:06 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 07:51 elukey: restart gerrit on cobalt (timeouts and general slowdown)
  • 07:34 jijiki: Repooling thumbor1004 until we replace its memory - T215411
  • 07:18 moritzm: upgrading mw1262-mw1265 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 (duration: 00m 57s)
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 (duration: 01m 00s)
  • 05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 with low weight (duration: 00m 58s)
  • 05:15 marostegui: Fully upgrade and reboot db1075
  • 05:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 59s)
  • 04:49 gilles: T216594 Start purge of namespace 0 on ruwiki
  • 02:27 eileen: update civicrm revision changed from 7560af93df to 3c55850631, config revision is 9ad5ef3e15
  • 00:09 bd808@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: wikitech: Lock LDAP accounts when users are blocked, Disable Phabricator accounts when blocked on wikitech (T168692) 2/2 (duration: 00m 57s)
  • 00:07 bd808@deploy1001: Synchronized wmf-config/wikitech.php: SWAT: wikitech: Lock LDAP accounts when users are blocked, Disable Phabricator accounts when blocked on wikitech (T168692) (duration: 00m 59s)

2019-04-04

  • 23:52 bd808@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/LdapAuthentication: SWAT: Also set an LDAP password policy on Block (T168692) (duration: 01m 01s)
  • 23:38 bd808@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add smn and sms to wmgExtraLanguageNames (T220118) (duration: 01m 02s)
  • 21:22 XioNoX: renumber AS58587 to AS10075 in eqsin
  • 21:17 bblack: DNS deploying https://gerrit.wikimedia.org/r/c/operations/dns/+/500731 which can affect resolution of our CNAME records. If dns-related issues, can revert at will!
  • 21:09 herron: restarting eqiad ELK stack for security updates
  • 20:45 marxarelli: promotion of 1.33.0-wmf.24 rolled back to group0 and holding. cc: T206678, T220037
  • 20:41 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2/group1 wikis to 1.33.0-wmf.24"
  • 20:36 marxarelli: rolling back again following still high rates of DBTransactionError (avg ~ 800/min)
  • 20:16 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.24
  • 20:11 marxarelli: promoting 1.33.0-wmf.24 to all wikis
  • 20:11 marxarelli: error rates look good after proper syncs and re-deploy. cc: T220037
  • 20:06 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/Citoid/modules/ve.ui.Citoid.init.js: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Citoid/+/501114 (duration: 00m 58s)
  • 20:04 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationPlugin.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 57s)
  • 20:03 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationHooks.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 58s)
  • 20:02 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthentication.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 58s)
  • 19:58 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus/includes/JobExecutor.php: syncing JobExecutor changes (duration: 00m 58s)
  • 19:55 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 47s)
  • 19:53 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:51 marxarelli: re-deploying to group1 after proper syncs
  • 19:47 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/Citoid/modules/ve.ui.Citoid.init.js: (no justification provided) (duration: 00m 59s)
  • 19:46 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus/includes/JobExecutor.php: (no justification provided) (duration: 00m 58s)
  • 19:45 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationPlugin.php: (no justification provided) (duration: 00m 58s)
  • 19:44 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationHooks.php: (no justification provided) (duration: 00m 59s)
  • 19:43 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthentication.php: (no justification provided) (duration: 00m 59s)
  • 19:19 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.33.0-wmf.24"
  • 19:13 marxarelli: large spike in DBTransactionError errors. rolling back. cc: T220037
  • 19:12 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 46s)
  • 19:10 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:06 marxarelli: fetch/rebase looks good, incorporates fixes for T220037, T219510. deploying
  • 19:03 marxarelli: preparing to promote 1.33.0-wmf.24 to group1
  • 18:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable partial blocks on frwiki, plwiki (T219327, T219218) (duration: 00m 58s)
  • 18:23 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES RCFilters on eswikiquote (T219160) (duration: 01m 02s)
  • 18:13 moritzm: restarted apache on people.wikimedia.org to pick up OpenSSL update
  • 17:59 bstorm_: stopped postgresql on labsdb1006.eqiad.wmnet and moved the database master functionality (and all rsyncs) to clouddb1003.clouddb-services.eqiad.wmflabs
  • 17:59 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@922cbc0]: Switch to new logging infrastructure T211125 (duration: 04m 03s)
  • 17:55 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@922cbc0]: Switch to new logging infrastructure T211125
  • 17:47 ppchelko@deploy1001: Finished deploy [changeprop/deploy@f69dc9c]: Switch to new logging infrastructure T211125 (duration: 01m 44s)
  • 17:45 ppchelko@deploy1001: Started deploy [changeprop/deploy@f69dc9c]: Switch to new logging infrastructure T211125
  • 17:33 jynus: stopping replication on dbstore2001:s8 for backup testing T206203
  • 17:29 jynus: killing ongoing backup at dbprov2002, stuck
  • 17:28 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 17:10 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 16:31 herron: beginning rolling kafka restarts on kafka200[123] for security updates
  • 16:01 herron: repooling kafka2003 eventbus
  • 15:59 mutante: wikivoyage-old.org domain has been retired and deactivated (T219867, T81727)
  • 15:56 herron: depooling kafka2003 for eventbus security updates
  • 15:55 herron: repooling kafka2002 eventbus
  • 15:52 herron: depooling kafka2002 for eventbus security updates
  • 15:52 herron: pooling kafka2001 eventbus
  • 15:42 herron: depooling kafka2001 for eventbus security updates
  • 15:38 moritzm: rolling restart of proton to pick up openssl security update
  • 15:03 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 14:59 moritzm: installing libdatetime-timezone-perl updates
  • 14:24 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=cxserver,cluster=scb,name=scb.*
  • 14:24 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=cxserver,cluster=scb,name=scb.*
  • 14:23 jijiki: Depooling scb* from service cxserver traffic
  • 13:46 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 13:46 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 37s)
  • 13:29 jbond42: restart of gerrit apache service will occur at 13:40
  • 13:28 volans: upgraded spicerack to 0.0.22 on cumin[12]001
  • 13:27 volans: uploaded spicerack_0.0.22-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 13:23 moritzm: upgrading mw1261 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 / wikidiff 1.8.1
  • 13:20 jijiki: Stopped all citoid services from scb* - 494215
  • 13:15 jbond42: restart of phabricator apache service will occur at 14:25
  • 12:46 moritzm: uploaded HHVM 3.18.5+dfsg-1+wmf8+deb9u2 to apt.wikimedia.org/stretch-wikimedia
  • 12:10 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 11:43 moritzm: upgrading HHVM on mwdebug servers in eqiad along with update to hhvm-wikidiff 1.8.1
  • 11:35 moritzm: uploaded nodejs 10.15.2~dfsg-1+wmf1 to the component/node10 component of apt.wikimedia.org/stretch-wikimedia (updated to latest 10.x release and a change to ensure zlib binary compat with NodeSource) (T215562)
  • 11:34 Amir1: EU SWAT is done
  • 11:32 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Add mediawiki.org to the URL shortener whitelist (duration: 00m 58s)
  • 11:28 jbond42: rolling security updates for apache on jessie
  • 11:25 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ReferencePreviews beta feature on de- and ar-wiki (T218766) (duration: 01m 00s)
  • 11:21 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 11:08 arturo: drop python-psutil from jessie-wikimedia/openstack-mitaka-jessie, related to T219626
  • 10:56 moritzm: uploaded hhvm-wikidiff 1.8.1 to apt.wikimedia.org/stretch-wikimedia (source package is named php-wikidiff2 for legacy reasons) (T203069)
  • 10:21 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 10:01 moritzm: installing openssl1.0 security updates on stretch-based DB hosts
  • 08:36 moritzm: rolling restart of parsoid to pick up OpenSSL security update
  • 08:06 moritzm: uploaded Apache 2.4.10-10+deb8u14+wmf1 to apt.wikimedia.org/jessie-wikimedia (latest jessie security update rebased with our local patches)
  • 05:39 marostegui: Stop MySQL on db2033 for decommission - T219493
  • 05:32 marostegui: Remove db2033 from tendril and zarcillo - T219493
  • 05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2033 for decommission T219493 (duration: 00m 59s)
  • 05:18 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2033 for decommission T219493 (duration: 00m 59s)
  • 04:58 marostegui: Deploy schema change on labswiki for the job table - T219887
  • 00:40 chaomodus: restart pdfrender on scb1003 - T174916

2019-04-03

  • 23:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on zhwikisource (T219588) (duration: 00m 58s)
  • 23:50 catrope@deploy1001: Synchronized dblists/flow.dblist: Enable Flow on zhwikisource (T219588) (duration: 00m 57s)
  • 23:38 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments homepage EventLogging on testwiki (duration: 00m 59s)
  • 23:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage tutorial pages on cswiki, kowiki, viwiki (dark deploy) (duration: 00m 59s)
  • 23:18 catrope@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage on testwiki (duration: 01m 01s)
  • 21:32 elukey: start hadoop-hdfs-namenode on an-master1002 after outage due to big job hitting HDFS
  • 20:40 gehel: excluding elastic2048 from cluster and depooling - T220038
  • 20:29 arlolra: Updated Parsoid to 0b3bb10 (T219337)
  • 20:20 arlolra@deploy1001: Finished deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10 (duration: 05m 44s)
  • 20:14 arlolra@deploy1001: Started deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10
  • 20:09 marxarelli: 1.33.0-wmf.24 is holding at group0 following rollback. filed T220037. cc: T206678
  • 19:56 marxarelli: log correction group1 reverted to 1.33.0-wmf.23
  • 19:56 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 to 1.33.0-wmf.24
  • 19:55 marxarelli: 111,185 and counting DBTransactionError for jobrunner.discovery.wmnet
  • 19:53 marxarelli: rolling back group1
  • 19:53 marxarelli: massive spike in DBTransactionError ([{exception_id}] {exception_url} Wikimedia\Rdbms\DBTransactionError from line 246 of /srv/mediawiki/php-1.33.0-wmf.24/includes/libs/rdbms/lbfactory/LBFactory.php: RefreshLinksJob::runForTitle: transaction round 'RefreshLinksJob::run' already started.)
  • 19:51 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 49s)
  • 19:50 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:34 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update strategy (duration: 10m 54s)
  • 19:23 smalyshev@deploy1001: Started deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update strategy
  • 18:14 thcipriani: gerrit back on 2.15.12
  • 18:12 thcipriani: restarting gerrit for 2.15.12 update
  • 18:11 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow) (duration: 00m 11s)
  • 18:11 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow)
  • 18:09 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only (duration: 00m 11s)
  • 18:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only
  • 17:57 elukey: restart hadoop-hdfs-namenode on an-master1001 as precautionary measure after the outage (currently standby)
  • 17:44 herron: shortly postponing restarts of eventbus and kafka services for security updates due to unrelated firefighting - repooling kafka1001
  • 17:19 elukey: restart hadoop-hdfs-namenode on an-master1002 after forced shutdown due to errors
  • 17:14 herron: depooling kafka1001 to restart eventbus and kafka services for security updates
  • 17:04 Lucas_WMDE: EU SWAT done
  • 17:04 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=srwiki --fix # T214428 – 0 pages to fix, 0 links to fix, Looks good!
  • 17:03 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule (T220001) (duration: 00m 58s)
  • 17:00 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus: SWAT: Incorrect order of calls in createPageDeleteEvent. (duration: 00m 59s)
  • 16:51 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 16:44 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 16:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=idwiktionary --fix # T218796 – 41 links to fix, 41 were resolvable, Looks good!
  • 16:36 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add namespace "Lampiran" at id.wiktionary (T218796) (duration: 00m 59s)
  • 16:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Draft namespace on srwiki (T214428) (duration: 01m 00s)
  • 16:22 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add three domains at wgCopyUploadDomains (T216886, T219075) (duration: 01m 00s)
  • 16:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: Remove namespace 104 from FlaggedRevs configuration for arwiki (T217507) (duration: 01m 00s)
  • 15:18 volans: shutdown ms-be2026 for firmware upgrade - T219854
  • 15:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:16 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on wikitech for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 8 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 7 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 6 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 5 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 4 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on remaining section 3 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 2 wikis for T215525
  • 14:59 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 1 wikis for T215525
  • 14:56 anomie@deploy1001: Synchronized php-1.33.0-wmf.24/maintenance/includes/MigrateActors.php: Backporting fix from gerrit:500754 (duration: 01m 01s)
  • 14:55 anomie@deploy1001: Synchronized php-1.33.0-wmf.23/maintenance/includes/MigrateActors.php: Backporting fix from gerrit:500754 (duration: 01m 01s)
  • 14:18 marostegui: Stop replication on pc2007 for testing - T210725
  • 14:03 andrewbogott: restarting rabbitmq on cloudcontrol1003
  • 13:59 andrewbogott: restarting neutron-l3-agent on cloudnet1003 and cloudnet1004
  • 13:46 andrewbogott: restarting neutron-metadata-agent on cloudnet1003
  • 13:44 gilles@deploy1001: Synchronized php-1.33.0-wmf.23/includes/media/MediaTransformOutput.php: T216499 Identify images that should have had high importance (duration: 00m 59s)
  • 13:34 moritzm: reverting dbmonitor2001 to deb8u12+wmf1 build
  • 13:02 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 13:01 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:49 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 12:45 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:42 arturo: T219626 reimaging cloudcontrol2001-dev
  • 12:31 mutante: restarting gerrit service to apply change 498431
  • 11:25 Amir1: EU SWAT is done
  • 11:16 jbond42: rolling security updates for apache
  • 10:29 mutante: planet1001/2001 - apt autoremove un-required packages
  • 10:27 mutante: planet1001/2001 - upgrade apache2, openssh, locales, rsyslog ..
  • 10:25 arturo: updating puppet compiler facts
  • 10:19 volans: upgraded spicerack to 0.0.21 on cumin[12]001
  • 10:17 volans: uploaded spicerack_0.0.21-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 09:56 marostegui: Alter empty job table on s6 primary master - T219887
  • 09:55 moritzm: upgrading beta to hhvm wikidiff 1.8.1 (T203069)
  • 09:54 mutante: running mysql select queries on m3-slave to get data from phabricator conpherence as requested by andre
  • 09:45 moritzm: removed labtestnet2003.codfw.wmnet from debmonitor (T219776)
  • 09:29 ema: cp-ats-codfw: test ATS rolling restart T213263
  • 09:27 marostegui: Drop wikishared.wikimedia_editor_tasks_entity_description_exists table from x1 T219963
  • 09:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool s8 sanitarium master (duration: 00m 56s)
  • 09:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool s8 sanitarium master (duration: 01m 00s)
  • 08:35 jynus: merging change on network constants (firewall operation)
  • 08:23 marostegui: Restart mysql on sanitarium hosts db1124 db1125 db2094 db2095 - T218302
  • 08:18 marostegui: Stop replication on db2082 and db1087 (s8 sanitarium masters) T218302
  • 08:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool s8 sanitarium master (duration: 00m 57s)
  • 08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool s8 sanitarium master (duration: 00m 58s)
  • 08:09 moritzm: installing new apache packages on mmw1261
  • 07:53 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 58s)
  • 07:51 moritzm: installing new apache packages on mwdebug
  • 07:42 marostegui: Reboot db1115 - tendril and dbtree will be down
  • 07:40 marostegui: Disable event scheduler on db1115 before restarting - tendril is stuck
  • 07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 T219493 (duration: 00m 57s)
  • 07:25 marostegui: Deploy schema change on db1073, labtestwiki - T219887
  • 07:09 marostegui: Stop replication in sync on db1120 and db2034 (x1 codfw master) - T219493
  • 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 T219493 (duration: 01m 13s)
  • 06:04 _joe_: restart varnish backend on cp1085, causing unavailability
  • 05:57 marostegui: Fix data drifts on bnwikisource on x1 - T219493
  • 05:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 59s)
  • 05:23 marostegui: Upgrade pc1007
  • 05:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 for upgrade (duration: 01m 00s)

2019-04-02

  • 23:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT enwiki: Restrict move-categorypages to +extendedmover/+sysop/+bot T219261 (duration: 00m 58s)
  • 23:30 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Add new WMCS IP range to wgRateLimitsExcludedIps T167432 (duration: 00m 57s)
  • 23:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable SandboxLink for rowiki T219855 (duration: 00m 56s)
  • 23:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Add 'depicts' statements to search index on testcommons (duration: 00m 59s)
  • 21:27 andrewbogott: rebooting labservices1001
  • 21:16 andrewbogott: rebooting labservices1002
  • 20:54 andrewbogott: restarting pdns and pdns-recursor on labservices1001 and 1002 in hopes of getting those machines to act a bit less sluggish
  • 20:23 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/skins/Vector/includes/: I6e04b512d / T219864 (duration: 00m 59s)
  • 20:20 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/skins/Vector/includes/: I6e04b512d / T219864 (duration: 01m 00s)
  • 20:16 marxarelli: 1.33.0-wmf.24 successfully deployed to group0. error rates look normal (T206678)
  • 20:07 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.33.0-wmf.24
  • 19:57 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.24 and rebuild l10n cache (duration: 44m 20s)
  • 19:12 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.24 and rebuild l10n cache
  • 18:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125 (duration: 20m 49s)
  • 18:22 marxarelli: cutting mediawiki branch 1.33.0-wmf.24 (T206678)
  • 18:22 marxarelli: cutting mediawiki branch 1.33.0-wmf.24
  • 18:20 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125
  • 18:20 ppchelko@deploy1001: deploy aborted: Kafka logging pipeline, full deploy T211125 (duration: 00m 03s)
  • 18:20 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125
  • 18:09 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, canary on restbase2010 T211125 (duration: 02m 33s)
  • 18:06 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, canary on restbase2010 T211125
  • 17:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7] (dev-cluster): Kafka logging pipeline, dev cluster only T211125 (duration: 03m 25s)
  • 17:56 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7] (dev-cluster): Kafka logging pipeline, dev cluster only T211125
  • 17:51 ppchelko@deploy1001: Finished deploy [restbase/deploy@3dcf328]: Upgrade swagger to v3, attempt 2, T218218 (duration: 20m 47s)
  • 17:37 ejegg: updated payments-wiki-staging from 793bce1a5f to 15bcb3d1a6
  • 17:30 ppchelko@deploy1001: Started deploy [restbase/deploy@3dcf328]: Upgrade swagger to v3, attempt 2, T218218
  • 17:30 ppchelko@deploy1001: Finished deploy [restbase/deploy@3dcf328] (dev-cluster): Upgrade swagger to v3, attempt 2, T218218 (duration: 03m 02s)
  • 17:27 ppchelko@deploy1001: Started deploy [restbase/deploy@3dcf328] (dev-cluster): Upgrade swagger to v3, attempt 2, T218218
  • 16:47 XioNoX: - replacing accepted-prefix-limit with prefix-limit in eqsin - T211730
  • 16:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@6026ad1]: Switch to swagger 3 T218218 (duration: 04m 52s)
  • 16:39 ppchelko@deploy1001: Started deploy [restbase/deploy@6026ad1]: Switch to swagger 3 T218218
  • 16:36 XioNoX: - replacing accepted-prefix-limit with prefix-limit on esams - T211730
  • 16:12 XioNoX: - replacing accepted-prefix-limit with prefix-limit on cr2-eqiad - T211730
  • 16:02 mutante: T194174 - bump. started alerting again 2 days ago
  • 16:00 mutante: icinga - schedule (30d) downtime for kubernetes operational latencies alerts (T219696) on kubernetes1004
  • 15:57 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 15:55 mutante: scandium - systemctl start parsoid-vd (was in failed state) (T201366)
  • 15:55 herron: beginning rolling upgrade of codfw ELK cluster to 5.6.15 T219571
  • 15:52 mutante: icinga - re-enabling notifications for scandium. setup task is resolved yet systemd is alerting; notifications should no longer have been turned off (T201366)
  • 15:39 XioNoX: repool eqsin - T219847
  • 15:32 jbond42: add cpp-hocon 0.1.6 to jessie-wikimedia/backports
  • 15:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: VE: Enable mobile section editing A/B test on all remaining wikis T219564 (duration: 00m 51s)
  • 15:07 moritzm: stopped/disabled ipmievd on cumin2001
  • 14:54 jbond42: add leatherman 1.4 to jessie-wikimedia/backports
  • 13:44 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on test wikis and mediawikiwiki for T215525
  • 13:24 volans: reboot ms-be2026 to see if that fixes the controller - T219854
  • 13:23 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:20 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:20 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:20 jynus: updating puppet compiler facts
  • 12:11 arturo: icinga downtime toolschecker for 1 month T219243
  • 12:07 hashar: contint1001: compressing some MediaWiki debugging logs under /srv/jenkins/builds # T219850
  • 11:42 moritzm: restarting parsoid on wtp1025 to pick up openssl update
  • 11:33 hashar: contint1001: cleaning Docker containers # T219850
  • 11:23 Amir1: EU SWAT is done
  • 11:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add the urlshortener-manage-url right and enable it for stewards (T133109), Part I (duration: 00m 51s)
  • 11:21 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Add the urlshortener-manage-url right and enable it for stewards (T133109), Part I (duration: 00m 53s)
  • 11:14 akosiaris: T217715 Update mathoid, citoid, cxserver, eventgate grafana dashboards to use the new recording rules for the quantiles
  • 11:14 jbond42: add cmake 3.6.2 to jessie-wikimedia/backports
  • 11:02 jbond42: add rapidjson 1.1.0 to jessie-wikimedia/backports
  • 10:47 jbond42: add catch 1.10 to jessie-wikimedia/backports
  • 10:42 jbond42: add strip-nondeterminism 0.034 to jessie-wikimedia/backports
  • 10:39 jbond42: add dh-autoreconf 12 to jessie-wikimedia/backports
  • 10:30 jbond42: add debhelper 10.2.5 and dh-systemd 10.2.5 to jessie-wikimedia/backports
  • 10:08 elukey: manually purge varnishkafka graphite alert's URL as attempt to avoid a flapping alert - T219842
  • 09:14 arturo: T219776 finally reimaging cloudnet2003-dev.codfw.wmnet (was labtestnet2003)
  • 09:03 _joe_: uploaded patched version of bootstrap-vz to account for jessie-updates vanishing (T219683)
  • 08:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T219777 T143763 (duration: 00m 53s)
  • 08:50 marostegui: Execute schema change on db1069 x1 master with replication enabled on the following small wikis: aawiki aawikibooks aawiktionary abwiki abwiktionary acewiki advisorswiki advisorywiki adywiki afwiki T143763
  • 08:20 marostegui: Compress wikishared.urlshortcodes table on x1, directly on the master with replication (table has 1 row) - T219777
  • 08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T219777 T143763 (duration: 00m 53s)
  • 08:13 moritzm: installing debdeploy updates on remaining hosts in eqiad/codfw
  • 08:05 moritzm: installing openssl1.0 security updates
  • 07:52 moritzm: removed labvirt1008 from debmonitor (T216661)
  • 06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 (duration: 00m 50s)
  • 06:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 (duration: 00m 52s)
  • 06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 (duration: 00m 52s)
  • 06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 (duration: 00m 54s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 (duration: 00m 53s)
  • 05:58 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@2a090ef]: New version for T219778 (duration: 00m 19s)
  • 05:58 oblivian@deploy1001: Started deploy [docker-pkg/deploy@2a090ef]: New version for T219778
  • 05:55 marostegui: Upgrade pc1008
  • 05:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1008 (duration: 00m 56s)
  • 04:14 onimisionipe: restarted tilerator on maps200[1-3] - connection refused
  • 01:18 XioNoX: replacing accepted-prefix-limit with prefix-limit on cr1-eqiad - T211730
  • 01:14 XioNoX: replacing accepted-prefix-limit with prefix-limit in eqord - T211730
  • 00:52 XioNoX: depool eqsin due to Telia eqsin-codfw link outage
  • 00:40 XioNoX: replacing accepted-prefix-limit with prefix-limit in [co|eq]dfw - T211730
  • 00:25 XioNoX: replacing accepted-prefix-limit with prefix-limit on all ulsfo peers - T211730
  • 00:19 XioNoX: replacing accepted-prefix-limit with prefix-limit on one ulsfo peer - T211730
  • 00:06 XioNoX: jnt push to msw switches

2019-04-01

  • 23:54 shdubsh: restarting kafka on kafka-jumbo1004
  • 23:47 shdubsh: restarting kafka on kafka-jumbo1003
  • 23:36 shdubsh: restart kafka on kafka-jumbo1002
  • 23:28 shdubsh: restart kafka on kafka-jumbo1001
  • 23:16 XioNoX: jnt push to csw2-esams
  • 22:52 XioNoX: restart pdfrender on scb1003 - T174916
  • 21:44 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Remove kowiki spam mitigations T212679 (duration: 00m 54s)
  • 21:28 XioNoX: Push AS specific policy-statements to cr1/2-eqsin v4 peers - T211930
  • 21:11 dcausse: elasticsearch search cluster: reindex spaceless languages (T219533)
  • 19:48 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Renew Priority Hints origin trial token (duration: 00m 54s)
  • 19:48 bblack: authdns2001 (ns1) upgrade gdnsd -> 3.1.0
  • 18:58 XioNoX: re-set ulsfo-codfw ospf cost to previous default - T219591
  • 18:52 shdubsh: restart mjolnir-kafka-msearch on relforge1002 to adopt new logging config
  • 18:44 dcausse: Morning SWAT done
  • 18:42 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T219268: [cirrus] Use bm25 similarity for all wikis (duration: 00m 51s)
  • 18:33 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T210381: [cirrus] Cleanup transitional states (duration: 00m 53s)
  • 18:22 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: ExternalGuidance: Allow google translate hosts as known services (T218948) (duration: 00m 53s)
  • 18:18 bblack: multatuli (ns2) upgrade gdnsd -> 3.1.0
  • 18:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add tmpSerializeEmptyListsAsObjects Wikibase repo config (T138104) (duration: 00m 54s)
  • 17:55 XioNoX: remove asw2-c-eqiad:et-3/1/2 from disabled interfaces - T218059
  • 17:31 bblack: authdns1001 (ns0) upgrade gdnsd -> 3.1.0
  • 17:22 bblack: upgrade gdnsd -> 3.1.0 (wmf2) on cp1099 (authdns test)
  • 17:21 bblack: uploading gdnsd-3.1.0-1~wmf2 to stretch-wikimedia
  • 17:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@115a6bf]: Added more endpoint, GUI updates and new bot pattern (duration: 12m 10s)
  • 17:07 arturo: restart dhcp server in install2002 to release old lease for labtestnet2003
  • 17:03 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@115a6bf]: Added more endpoint, GUI updates and new bot pattern
  • 16:32 vgutierrez: slowly reenabling puppet in cache text cluster - T213705
  • 16:28 bblack: upgrade gdnsd -> 3.1.0 on cp1099 (authdns test)
  • 16:25 bblack: uploading gdnsd-3.1.0-1~wmf1 to stretch-wikimedia
  • 16:15 arturo: T219776 reimaging + renaming labtestnet2003 into cloudnet2003-dev
  • 16:13 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet
  • 16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet
  • 16:05 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2023.codfw.wmnet
  • 15:57 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2023.codfw.wmnet
  • 15:56 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3042.esams.wmnet
  • 15:49 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3042.esams.wmnet
  • 15:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet
  • 15:43 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4032.ulsfo.wmnet
  • 15:42 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5007.eqsin.wmnet
  • 15:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5007.eqsin.wmnet
  • 15:24 vgutierrez: disable puppet in the cache text cluster - T213705
  • 15:09 Amir1: mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki=hywwiki --baseName hywwiki --cluster (eqiad|codfw)
  • 14:59 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Cleanup: Remove obsolete WikimediaEditorTasks beta cluster prefs (duration: 00m 50s)
  • 14:44 moritzm: rolling out debdeploy 0.0.99.10 for jessie, buster, stretch systems
  • 14:42 moritzm: restarting superset on analytics-tool1004 to pick up latest Python
  • 14:41 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=hywwiki --force --sysop Ladsgroup
  • 14:37 ladsgroup@deploy1001: Synchronized langlist: (no justification provided) (duration: 00m 50s)
  • 14:35 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 50s)
  • 14:33 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T212597 (duration: 00m 51s)
  • 14:32 Amir1: wikiadmin@10.64.32.136(hywwiki)> update text set old_text = 'DB://cluster25/1';
  • 14:18 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 14:11 moritzm: uploaded debdeploy 0.0.99.10 to apt.wikimedia.org (jessie, stretch, buster)
  • 14:07 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 52s)
  • 14:04 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5007.eqsin.wmnet
  • 13:57 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5007.eqsin.wmnet
  • 13:56 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5001.eqsin.wmnet
  • 13:50 hashar: Reverted CI Jenkins jobs to Quibble 0.0.28 # T219647
  • 13:47 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5001.eqsin.wmnet
  • 13:26 mvolz@deploy1001: scap-helm citoid finished
  • 13:26 mvolz@deploy1001: scap-helm citoid cluster codfw completed
  • 13:26 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-codfw-values.yaml stable/citoid [namespace: citoid, clusters: codfw]
  • 13:23 mvolz@deploy1001: scap-helm citoid finished
  • 13:23 mvolz@deploy1001: scap-helm citoid cluster eqiad completed
  • 13:23 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-eqiad-values.yaml stable/citoid [namespace: citoid, clusters: eqiad]
  • 13:12 mvolz@deploy1001: scap-helm citoid finished
  • 13:12 mvolz@deploy1001: scap-helm citoid cluster staging completed
  • 13:12 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 13:11 hashar: Upgraded CI Jenkins jobs to Quibble 0.0.30 # T219647
  • 13:09 jbond42: rolling security update of tshark
  • 12:24 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@46ba982]: Rollback - third time is the charm (duration: 00m 43s)
  • 12:23 oblivian@deploy1001: Started deploy [docker-pkg/deploy@46ba982]: Rollback - third time is the charm
  • 12:08 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@0c32dc1]: Rollback to 1.0.0, T219778 (duration: 00m 18s)
  • 12:08 oblivian@deploy1001: Started deploy [docker-pkg/deploy@0c32dc1]: Rollback to 1.0.0, T219778
  • 12:02 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@UNKNOWN]: Rollback to 1.0.0, T219778 (duration: 00m 34s)
  • 12:02 oblivian@deploy1001: Started deploy [docker-pkg/deploy@UNKNOWN]: Rollback to 1.0.0, T219778
  • 11:58 Lucas_WMDE: EU SWAT done
  • 11:57 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikibaseLexeme: SWAT: Fix GrammaticalFeatureListWidget (T219134, T219734) (duration: 01m 00s)
  • 11:53 moritzm: uploaded logstash/kibana/elasticsearch 5.6.15 to component thirdparty/elastic56
  • 11:52 moritzm: uploaded logstash/kibana/elasticsearch to component thirdparty/elastic56
  • 11:51 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add unwatchedpages permission to rollbacker and patroller at zhwiki (T219285) (duration: 00m 52s)
  • 11:41 zfilipin@deploy1001: Synchronized static/images/project-logos/: SWAT: Correct logos for the Gujarati Wikipedia (T219373) (duration: 00m 52s)
  • 11:34 zfilipin@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: Enable logging of private filters on commonswiki (T218527) (duration: 00m 50s)
  • 11:25 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Revert "Remove $wgAbuseFilterRuntimeProfile"" (T191039) (duration: 00m 51s)
  • 11:17 zfilipin@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: Revert "Revert "Remove $wgAbuseFilterProfile"" (T191039) (duration: 00m 52s)
  • 11:16 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@0c32dc1]: Upgrade to 1.1.2 (duration: 01m 08s)
  • 11:15 oblivian@deploy1001: Started deploy [docker-pkg/deploy@0c32dc1]: Upgrade to 1.1.2
  • 11:00 jbond42: halt rolling updates of tshark until after SWAT
  • 10:48 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 50s)
  • 10:47 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 52s)
  • 10:42 jbond42: rolling security update of tshark
  • 10:32 _joe_: pruning old images on boron
  • 10:31 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7ef5ca3]: Upgrade to 1.1.2 (duration: 00m 26s)
  • 10:31 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7ef5ca3]: Upgrade to 1.1.2
  • 10:27 arturo: T219626 reimaging cloudcontrol2001-dev
  • 09:09 moritzm: installing Chromium security updates on proton* (tested the new release in deployment-prep)
  • 08:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2033 (duration: 00m 51s)
  • 08:09 marostegui: Deploy testing schema change on enwiki.echo_event on db2033 and upgrade mysql - T143961
  • 07:54 ariel@deploy1001: Finished deploy [dumps/dumps@7abb6c8]: get db user/passwd via mw maint script (duration: 00m 03s)
  • 07:54 ariel@deploy1001: Started deploy [dumps/dumps@7abb6c8]: get db user/passwd via mw maint script
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2033 (duration: 00m 51s)
  • 06:28 _joe_: pushing wikimedia-jessie:{20190401,latest} to docker-registry.w.o T219580
  • 06:27 _joe_: installing new bootstrap-vz on boron T219580
  • 05:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 (duration: 00m 50s)
  • 05:08 marostegui: Deploy schema change on db1077, this will generate lag on s3 on labs
  • 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 (duration: 00m 53s)

2019-03-31

  • 06:57 marostegui: Remove old files from dbstore1001 to clean up the disk space warning

2019-03-30

  • 03:39 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/ImageMap/includes/ImageMap.php: I1387825f25e / T217087 (duration: 00m 52s)
  • 03:16 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/skins/Vector/includes/templates/index.mustache: I0d6e036b65da0 / T219359 / i18n regression (duration: 00m 54s)

2019-03-29

  • 22:06 bstorm_: stopped database services on labsdb1004 and labsdb1005
  • 21:01 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3) (duration: 05m 14s)
  • 20:55 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3)
  • 20:49 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3) (duration: 03m 13s)
  • 20:46 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3)
  • 20:35 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 2) (duration: 03m 30s)
  • 20:31 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 2)
  • 20:30 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (duration: 00m 30s)
  • 20:29 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers
  • 18:41 ejegg: updated payments-wiki from 4b49bb7333 to 793bce1a5f
  • 15:51 XioNoX: repool ulsfo - T219591
  • 15:48 XioNoX: bump ulsfo-codfw ospf link cost to 1000 - T219591
  • 15:14 _joe_: pruning old images and containers on boron
  • 15:00 mutante: ldap-eqiad-replica02 - running out of disk - apt-get clean - gzipping /var/log/debug
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 13:05 ema: cp2002/cp2005: repool varnish-fe for user traffic T213263
  • 12:55 thcipriani: gerrit running on 2.15.11
  • 12:53 thcipriani: restarting gerrit to finish rollback to 2.15.11
  • 12:52 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 on cobalt -- restart of gerrit incoming (duration: 00m 11s)
  • 12:52 thcipriani@deploy1001: Started deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 on cobalt -- restart of gerrit incoming
  • 12:51 moritzm: removing php 7.0 packages from snapshot1008, dumps are only using 7.2 (T218193)
  • 12:50 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 (gerrit2001 only) (duration: 00m 10s)
  • 12:50 thcipriani@deploy1001: Started deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 (gerrit2001 only)
  • 12:47 moritzm: upgrading snapshot1008 to component/php72 (T218193)
  • 12:46 moritzm: upgrading snapshot1005-1007/1009 to component/php72 (T218193)
  • 12:23 ema: rolling ATS restarts to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/500011/ T213263
  • 11:45 mutante: cobalt - systemctl restart gerrit
  • 10:36 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 10:36 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 10:35 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 10:35 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 09:37 mutante: restarting zuul on contint1001
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 08:36 godog: depool ulsfo as precaution -- link repair in progress
  • 08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1110 (duration: 00m 50s)
  • 07:58 gilles@deploy1001: Synchronized php-1.33.0-wmf.23/includes/media/MediaTransformOutput.php: T216499 Only apply high priority half the time (duration: 00m 50s)
  • 07:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1110 (duration: 00m 51s)
  • 07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1110 (duration: 00m 50s)
  • 07:19 vgutierrez: reenabling puppet in acme-chief clients after verifying NOOP in netmon2001
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1110 (duration: 01m 06s)
  • 07:11 vgutierrez: disabling puppet in acme-chief clients to merge I437b91 safely
  • 07:06 marostegui: Upgrade db1110
  • 07:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1110 (duration: 00m 49s)
  • 07:01 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216598 T216594 Element Timing for Images and Layout Stability on ruwiki (duration: 00m 51s)
  • 06:56 marostegui: Remove tools section from tendril by doing: update shards set display='0' where name='tools'; T216749
  • 06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 (duration: 00m 49s)
  • 06:41 marostegui: Upgrade pc1009
  • 06:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 (duration: 00m 50s)
  • 06:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 (duration: 00m 50s)
  • 05:49 marostegui: Disable notifications on labsdb1004 and labsdb1005 - T216749
  • 05:47 marostegui: Remove labsdb1004 and labsdb1005 from tendril - T216749
  • 05:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 52s)
  • 00:18 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/api/ApiStashEdit.php: I35213d83a0 (duration: 00m 49s)
  • 00:16 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I8887ce013a8 (duration: 00m 51s)
  • 00:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I24a5469dbfd0 / T216206 for testwikidatawiki (duration: 00m 50s)

2019-03-28

  • 23:54 krinkle@deploy1001: Synchronized wmf-config/Wikibase.php: Ib9d617 (duration: 00m 50s)
  • 23:53 krinkle@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: Ib9d617 (duration: 00m 51s)
  • 23:14 bstorm_: completed setting up clouddb1003 as the replica of labsdb1006 (osm)
  • 22:13 bd808@deploy1001: Finished deploy [striker/deploy@2f62c43]: Fixes for error pages and repo creation (T176325) (duration: 00m 59s)
  • 22:12 bd808@deploy1001: Started deploy [striker/deploy@2f62c43]: Fixes for error pages and repo creation (T176325)
  • 22:11 XioNoX: add AS specific policy-statements to cr1-eqsin v6 transits - T211930
  • 21:51 thcipriani: restarting gerrit
  • 21:18 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [Wikimania] Enable VisualEditor in the 2019 namespace T218645 (duration: 00m 50s)
  • 21:16 XioNoX: add AS specific policy-statements to cr2-eqsin v6 transits - T211930
  • 21:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [Wikitech] Enable VisualEditor in extra namespaces (duration: 00m 50s)
  • 20:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: VisualEditor: Enable mobile section editing A/B test on 10 Wikipedias T218851 T218939 (duration: 00m 50s)
  • 20:29 moritzm: restarting Gerrit on cobalt to effect new Java security update
  • 19:47 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikimediaEditorTasks on wikidatawiki (duration: 00m 52s)
  • 19:39 mdholloway: created table wikimedia_editor_tasks_entity_description_exists on wikidatawiki
  • 19:19 marxarelli: 1.33.0-wmf.23 deployed for all wikis (T206677)
  • 19:09 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.23
  • 18:45 bstorm_: switching replica for osmdb to clouddb1003 VM from labsdb1007
  • 18:42 addshore@deploy1001: Synchronized wmf-config/db-labs.php: BETA ONLY db-labs (duration: 00m 57s)
  • 18:35 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: wikibase.php, define sharedCacheKeyGroup (duration: 00m 57s)
  • 18:32 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/ProofreadPage/includes/Index/IndexContent.php: ProofreadPage: Fix AbuseFilter UBN T219514 (duration: 00m 57s)
  • 18:17 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/AdvancedSearch/: AdvancedSearch: Fix two UBNs T219455 T219539 (duration: 00m 59s)
  • 18:03 ejegg: updated payments-wiki from 6661655e37 to 4b49bb7333
  • 17:46 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Deploy logging @cee: prefixing bugfix (duration: 03m 24s)
  • 17:43 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Deploy logging @cee: prefixing bugfix
  • 16:39 XioNoX: enable cr2-codfw:xe-5/0/0 (to cr2-eqdfw)
  • 16:36 mutante: wikitech-static - changing [renewalparams] authenticator = to 'apache' from 'standalone' (installer = was already apache) (T214640)
  • 16:36 jbond42: move python3-requests and python3-urllib3 from jessie-wikimedia backports to component/kube2proxy
  • 16:33 XioNoX: disable cr2-codfw:xe-5/0/0 (to cr2-eqdfw)
  • 16:00 akosiaris: poweroff sessionstore2001 for a re-racking
  • 15:15 mutante: wikitech-static - removing acme-setup cron jobs from root's crontab. this was used before the switch to certbot, is unrelated and added to confusion and maybe the problem (T214640)
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 15:06 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:46 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@4deeb04]: Partition htmlCacheUpdate topic, final cleanup stage T219159 (duration: 00m 52s)
  • 14:45 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@4deeb04]: Partition htmlCacheUpdate topic, final cleanup stage T219159
  • 14:32 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@3a8a889]: Partition htmlCacheUpdate topic, step 2 T219159 (duration: 00m 53s)
  • 14:31 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@3a8a889]: Partition htmlCacheUpdate topic, step 2 T219159
  • 14:07 gehel: reindexing changes from '2019-03-26T12:00:00Z' to '2019-03-28T12:00:00Z' into cirrus / elasticsearch - T218878
  • 13:59 gehel: restarting elasticsearch on elastic2050 to validate JVM upgrade
  • 13:57 moritzm: upgrading Java on elasticsearch hosts
  • 13:50 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet
  • 13:49 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
  • 13:22 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@c120b38]: Partition htmlCacheUpdate topic, explicitly exclude htmlCacheUpdate T219159 (duration: 00m 48s)
  • 13:21 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@c120b38]: Partition htmlCacheUpdate topic, explicitly exclude htmlCacheUpdate T219159
  • 13:14 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@17285f8]: Partition htmlCacheUpdate topic, step 1 T219159 (duration: 01m 46s)
  • 13:12 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@17285f8]: Partition htmlCacheUpdate topic, step 1 T219159
  • 12:20 moritzm: removing php 7.0 packages from snapshot1005-1007/1009, dumps are only using 7.2 (T218193)
  • 12:13 jbond42: move git from jessie-wikimedia backports repo to components/ci
  • 12:02 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "SDC: Enable both new-style and old-style Wikibase federation on Commons" (T219450) (duration: 00m 57s)
  • 11:54 moritzm: upgrading snapshot1005-1007/1009 to component/php72 (T218193)
  • 11:53 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Revert T212597
  • 11:51 ladsgroup@deploy1001: Synchronized dblists: Revert T212597 (duration: 00m 58s)
  • 11:27 ladsgroup@deploy1001: Synchronized dblists: T212597 (duration: 00m 56s)
  • 11:01 godog: test copying prometheus metrics on bast3002
  • 10:54 gehel: restarting elasticsearch-psi on elastic20[35,36,53] (shards stuck in recovery) - T218878
  • 10:22 gehel: restarting elasticsearch on elastic20[34,36,50] (shards stuck in recovery) - T218878
  • 10:15 addshore@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/Wikibase/lib: T219452 Revert: Use enableModuleContentVersion() for Wikibase\lib\SitesModule (duration: 01m 06s)
  • 10:11 gehel: restarting elasticsearch-omega on elastic2050 (shards stuck in recovery) - T218878
  • 09:56 gehel: restarting elasticsearch-omega on elastic2031 (shards stuck in recovery) - T218878
  • 09:42 gehel: restarting elasticsearch on elastic20[28,29,41] (shards stuck in recovery) - T218878
  • 09:37 gehel: restarting elasticsearch-psi on elastic20[39,40] (shards stuck in recovery) - T218878
  • 09:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 (duration: 00m 56s)
  • 09:28 gehel: restarting elasticsearch on elastic20[25,27] (shards stuck in recovery) - T218878
  • 09:19 gehel: restarting elasticsearch-omega on elastic20[38,50] (shards stuck in recovery) - T218878
  • 09:14 godog: install rsyslog 8.1901.0-1~bpo8+wmf1 on phab1001 and copper
  • 09:09 gehel: restarting elasticsearch-omega on elastic2050 (shards stuck in recovery) - T218878
  • 09:06 gehel: restarting elasticsearch-psi on elastic20[35,36,53] (shards stuck in recovery) - T218878
  • 09:00 gehel: restarting elasticsearch-psi on elastic2036 (shards stuck in recovery) - T218878
  • 08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 (duration: 00m 55s)
  • 08:43 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2007 after upgrade (duration: 00m 57s)
  • 08:38 gehel: retry shard allocation on elasticsearch codfw all clusters (curl -k -XPOST 'https://localhost:9243/_cluster/reroute?pretty&explain=true&retry_failed') - T218878
  • 08:37 gehel: retry shard allocation on elasticsearch codfw (curl -k -XPOST 'https://localhost:9243/_cluster/reroute?pretty&explain=true&retry_failed')
  • 08:33 elukey: move hadoop yarn configuration from hdfs back to zookeeper - T218758
  • 08:32 marostegui: Upgrade pc2007
  • 08:31 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2007 for upgrade (duration: 00m 56s)
  • 08:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2009 after upgrade (duration: 00m 57s)
  • 08:12 marostegui: Upgrade pc2009
  • 08:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2009 for upgrade (duration: 00m 57s)
  • 08:10 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 08:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 07:32 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2008 after upgrade (duration: 00m 57s)
  • 07:22 marostegui: Upgrade pc2008
  • 07:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2008 for upgrade (duration: 00m 57s)
  • 07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clean up old non used entries (duration: 01m 04s)
  • 06:27 marostegui: Deploy schema change on s3 codfw, lag will be generated on s3 codfw.
  • 05:39 marostegui: Restart apache on phab1001 - phabricator is down
  • 02:50 chaomodus: restarted pdfrender on scb1004 in order to attempt to address flapping errors
  • 01:45 XioNoX: add AS specific policy-statements to cr2-eqsin (but don't apply them yet) - T211930
  • 01:20 XioNoX: progressive jnt push to standardize cr*
  • 01:15 XioNoX: remove sandbox-out6 filter from all routers
  • 00:56 XioNoX: jnt push to standardize asw*
  • 00:32 XioNoX: jnt push to standardize mr1-*
  • 00:21 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/api/ApiStashEdit.php: Ic357dbfcd9ab / T203786 (duration: 00m 57s)
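
The entries above share one shape: a bullet, an HH:MM timestamp, an actor (either a bare nick or user@host), a colon, and a free-text message. A minimal sketch of splitting one line into those fields follows. The names `ENTRY_RE`, `Entry` and `parse_entry` are hypothetical and introduced only for illustration; the pattern is inferred from the lines on this page and is not taken from any actual logging tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed entry shape, inferred from this page: "  • HH:MM actor: message".
# The actor is either a nick ("gehel") or user@host ("marostegui@deploy1001").
ENTRY_RE = re.compile(r"^\s*•\s+(\d{2}):(\d{2})\s+([^:\s]+):\s+(.*)$")


class Entry(NamedTuple):
    hour: int
    minute: int
    user: str
    host: Optional[str]  # None when the entry was logged without a host
    message: str


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None for date headings and blank lines."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    hour, minute, actor, message = match.groups()
    # "user@host" splits into both parts; a bare nick leaves host empty.
    user, _, host = actor.partition("@")
    return Entry(int(hour), int(minute), user, host or None, message)
```

Only the first colon after the actor is treated as the separator, so messages that contain further colons (for example "Synchronized wmf-config/db-codfw.php: Depool pc2007 for upgrade") stay intact in `message`.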

2019-03-27

  • 23:46 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Fix: Pass database name to the NameTableStore constructor (duration: 00m 57s)
  • 23:34 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Load WikibaseLexemeCirrusSearch on Wikidata T216206 (duration: 00m 58s)
  • 23:25 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Load WikibaseLexemeCirrusSearch on test.wikidata.org T216206 (duration: 00m 59s)
  • 22:51 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 31s)
  • 22:51 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 22:47 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 04s)
  • 22:47 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 22:45 krinkle@deploy1001: Synchronized wmf-config/profiler.php: I8c7f8c / T176916 (duration: 00m 59s)
  • 22:36 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 34s)
  • 22:35 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 22:30 niharika29@deploy1001: Finished deploy [scholarships/scholarships@9db232d]: Update wikimania-scholarships; includes fix for broken privacy policy link (duration: 00m 02s)
  • 22:30 niharika29@deploy1001: Started deploy [scholarships/scholarships@9db232d]: Update wikimania-scholarships; includes fix for broken privacy policy link
  • 22:21 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 31s)
  • 22:21 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 21:59 chaomodus: restarting proton1001 to upgrade ram
  • 21:58 chaomodus: restarting proton1002 to upgrade ram
  • 21:57 chaomodus: restarting proton2001 in order to upgrade ram
  • 21:54 chaomodus: restarting proton2002 in order to upgrade ram
  • 21:25 dcausse@deploy1001: Synchronized wmf-config/Wikibase.php: T219448 (duration: 00m 55s)
  • 21:25 eileen: civicrm revision changed from 67b8405b60 to 7560af93df, config revision is 5a0cbb3c7d (was actually before the process control one)
  • 21:24 eileen: process-control config revision is e1bc772c89
  • 21:17 chaomodus: restarted proton on proton1001 in response to memory exhaustion and cpu peg
  • 21:07 milimetric@deploy1001: Finished deploy [analytics/refinery@fdd21a4]: non-deploy changes and two new oozie jobs (duration: 11m 48s)
  • 20:55 milimetric@deploy1001: Started deploy [analytics/refinery@fdd21a4]: non-deploy changes and two new oozie jobs
  • 20:29 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update WikimediaEditorTasks config for DB location split (duration: 00m 57s)
  • 20:23 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Update DB utils to handle counts and suggestion DBs in different locations (duration: 00m 58s)
  • 20:14 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 20:14 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Fix: Use READ_LOCKING when evaluating whether to update targets_passed (duration: 00m 58s)
  • 20:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 20:03 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 19:48 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 19:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 19:43 herron: removed queued wikidata notification messages for a***a@w**gm**ster.** on mx1001 to address gmail excessive volume rate limiting
  • 19:32 jijiki: restarting pdfrender on scb1001
  • 19:30 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 19:27 marxarelli: (resent; originally @ 1916) dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.23
  • 19:23 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 19:18 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.23 (duration: 01m 45s)
  • 19:14 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 18:48 thcipriani: restarting gerrit process
  • 18:12 jynus: update grants on db1115 for new provisioning hosts on codfw T218336
  • 18:10 elukey: interface::rps applied to all the mc10XX hosts - T203786
  • 17:41 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:41 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 17:10 ema: fermium: /usr/local/sbin/disable_list wikimetrics T211835
  • 16:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T214075 SDC: Enable Wikidata federation on Commons (duration: 00m 57s)
  • 16:38 elukey: mc20XX and mc1022 have interface::rps enabled - T203786
  • 16:28 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/GlobalPreferences/includes/GlobalPreferencesFactory.php: Hot-fix T219380 GlobalPreferences: Allow modifiedPrefs to be set even if no UI control (duration: 00m 58s)
  • 16:18 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC: Use feature flag for enabling depicts in UW (duration: 00m 57s)
  • 16:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Add feature flag for enabling depicts in UW (duration: 00m 57s)
  • 15:56 jbond42: bastion reboots complete
  • 15:56 ariel@deploy1001: Finished deploy [dumps/dumps@88ddd76]: ability to use lbzip2 for meta-history compression (duration: 00m 03s)
  • 15:56 ariel@deploy1001: Started deploy [dumps/dumps@88ddd76]: ability to use lbzip2 for meta-history compression
  • 15:44 jbond42: rebooting bast2001.wikimedia.org in 5 minutes
  • 15:44 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:42 jbond42: rebooting bast2002.wikimedia.org in 5 minutes
  • 15:38 jbond42: rebooting bast1002.wikimedia.org in 5 minutes
  • 15:34 jbond42: rebooting bast4002.wikimedia.org in 5 minutes
  • 15:30 jbond42: rebooting bast5001.wikimedia.org in 5 minutes
  • 15:24 jbond42: rebooting iron.wikimedia.org in 5 minutes
  • 15:22 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:21 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:19 elukey: slowly rolling out interface::rps to all the mcXXXX nodes - T203786
  • 14:52 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 14:45 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:44 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:13 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:12 marostegui: Sanitize hywwiki on db1124:3313 T212625
  • 14:11 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:05 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/498417
  • 13:38 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 13:35 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:11 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 57s)
  • 12:42 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 58s)
  • 12:41 Amir1: scap sync-file dblists
  • 12:30 Amir1: mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=mediawikiwiki hyw wikipedia hywwiki hyw.wikipedia.org
  • 12:25 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:23 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:15 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 11:47 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 11:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 11:37 mdholloway: created wikimedia_editor_tasks_entity_description_exists table on testwikidatawiki
  • 11:28 _joe_: SWAT done
  • 11:24 oblivian@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/WikimediaEvents: SWAT: Backport Use a cookie to persist the seed for php7 a/b test to .22 T216676 (duration: 00m 58s)
  • 11:20 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle rule for The Art and Feminism Edit-a-thon in Taiwan (T219113) (duration: 00m 59s)
  • 11:14 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Clean the throttles up (T219311) (duration: 00m 57s)
  • 11:10 dcausse: elasticsearch search cluster: setting cluster.routing.allocation.disk.watermark.flood_stage to 100% on omega/psi/chi@eqiad (T219364)
  • 11:08 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle rule for Czech editathon (T219291) (duration: 00m 58s)
  • 11:06 dcausse: elasticsearch search cluster: setting "index.blocks.read_only_allow_delete" to null on all indices in omega/psi/chi@omega (T219364)
  • 11:04 mutante: re-enabled puppet on logstash1007 through 1011 - then on logstash*
  • 11:00 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 10:57 godog: upgrade rsyslog to 8.1903.0-3~bpo8+wmf1 on cobalt to test imfile file rotation fix - T214176
  • 10:53 mutante: enabling and running puppet on logstash1007
  • 10:49 mutante: disabling puppet on logstash* via cumin
  • 10:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3312 (duration: 00m 58s)
  • 10:20 godog: upgrade rsyslog to 8.1903.0-3~bpo8+wmf1 on phab1001 to test imfile file rotation fix - T214176
  • 09:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3312 (duration: 00m 56s)
  • 09:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1074 (duration: 00m 57s)
  • 09:41 marostegui: Upgrade db2092
  • 09:06 vgutierrez: puppet reenabled in acme-chief clients - T207295
  • 09:01 marostegui: Deploy schema change on db1074, this will generate lag on labsdb hosts for s2
  • 09:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1074 (duration: 00m 57s)
  • 08:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1076 (duration: 00m 54s)
  • 08:33 vgutierrez: disabling puppet in acme-chief clients to safely get rid of old TLS material - T207295
  • 08:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 (duration: 00m 57s)
  • 08:17 godog: bounce rsyslog on phab* - apache access logs stopped at ~6.30 today
  • 08:09 godog: bounce rsyslog on cobalt - apache access logs stopped at ~6.30 today
  • 08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1122 (duration: 00m 57s)
  • 07:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 (duration: 00m 58s)
  • 06:57 SMalyshev: depooled wdqs1005 to catch up
  • 06:56 SMalyshev: repooled wdqs1004
  • 06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3312 (duration: 00m 58s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Change one parsercache key on codfw - T210725 (duration: 00m 57s)
  • 05:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 (duration: 01m 10s)
  • 00:56 SMalyshev: depooled wdqs1004 to catch up
  • 00:55 SMalyshev: repooled wdqs1006
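
The cookbook entries above end with a line of the form "END (STATUS) - Cookbook NAME (exit_code=N)". As a rough sketch, assuming that form holds for every cookbook run, the outcomes for a day can be tallied per cookbook as shown below. The names `END_RE` and `tally_cookbook_results` are hypothetical helpers, not part of any cookbook tooling.

```python
import re
from collections import Counter

# Assumed shape, copied from the entries on this page:
# "END (STATUS) - Cookbook NAME (exit_code=N)"
END_RE = re.compile(r"END \((\w+)\) - Cookbook (\S+) \(exit_code=(\d+)\)")


def tally_cookbook_results(lines):
    """Count (cookbook, status) pairs over all END entries in `lines`."""
    counts = Counter()
    for line in lines:
        match = END_RE.search(line)
        if match:
            status, cookbook, _exit_code = match.groups()
            counts[(cookbook, status)] += 1
    return counts


sample = [
    "  • 20:14 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)",
    "  • 20:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade",
    "  • 20:03 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)",
    "  • 19:48 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)",
]
results = tally_cookbook_results(sample)
```

START lines are ignored, so the tally does not depend on the newest-first ordering of the page or on every START having a matching END.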

2019-03-26

  • 23:37 SMalyshev: repooled wdqs2003
  • 23:12 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: T216206 : sync noop labs config: Actually load WBCS-Lexeme extension before trying to use it (duration: 00m 57s)
  • 22:12 gehel: freezing and unfreezing writes to elasticsearch codfw
  • 21:47 SMalyshev: depool wdqs2003 to catch it up
  • 21:32 ebernhardson: manually thaw search.svc.codfw.wmnet:9643
  • 21:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikimediaEditorTasks on testwikidatawiki (duration: 00m 57s)
  • 21:22 mdholloway: created new db tables for WikimediaEditorTasks in x1
  • 21:00 SMalyshev: depooled wdqs1006 to see if it'd catch up better
  • 20:19 marxarelli: correction: group0 to 1.33.0-wmf.23
  • 20:15 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.0
  • 20:08 ejegg: updated payments-wiki from f42910460b to 6661655e37
  • 19:58 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.23 and rebuild l10n cache (duration: 37m 59s)
  • 19:20 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.23 and rebuild l10n cache
  • 19:18 marxarelli: scap clean failure due to T218783. train is rolling without cleanup
  • 19:17 jynus: reloading db2095 mariadb instances to reload and check filters
  • 19:13 jynus: reloading db2094 mariadb instances to reload and check filters
  • 19:07 dduvall@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 10s)
  • 19:04 jynus: reloading db1125 mariadb instances to reload and check filters
  • 18:49 marxarelli: branch 1.33.0-wmf.23 was cut successfully (T206677)
  • 18:24 jynus: reloading db1124 mariadb instances to reload and check filters
  • 18:21 marxarelli: starting branch cut for 1.33.0-wmf.23 (T206677)
  • 18:09 thcipriani: gerrit back on version 2.15.12, upgrade complete.
  • 18:05 thcipriani: restarting gerrit on cobalt for update to 2.15.12
  • 18:05 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 18:05 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on cobalt (duration: 00m 15s)
  • 18:04 thcipriani@deploy1001: Started deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on cobalt
  • 18:03 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on gerrit2001 only (duration: 00m 11s)
  • 18:03 thcipriani@deploy1001: Started deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on gerrit2001 only
  • 18:01 thcipriani: starting gerrit 2.15.12 upgrade
  • 17:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:45 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 17:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 17:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:43 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 17:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 17:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:39 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:39 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:38 arlolra: Updated Parsoid to f58c3d1 (T219023)
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:33 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:33 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:33 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:31 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:31 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:31 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@395a214]: Updating Parsoid to f58c3d1 (duration: 06m 51s)
  • 17:21 arlolra@deploy1001: Started deploy [parsoid/deploy@395a214]: Updating Parsoid to f58c3d1
  • 17:14 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:13 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 17:12 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:12 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:12 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:06 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:03 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:03 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:03 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:59 otto@deploy1001: scap-helm eventgate- upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-, clusters: staging]
  • 16:59 otto@deploy1001: scap-helm eventgate- upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-, clusters: staging]
  • 16:58 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 16:58 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:57 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:57 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:57 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:57 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:31 gilles@deploy1001: Finished deploy [performance/asoranking@9a1e5ef]: (no justification provided) (duration: 00m 52s)
  • 16:30 gilles@deploy1001: Started deploy [performance/asoranking@9a1e5ef]: (no justification provided)
  • 16:07 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:05 robh: decom of labtestvirt200[12] started via T218023
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.16 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:44 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:44 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:44 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.16 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:43 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 52 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:40 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:40 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:40 otto@deploy1001: scap-helm eventgate-analytics upgrade --help [namespace: eventgate-analytics, clusters: staging]
  • 15:34 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:34 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:34 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:32 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:32 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:31 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:31 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:31 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:31 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:20 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:20 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:08 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:01 jbond42: rolling update of passenger on puppet masters
  • 13:35 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:06 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:58 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 11:42 Amir1: EU SWAT is done
  • 11:40 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/lib/maintenance/populateSitesTable.php --wiki=wikimaniawiki --force-protocol https (T217730)
  • 11:39 Amir1: wikiadmin@db1078.eqiad.wmnet(wikimaniawiki)> DELETE FROM sites; and site_identifiers
  • 11:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wmgWikibaseSiteGroup for wikimaniawiki (T217730) (duration: 00m 49s)
  • 11:22 elukey: temporarily install ifstat on mc1022 + tmux session to log in/out bandwidth usage every 1s for T203786
  • 11:20 ladsgroup@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle rule for Wikimedia Hackathon 2019 (T213869), try II (duration: 00m 49s)
  • 11:11 ladsgroup@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle rule for Wikimedia Hackathon 2019 (T213869) (duration: 00m 51s)
  • 10:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3312 (duration: 00m 49s)
  • 09:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3312 (duration: 00m 50s)
  • 09:54 marostegui: Upgrade db2071
  • 09:42 marostegui: Upgrade db2070
  • 09:15 jijiki: Restarting pdfrender on scb1001
  • 09:09 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1004.eqiad.wmnet
  • 09:05 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1003.eqiad.wmnet
  • 08:09 marostegui: Deploy schema change on s2 codfw master, this will generate lag on codfw s2
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1119 (duration: 00m 49s)
  • 06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 (duration: 00m 50s)
  • 06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 (duration: 00m 52s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 (duration: 00m 51s)
  • 06:02 marostegui: Deploy schema change on db1106, this will generate lag on s1 on labs hosts

2019-03-25

  • 23:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T219234 Turn on Elastica logging channel (duration: 00m 51s)
  • 22:32 krinkle@deploy1001: Synchronized docroot/wikipedia.org/speed-tests/Banksy.enwiki.872156204: T185446 (duration: 00m 49s)
  • 21:44 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T216206 Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables, sub-part b (duration: 00m 49s)
  • 21:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216206 Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables, sub-part a (duration: 00m 50s)
  • 21:40 XioNoX: apply transport-in4 filter to cr1/2-eqiad - T190090
  • 21:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218715 Enable WBCS on Testcommons too (duration: 00m 50s)
  • 20:32 ebernhardson: T218994 set various deprecation channels on all six cirrus elasticsearch clusters to ERROR
  • 19:54 dcausse: elasticsearch search cluster: SET "logger.org.elasticsearch.common.logging.DeprecationLogger" to "ERROR" to psi/omega@eqiad (T218994)
  • 19:48 dcausse: elasticsearch search cluster: SET "logger.org.elasticsearch.deprecation.index.query.functionscore.ScoreFunctionBuilder" to "ERROR" to chi/psi/omega@eqiad (T218994)
  • 19:40 volans: restart icinga on icinga1001 to reset modified attributes
  • 19:37 dcausse: morning SWAT done
  • 19:33 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] switch all wikis to eqiad (elastic 6.5.4) (duration: 00m 50s)
  • 19:21 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T192254 (duration: 00m 49s)
  • 19:13 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: T218260 (duration: 00m 49s)
  • 19:06 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@1cbf290]: Roll update to mjolnir-bulk-daemon es6 handling of super_detect_noop (duration: 03m 27s)
  • 19:02 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@1cbf290]: Roll update to mjolnir-bulk-daemon es6 handling of super_detect_noop
  • 18:46 dcausse@deploy1001: Synchronized wmf-config/flaggedrevs.php: revert T217507 (duration: 00m 49s)
  • 18:43 ebernhardson: restart mjolnir-kafka-msearch-daemon across cirrus elasticsearch servers
  • 18:41 dcausse@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 18:32 dcausse@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/FlaggedRevs/: T218949: Fix reject changes when user is partially blocked (duration: 00m 51s)
  • 18:27 dcausse@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: T192135 (duration: 00m 50s)
  • 18:15 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: T211622: Enforce 8 char password length requirements for non-privileged users (duration: 00m 50s)
  • 17:24 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@281aaf8]: New build for blazegraph and updater plus GUI updates (duration: 10m 31s)
  • 17:24 elukey: restart pdfrender on scb1004
  • 17:14 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@281aaf8]: New build for blazegraph and updater plus GUI updates
  • 17:11 ebernhardson: restart mjolnir-kafka-msearch-daemon on relforge100[12]
  • 17:10 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218878: [cirrus] switch low volume wikis to eqiad (elastic 6.5.4) (duration: 00m 49s)
  • 16:56 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@2fb5038]: Ship new logging support code via new simplified virtualenv deployment (duration: 09m 52s)
  • 16:47 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@2fb5038]: Ship new logging support code via new simplified virtualenv deployment
  • 16:28 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ba09eb5]: Ship new logging support code via new simplified virtualenv deployment (duration: 09m 10s)
  • 16:19 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ba09eb5]: Ship new logging support code via new simplified virtualenv deployment
  • 16:19 hashar: updating Jenkins plugins and restarting
  • 16:16 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@6eda7d8]: Ship new logging support code via new simplified virtualenv deployment (duration: 02m 38s)
  • 16:13 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@6eda7d8]: Ship new logging support code via new simplified virtualenv deployment
  • 15:48 XioNoX: remove 2nd AS7568 router in Equinix Singapore
  • 15:21 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ddf26d0]: Ship new logging support code (duration: 01m 29s)
  • 15:20 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ddf26d0]: Ship new logging support code
  • 15:00 jbond42: updating passenger on rhodium
  • 14:29 andrewbogott: updating slapd indexes on seaborgium, serpens, ldap-eqiad-replica01, ldap-eqiad-replica02 for 498396
  • 13:52 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 13:52 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 13:52 ema: cp1076: repool varnish-fe, frontend misses served by cp-ats T213263
  • 13:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 13:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 13:41 ema: cp1076: depool varnish-fe and point it to cp-ats T213263
  • 13:28 mutante: planet - manually updating en version since new monitoring check warned it wasn't current (T203208)
  • 13:17 mutante: mwmaint1002 - manually running tor_exit_node cron command and test with PHP 7.2
  • 12:48 mutante: reloading icinga config
  • 12:15 Lucas_WMDE: EU SWAT finished
  • 12:08 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Move 0.1% of anonymous users to php7 T212828 (duration: 00m 49s)
  • 12:07 moritzm: installing openssl1.0 security updates on stretch
  • 12:00 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Remove $wgAbuseFilterRuntimeProfile" (T191039) (duration: 00m 51s)
  • 11:48 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove $wgAbuseFilterRuntimeProfile (T191039) (duration: 00m 49s)
  • 11:46 ema: cp-ats-codfw: upgrade trafficserver to 8.0.3-1wm1
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/Wikibase/repo: SWAT: Revert "OutputPageBeforeHTML: do nothing for non entity pages" (T218907) (duration: 01m 06s)
  • 11:26 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2003.codfw.wmnet
  • 11:23 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2004.codfw.wmnet
  • 11:23 godog: switch codfw prometheus from prometheus2003 to prometheus2004
  • 11:19 ema: cp-ats-eqiad: upgrade trafficserver to 8.0.3-1wm1
  • 11:18 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switching to use of the local proxy for search in php7 (duration: 00m 50s)
  • 11:16 oblivian@deploy1001: Synchronized wmf-config/LabsServices.php: switching to use of the local proxy for search in php7 (duration: 00m 50s)
  • 11:09 ema: trafficserver 8.0.3-1wm1 uploaded to stretch-wikimedia
  • 10:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1083 (duration: 00m 48s)
  • 10:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 10:40 gehel: disable deprecation warnings on elasticsearch eqiad - T218994
  • 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:27 moritzm: installing Java security updates on Hadoop/Druid test cluster
  • 10:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 10:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 10:07 moritzm: installing ntfs-3g security updates
  • 10:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1083 (duration: 00m 49s)
  • 09:42 moritzm: uploaded openjdk 8u212-b01-1~deb8u1 to apt.wikimedia.org/jessie-wikimedia/main
  • 09:34 marostegui: Upgrade db2062
  • 09:24 hashar: contint1001: manually compressing Zuul log files sudo -u zuul gzip --best /var/log/zuul/*.log.????-??-??
  • 09:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083+ (duration: 00m 49s)
  • 09:18 marostegui: Upgrade db2055
  • 09:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 (duration: 00m 49s)
  • 09:10 mutante: contint1001 - restarting zuul
  • 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 (duration: 00m 49s)
  • 08:08 vgutierrez: reenabling puppet in openldap servers
  • 08:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1118 (duration: 00m 49s)
  • 07:58 vgutierrez: disable puppet and downtime host in icinga for labtestservices2001 - T218022
  • 07:40 vgutierrez: disable puppet in production openldap servers before merging https://gerrit.wikimedia.org/r/498776
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1118 (duration: 00m 49s)
  • 06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1118 after mysql upgrade (duration: 00m 50s)
  • 06:45 marostegui: Stop MySQL on db1118 for upgrade
  • 06:44 marostegui: Deploy schema change on s1 codfw master, this will generate lag on codfw
  • 06:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1118 for schema change and upgrade (duration: 00m 54s)
  • 04:31 chaomodus: restarted pdfrender on scb1003 to try to help flapping

2019-03-24

  • 15:00 jijiki: Restart pdfrender on scb1002 and scb1004

2019-03-23

  • 13:02 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix WikimediaEditorTasks Beta Cluster DB config, take 2 (duration: 00m 50s)
  • 12:36 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix WikimediaEditorTasks Beta Cluster DB config (duration: 00m 52s)

2019-03-22

  • 22:13 bd808: Restarted uwsgi-striker on labweb1002
  • 22:12 bd808: Restarted uwsgi-striker on labweb1001
  • 20:14 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:14 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 20:14 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 19:59 ejegg: updated payments-wiki-staging from 31647bc97e to f42910460b
  • 19:57 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:57 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 19:57 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 19:55 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:55 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 19:55 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --set main_app.extra_kafka_conf= [namespace: eventgate-analytics, clusters: staging]
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --set main_app.extra_kafka_conf={} [namespace: eventgate-analytics, clusters: staging]
  • 19:46 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:46 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:46 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:39 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:39 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:36 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:36 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:36 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:41 krinkle@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/Collection/: I2c4f5d / T217835 (duration: 00m 52s)
  • 18:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:21 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:16 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:16 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:16 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:13 tzatziki: removing 5 files for legal compliance
  • 18:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:06 jijiki: Restart ferm on db2096
  • 15:58 James_F: UBN hot-deploy for T218918: Only load latest revision in MessageCache::loadFromDB
  • 15:26 gehel: restarting elasticsearch on elastic1046 for logging configuration change - T218994
  • 14:34 mutante: scandium - apt-get remove --purge php* ; apt autoremove ; letting puppet reinstall php 7.2 one more time using mediawiki::profile::php now
  • 14:33 gehel: upgrading to elasticsearch-curator 5.6.0 on all elasticsearch nodes (including logstash) - T218991
  • 11:22 ema: lvs1002: bounce pybal to clear backends health icinga warning T218133
  • 11:18 ema: lvs1005: bounce pybal to clear backends health icinga warning T218133
  • 10:24 mutante: scandium - apt autoremove
  • 10:20 mutante: scandium - manually removing all php* packages to let puppet reinstall 7.2 instead of 7.0
  • 10:05 ema: cp2005: repooled, serving traffic via ATS T213263
  • 10:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 10:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
  • 09:48 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 09:48 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
  • 09:47 ema: cp2005: depool varnish-fe in preparation of traffic switch to ATS T213263
  • 09:42 moritzm: rebooting pool counters in codfw to pick up SSBD-enabled qemu
  • 09:04 elukey: start tcpdump on mc1022 to gather traffic for analysis
  • 06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1094 (duration: 00m 50s)
  • 06:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 49s)
  • 06:05 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2096 after onsite maintenance (duration: 00m 51s)
  • 01:31 bd808: labweb: upgraded mariadb packages installed on labweb100[12]
  • 01:19 bd808@deploy1001: Finished deploy [striker/deploy@b4bcd08]: Update python wheels (duration: 01m 00s)
  • 01:18 bd808@deploy1001: Started deploy [striker/deploy@b4bcd08]: Update python wheels
  • 00:54 bd808: Striker down following upgrade. scap3 did not rebuild venv as expected. Manually resolved, but now having mysql library issues.
  • 00:47 Krinkle: krinkle@mwmaint1002 Fixing corrupt 'log_params' field of kawiki.logging row where log_id=1021367; T93110
  • 00:36 bd808@deploy1001: Finished deploy [striker/deploy@c4726e3]: Django upgrade and various bug fixes (T192487, T182142, T176325, T217932) (duration: 01m 15s)
  • 00:34 bd808@deploy1001: Started deploy [striker/deploy@c4726e3]: Django upgrade and various bug fixes (T192487, T182142, T176325, T217932)
  • 00:32 James_F: SWAT done, 12 minutes ago.
  • 00:20 jforrester@deploy1001: Finished scap: SWAT: Full scap for i18n rebuild for 498259 and 498113 (duration: 24m 49s)

2019-03-21

  • 23:57 gtirloni: downtimed systemd check in labweb1001/1002 (T218935)
  • 23:56 jforrester@deploy1001: Started scap: SWAT: Full scap for i18n rebuild for 498259 and 498113
  • 23:53 gtirloni: downtimed systemd check in labweb1001 (T210818)
  • 23:32 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/ContentTranslation/api/ApiQueryContentTranslationSuggestions.php: SWAT T218902 CX: Return API error on anonymous suggestions queries (duration: 00m 51s)
  • 23:08 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT T217730 Add wikimaniawiki to another special group in Wikibase client (duration: 00m 49s)
  • 22:33 jijiki: Restarting pdfrender on scb1003
  • 22:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 22:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 22:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 22:14 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
  • 22:02 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable WikimediaEditorTasks on the Beta Cluster (duration: 00m 49s)
  • 21:56 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Add WikimediaEditorTasks labs config to InitializeSettings-labs.php (duration: 00m 47s)
  • 21:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add WikimediaEditorTasks default config to InitializeSettings.php (duration: 00m 49s)
  • 21:53 jijiki: Restarting pdfrender on scb1004
  • 21:52 mholloway-shell@deploy1001: Synchronized wmf-config/extension-list: Add WikimediaEditorTasks to extension-list (duration: 00m 50s)
  • 21:45 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 21:39 XioNoX: Ping offload - replace test IP with text-lb.codfw IP on cr1/2-codfw - T190090
  • 21:11 XioNoX: remove peering sessions to AS7385 on cr4-ulsfo
  • 21:08 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:08 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:08 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:55 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:55 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:55 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1006.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:27 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:27 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1006.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1005.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:24 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:24 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:24 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1004.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1001.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:22 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:22 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:22 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1001.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:21 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1002.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:03 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:03 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:03 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:45 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T213483 Disable RDF output of mediainfo Wikibase entities (duration: 00m 49s)
  • 19:40 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T213483 Read wmgWikibaseEntityTypesWithoutRdfOutput value (duration: 00m 50s)
  • 19:39 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T213483 Set default wmgWikibaseEntityTypesWithoutRdfOutput value (duration: 00m 51s)
  • 18:49 gehel: resetting archived settings on elasticsearch cirrus eqiad - T218879
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:36 sbisson@deploy1001: Synchronized php-1.33.0-wmf.22/languages/Language.php: SWAT: languages: Partial revert of I8287118cf8ec01326ead9 (duration: 00m 50s)
  • 18:30 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 18:25 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable Welcome survey on viwiki (duration: 00m 49s)
  • 18:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:17 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 18:16 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable logging for CitationUsage and CitationUsagePageLoad (duration: 00m 49s)
  • 18:13 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:11 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable reader trust survey v2 (duration: 00m 50s)
  • 18:08 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:56 bblack: everything back to normal for lvs1002/lvs1005 (high-traffic2 @ eqiad)
  • 17:55 bblack: restarting pybal on lvs1002
  • 17:54 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:54 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:54 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:49 reedy@deploy1001: Synchronized php-1.33.0-wmf.22/includes/user/User.php: Iab2492 (duration: 00m 51s)
  • 17:43 bblack: restarting pybal on lvs1005
  • 17:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable EntitySourceBasedFederation on TestCommons (duration: 00m 50s)
  • 17:37 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 17:35 bblack: disabled puppet on lvs1002 + lvs1005 for new service rollout
  • 17:28 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 17:27 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC: Add test-commons.wikimedia.org to wgCrossSiteAJAXdomains (duration: 00m 49s)
  • 17:11 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 17:07 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Depicts on TestCommons, with related config (duration: 00m 50s)
  • 17:03 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 17:03 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 17:02 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:02 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:39 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 16:38 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 16:38 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 16:38 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 16:38 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:38 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:29 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:29 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2096 for onsite maintenance (duration: 00m 50s)
  • 16:01 marostegui: Poweroff db2096 for onsite maintenance T218336
  • 15:20 moritzm: rebooting flerovium/furud for kernel updates
  • 14:35 moritzm: restarting jenkins on releases* after Java update
  • 14:18 gtirloni: downtimed labtestweb2001 (T218881)
  • 14:11 vgutierrez: re-enabling puppet in acme-chief clients - T218862
  • 14:09 arturo: T218024 disabled icinga checks for labtestweb2001
  • 14:07 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 13:58 vgutierrez: update acme-chief to version 0.15 in acmechief1001 - T218862
  • 13:54 vgutierrez: disabling puppet in acme-chief clients - T218862
  • 13:48 akosiaris: reboot oresrdb2001
  • 13:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3317 (duration: 00m 51s)
  • 13:37 elukey: upgrade openjdk-8 on an-worker1080 and restarted hadoop daemons
  • 13:28 moritzm: installing Java security updates on notebook hosts
  • 13:22 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.22
  • 13:18 gtirloni: downtimed cloudcontrol*, cloudservices*, labcontrol*, labweb* (T210818)
  • 13:06 moritzm: installing Java security updates on stat hosts
  • 12:40 arturo: T216497 remove python-cliff from jessie-wikimedia/openstack-mitaka-jessie
  • 12:35 jijiki: Pooling mw1339 back
  • 12:33 jijiki: Pooling mw1290 back
  • 12:08 arturo: T216497 add python-cliff to jessie-wikimedia/openstack-mitaka-jessie
  • 12:02 vgutierrez: uploaded acme-chief 0.15 to apt.wikimedia.org (buster) - T218862
  • 11:54 elukey: restart yarn node managers on an-worker10[82,89,92] - shutdown after a long yarn failover and only now downtime is expired
  • 11:36 mutante: gerrit2001 (not the master prod server)- scheduled downtime and rebooting for upgrade
  • 11:04 zeljkof: EU SWAT finished
  • 11:04 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for LMU Edit-a-thon (T217929) (duration: 00m 57s)
  • 10:57 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet
  • 10:52 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
  • 10:46 elukey: restart hadoop yarn resource managers on an-master100[1,2] to pick up new settings
  • 10:23 moritzm: rebooting labtestcontrol2001 for kernel update
  • 10:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3317 (duration: 00m 56s)
  • 09:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 58s)
  • 09:42 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=cxserver,cluster=scb,name=scb.*
  • 09:42 jijiki: Depool scb* in codfw from serving cxserver, finishing its migration to k8s - T213195
  • 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 after mysql upgrade (duration: 00m 56s)
  • 09:27 moritzm: rolling reboot of maps servers in codfw for kernel update
  • 09:17 marostegui: Upgrade and reboot db1086
  • 08:53 marostegui: Upgrade db1086
  • 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 for upgrade (duration: 00m 56s)
  • 08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1086 (duration: 00m 57s)
  • 08:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 56s)
  • 08:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1079 (duration: 00m 56s)
  • 08:01 vgutierrez: deploying directory based certificates in acme-chief clients - T207295
  • 07:35 _joe_: rolling restart of php-fpm to pick up some changes
  • 07:34 marostegui: Deploy schema change on db1079, this will generate lag on labsdb:s8
  • 07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1079 (duration: 00m 57s)
  • 07:03 elukey: restart pdfrender on scb1002
  • 06:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1101:3317 (duration: 00m 56s)
  • 06:24 marostegui: Run wmcs-wikireplica-dns on cloudcontrol1003 to get dbproxy1011 back
  • 06:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101:3317 (duration: 01m 10s)
  • 06:12 marostegui: Upgrade and reboot dbproxy1011
  • 06:04 marostegui: Run wmcs-wikireplica-dns on cloudcontrol1003 to drain dbproxy1011
  • 00:09 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/includes/parser/BlockLevelPass.php: SWAT T218817 Unbreak parser line counting for long wikitext pages I22eebb70a I55a2c4c0 I41a45266d (duration: 00m 56s)
  • 00:08 twentyafterfour: deploying phabricator upgrade
  • 00:01 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Move FundraisingTranslateWorkflow load to after Translate I73452ae8 (duration: 00m 56s)

2019-03-20

  • 23:49 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/resources/lib/ooui/oojs-ui-core.js: SWAT T218722 T218830 Bring forward UBN OOUI fix (duration: 00m 57s)
  • 23:28 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/497948/ (duration: 00m 56s)
  • 23:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/490648/ (duration: 00m 56s)
  • 22:29 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T214075 Enable federation of Wikidata items and properties on Test Commons (duration: 00m 57s)
  • 21:37 XioNoX: apply transit-in4 term offload-ping4 with test IP to cr1/2-codfw - T190090
  • 21:34 XioNoX: apply transit-in4 term offload-ping4 with test IP to cr2-codfw
  • 21:00 XioNoX: apply icmp redirect on cr1-codfw:xe-5/0/2 (to cr4-ulsfo) for test IP 208.80.154.225 - T190090
  • 20:24 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.22 (duration: 01m 46s)
  • 20:23 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.22
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 20:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:07 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:07 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:38 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.22
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:04 zfilipin@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.22 and rebuild l10n cache (duration: 38m 29s)
  • 18:50 jijiki: restarting pdfrender on scb1003
  • 18:49 ottomata: hitting eventgate-analytics in eqiad with ab
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:37 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:37 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:37 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:26 zfilipin@deploy1001: Started scap: testwiki to php-1.33.0-wmf.22 and rebuild l10n cache
  • 16:44 XioNoX: disable lldp on asw2-a-eqiad:ge-8/0/10
  • 16:25 chasemp: mkdir /srv/dumps/xmldatadumps/public/other/rook for T218587 (fyi apergos)
  • 15:55 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
  • 15:52 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:35 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098:3317 (duration: 00m 50s)
  • 15:33 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:24 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:24 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:23 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:23 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:22 bawolff@deploy1001: Synchronized wmf-config/wikitech.php: Adjust account stuff at wikitech 4adc89bce4 (duration: 00m 48s)
  • 15:20 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
  • 15:20 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:10 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:09 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:09 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 15:08 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 14:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098:3317 (duration: 00m 56s)
  • 14:35 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 03s)
  • 14:02 moritzm: rebooting oresrdb2002 for kernel update
  • 13:48 godog: take a snapshot of prometheus data on prometheus1004
  • 13:44 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 05s)
  • 13:37 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 08s)
  • 13:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:29 otto@deploy1001: scap-helm eventgate-analytics finished
  • 13:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 11:51 akosiaris: re-enable puppet across fleet
  • 11:45 Amir1: EU SWAT is done
  • 11:44 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add wikimania as a special group to wikidata sitelinks (T217730) (duration: 00m 50s)
  • 11:40 ladsgroup@deploy1001: Synchronized dblists/wikidataclient.dblist: SWAT: Add wikimaniawiki to wikidataclient.dblist (T217730) (duration: 00m 50s)
  • 11:34 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Advanced Mobile Contributions mode for ar,id,es and test wikis (T217643) (duration: 00m 50s)
  • 11:34 akosiaris: disable puppet across fleet to avoid alert spam storm
  • 11:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Partially revert "Enable musical notation datatype in wikidata" (T218535) (duration: 00m 50s)
  • 11:16 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Increased maxSerializedEntitySize from 2500 to 3000 (T217739) (duration: 01m 47s)
  • 11:03 akosiaris: restart gerrit for testing https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497727/
  • 10:28 akosiaris: restart gerrit for merge of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497561/
  • 10:26 godog: reimage prometheus1003 with stretch - T205870
  • 10:20 marostegui: Repool dbproxy1010 and running wmcs-wikireplica-dns script
  • 10:12 marostegui: Reboot dbproxy1010 for upgrade
  • 09:45 vgutierrez: updated acme-chief to version 0.14 in acmechief[12]001
  • 09:32 marostegui: Deploy schema change on s7 codfw master, lag will appear on codfw
  • 09:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 (duration: 00m 48s)
  • 08:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1087 (duration: 00m 48s)
  • 08:55 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1003.eqiad.wmnet
  • 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 (duration: 00m 48s)
  • 08:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 48s)
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1092 (duration: 00m 48s)
  • 08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1092 (duration: 00m 48s)
  • 08:20 ema: cp2009, cp1071 (cp-ats): reboot for kernel upgrades
  • 07:32 elukey: pool kafka1001 in pybal's eventbus service after yesterday's network maintenance
  • 06:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool databases in row A - T187960 (duration: 00m 49s)
  • 00:48 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/includes/Title.php: SWAT: Improve Caching in Title::loadRestrictions() (duration: 00m 51s)

2019-03-19

  • 22:20 otto@deploy1001: Finished deploy [eventlogging/analytics@9aea626]: fix for production error where mw api is returning html instead of json schemas (duration: 00m 04s)
  • 22:20 otto@deploy1001: Started deploy [eventlogging/analytics@9aea626]: fix for production error where mw api is returning html instead of json schemas
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 21:36 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:36 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:36 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 21:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 21:07 cdanis: cdanis@wikitech-static.wikimedia.org: apt install sshguard
  • 21:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:06 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:58 XioNoX: disable down ports with no description on switches
  • 20:44 cdanis: enabling puppet on contint1001
  • 19:54 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 19:52 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 19:47 XioNoX: disable asw2-a<->asw-a link
  • 19:44 cdanis: icinga failed over to icinga1001 successfully
  • 19:43 XioNoX: remove forced failover on cr1/cr2-eqiad
  • 19:36 cdanis: failing over icinga to icinga1001
  • 19:35 XioNoX: enable cr2-eqiad:ae1
  • 19:29 ariel@deploy1001: Finished deploy [dumps/dumps@da66149]: move maxretries to config (duration: 00m 03s)
  • 19:29 ariel@deploy1001: Started deploy [dumps/dumps@da66149]: move maxretries to config
  • 19:09 ejegg: updated CiviCRM from a2316be94f to 3bfc7a762e
  • 19:09 gtirloni: rebooted labmon1001
  • 19:02 XioNoX: disable cr2-eqiad:ae1
  • 18:46 XioNoX: failover cr2-eqiad:ae1 VRRP master to cr1
  • 18:17 XioNoX: starting pybal on lvs1002
  • 18:11 XioNoX: stopping pybal on lvs1002
  • 18:09 XioNoX: starting pybal on lvs1001
  • 18:01 XioNoX: stopping pybal on lvs1001
  • 18:01 jijiki: restart pdfrender on scb1003
  • 17:56 XioNoX: shutdown scp1001 for uplink move
  • 17:47 Lucas_WMDE: Updated the Wikidata property suggester with data from last Monday's JSON dump and applied the T132839 workarounds (T216270)
  • 17:33 hasharAway: contint1001 / CI going for a quick scheduled maintenance -network cable being moved-
  • 17:33 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0 (duration: 01m 50s)
  • 17:31 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0
  • 17:30 mdholloway: mobileapps deploy failed for group default3, retrying
  • 17:24 tzatziki: changing email for User:St3f
  • 17:18 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0 (duration: 03m 47s)
  • 17:16 addshore: started "foreachwikiindblist wiktionary extensions/Cognate/maintenance/populateCognatePages.php --batch-size 1000" in a screen on mwdebug1002 (catching up cognate after x1 readonly time)
  • 17:14 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0
  • 16:45 vgutierrez: uploaded acme-chief 0.14 to apt.wikimedia.org (buster) - T218685 T218418 T207295
  • 16:30 elukey: stop eventlogging's mysql kafka consumers on eventlog1002, eventlogging's db replication on db1108 to ease db1107's maintenance
  • 16:29 elukey: stop eventlogging's mysql kafka consumers on eventlog1002, eventlogging's db replication on db1108 to ease db1107's maintenance
  • 16:15 bstorm_: downtimed labstore1003 for network moves so it doesn't page
  • 16:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:08 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org,service=pdns_recursor
  • 16:02 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org,service=pdns_recursor
  • 16:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3, take #2 (duration: 21m 01s)
  • 15:58 tzatziki: changing password for User:St3f
  • 15:57 XioNoX: enable pybal on lvs1006
  • 15:55 XioNoX: disable pybal on lvs1006
  • 15:54 XioNoX: enable pybal on lvs1005
  • 15:52 XioNoX: disable pybal on lvs1005
  • 15:50 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:50 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:50 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:49 XioNoX: enable pybal on lvs1004
  • 15:45 XioNoX: disable pybal on lvs1004
  • 15:40 mobrovac@deploy1001: Started deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3, take #2
  • 15:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3 (duration: 12m 27s)
  • 15:28 mobrovac@deploy1001: Started deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3
  • 15:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s2 read only OFF - T187960 (duration: 00m 26s)
  • 15:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s2 database master on read only - T187960 (duration: 00m 48s)
  • 15:12 XioNoX: eqiad A7 servers uplink move - T187960
  • 14:46 moritzm: rebooting icinga1001 for kernel update
  • 14:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool databases in row A - T187960 (duration: 00m 48s)
  • 14:41 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Reapply I49a18d from gerrit for consistency (duration: 00m 49s)
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:31 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 14:31 otto@deploy1001: scap-helm eventgate-analytics install -n production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 14:28 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:28 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:28 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 13:19 akosiaris: start zuul/zuul-merger
  • 13:12 akosiaris: unfirewall gerrit, put service back in action
  • 11:31 moritzm: installing php5 security updates
  • 09:08 akosiaris: start nagios-nrpe-server on proton1002, failed due to fork() failed with error 12, bailing out...
  • 07:25 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T218279)
  • 07:20 twentyafterfour@deploy1001: Synchronized wmf-config/CommonSettings.php: Temporarily disable account creation on wikitech (duration: 00m 51s)
  • 06:47 akosiaris: stop zuul and zuul-merger on contint1001
  • 03:45 kart_: Started manual run of unpublished ContentTranslation draft purge script (T218279)
  • 02:12 krinkle@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/EventLogging/includes/ApiJsonSchema.php: If280a4056a (duration: 00m 48s)
  • 02:11 krinkle@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/EventLogging/includes/RemoteSchema.php: If280a4056a (duration: 00m 51s)
  • 00:14 reedy@deploy1001: Synchronized php-1.33.0-wmf.21/tests/phpunit/includes/: Replace wgUser with RequestContext::getUser in User::getBlockedStatus (duration: 01m 00s)
  • 00:12 reedy@deploy1001: Synchronized php-1.33.0-wmf.21/includes/user/User.php: Replace wgUser with RequestContext::getUser in User::getBlockedStatus (duration: 00m 49s)

2019-03-18

  • 23:54 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494551/ (duration: 00m 49s)
  • 23:45 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494551/ (duration: 00m 48s)
  • 23:33 maxsem@deploy1001: Synchronized php-1.33.0-wmf.21/includes/EditPage.php: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/497347/ (duration: 00m 49s)
  • 23:25 twentyafterfour: running puppet on phab1001 to get out of degraded state
  • 23:23 XioNoX: renumber Telia transit in eqsin
  • 23:14 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/497317/ (duration: 00m 49s)
  • 23:07 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/496515/ (duration: 00m 48s)
  • 22:18 greg-g: gjg@phab1001:~$ sudo /srv/phab/phabricator/bin/auth strip --all-types --user Barras # per request/verification from foks
  • 19:57 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable block disables login on wikitech (duration: 00m 48s)
  • 19:56 bawolff@deploy1001: Synchronized wmf-config/wikitech.php: Adjust ldap config (duration: 00m 48s)
  • 16:17 volans: restarting pdfrender on scb1003
  • 16:15 volans: restarting pdfrender on scb1004
  • 15:48 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=cxserver,cluster=scb,name=scb.*
  • 15:45 jijiki: Depool scb* from serving cxserver on eqiad - T213195
  • 15:06 papaul: shutting down mw2206 for memtest
  • 14:47 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 14:46 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 14:13 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2003.codfw.wmnet
  • 13:42 ema: cp-ats rolling restart to apply proxy.config.cache.ram_cache.size config change T213263
  • 13:23 mvolz@deploy1001: scap-helm citoid finished
  • 13:22 mvolz@deploy1001: scap-helm citoid cluster codfw completed
  • 13:22 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-codfw-values.yaml stable/citoid [namespace: citoid, clusters: codfw]
  • 13:18 mvolz@deploy1001: scap-helm citoid finished
  • 13:18 mvolz@deploy1001: scap-helm citoid cluster eqiad completed
  • 13:17 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-eqiad-values.yaml stable/citoid [namespace: citoid, clusters: eqiad]
  • 13:04 arturo: T218022 disable icinga checks for labtestservices2001.wikimedia.org
  • 12:54 arturo: T218025 disable icinga checks for cloudnet2001-dev.codfw.wmnet
  • 12:49 mvolz@deploy1001: scap-helm citoid finished
  • 12:49 mvolz@deploy1001: scap-helm citoid cluster staging completed
  • 12:49 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 12:48 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-values-staging.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 11:45 zeljkof: EU SWAT finished
  • 11:45 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable mobile section editing on bnwiki, hewiki, zh_yuewiki (T218375) (duration: 00m 50s)
  • 10:51 _joe_: testing safety checks for php-fpm on mwdebug2001
  • 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 48s)
  • 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:12 vgutierrez: uploaded acme-chief 0.12 to apt.wikimedia.org (buster) - T218543
  • 10:12 volans: restarted irc echo on icinga2001
  • 10:04 _joe_: hot-patching the error in php7.2-fpm config
  • 10:02 volans: running puppet on hosts matching 'C:php::fpm' to apply I004349
  • 10:00 volans: running puppet on failed hosts
  • 09:57 volans: temporarily stop ircecho to avoid spam
  • 09:40 ema: superior-cache-analyzer_3.3.7 uploaded to stretch-wikimedia T213263
  • 09:29 godog: switch to mpm_event for prometheus apache before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/496750
  • 08:58 vgutierrez: uploaded acme-chief 0.11 to apt.wikimedia.org (buster) - T207295
  • 08:52 moritzm: restarting ferm on sessionstore, was stuck in resolving one of the -a records, which were only merged in a subsequent step (T215883)
  • 08:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 (duration: 00m 48s)
  • 08:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 00m 48s)
  • 08:34 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
  • 08:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 08:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 08:31 ema: cp2002: repool varnish-fe to resume ATS testing T213263
  • 08:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1101 (duration: 00m 48s)
  • 08:22 moritzm: armed keyholder on neodymium
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
  • 07:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1101 (duration: 00m 49s)
  • 07:02 marostegui: Stop db1101 to upgrade mysql and kernel
  • 07:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101 (duration: 00m 48s)
  • 06:33 marostegui: Deploy schema change on s8 codfw master (db2045), this will generate lag on s8 codfw
  • 06:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 48s)
  • 06:08 marostegui: Deploy schema change on x1 master (db1069) with replication - T218397
  • 06:04 marostegui: Deploy schema change on db1121 - lag will appear on labsdb:s4
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 01m 04s)
  • 03:58 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T218279)
  • 02:00 kart_: Started manual run of unpublished ContentTranslation draft purge script (T218279)

2019-03-17

  • 11:51 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=labswiki --force --sysop Ladsgroup
  • 08:49 elukey: restart pdfrender on scb1004

2019-03-16

  • 10:00 chasemp: stop apache on cobalt for maintenance
  • 00:19 andrewbogott: restarting slapd on seaborgium

2019-03-15

  • 22:37 shdubsh: temporarily stop ircecho on icinga2001
  • 18:00 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend: SWAT: iOS: Fix mobile editor T218069 T218062 T218352 T211490 T218062 T211491 T172877 (duration: 00m 54s)
  • 17:53 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 17:53 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 17:53 ema: depool cp2002's varnish-fe for the weekend T213263#5027366
  • 17:25 arturo: acmechief2001 - armed keyholder
  • 17:22 arturo: cumin2001 - armed keyholder
  • 17:21 andrewbogott: updating puppet compiler facts
  • 17:13 mutante: netmon2001 - armed keyholder for rancid
  • 17:12 mutante: netmon1002 - armed keyholder for rancid
  • 17:04 arturo: arm keyholder in deploy2001
  • 17:03 arturo: arm keyholder in sarin
  • 17:02 arturo: arm keyholder in labpuppetmaster1002
  • 17:01 arturo: arm keyholder in deploy1001
  • 17:00 XioNoX: clean up rigel switch port
  • 17:00 arturo: arm keyholder in acmechief1001
  • 16:58 arturo: arming keyholder in cumin1001
  • 16:09 moritzm: upgrading deployment-deploy01 to component/php72
  • 15:59 akosiaris: puppetmaster1001 rm /var/run/confd-template/.citoid*.err to remove old stale confd files that resulted from merging https://gerrit.wikimedia.org/r/494213
  • 15:54 moritzm: rebooting labtestservices2003 for kernel update
  • 15:47 andrewbogott: enabling puppet on seaborgium to apply new acme cert
  • 15:47 moritzm: rebooting labtestservices2002 for kernel update
  • 15:42 moritzm: rebooting labtestcontrol2003 for kernel update
  • 15:38 moritzm: rebooting labtestnet2002 for kernel update
  • 15:11 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,cluster=cache_upload,name=cp2015.codfw.wmnet
  • 15:10 ema: cp2015: repool ATS with proxy.config.cache.ram_cache.size 1G T213263
  • 15:07 moritzm: rebooting graphite2003 for kernel security update
  • 15:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,cluster=cache_upload,name=cp2015.codfw.wmnet
  • 15:04 ema: cp2015: test ATS depool T213263
  • 14:45 mutante: tools tools-sgebastion-07 - dpkg-reconfigure locales and adding ko_KR.EUC-KR for Korean users by request and as done in the past on former tools bastion
  • 14:43 moritzm: rebooting etherpad1001 to pick up SSBD-enabled qemu
  • 14:31 mutante: tools-sgebastion-07 - generating locales for user request in T130532
  • 13:50 moritzm: rolling reboot of ores in codfw for SSBD/L1TF kernel update
  • 13:47 akosiaris@deploy1001: scap-helm cxserver finished
  • 13:47 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 13:47 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 11:16 godog: reenable prometheus@k8s on prometheus2004 with mod_proxy connection limits - T217715
  • 10:31 akosiaris: add a 10s bucket to cxserver prometheus-statsd exporter mappings
  • 10:31 akosiaris@deploy1001: scap-helm cxserver finished
  • 10:31 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 10:31 akosiaris@deploy1001: scap-helm cxserver finished
  • 10:31 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 10:31 akosiaris@deploy1001: scap-helm cxserver finished
  • 10:31 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 10:30 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/citoid [namespace: cxserver, clusters: staging]
  • 10:03 akosiaris@deploy1001: scap-helm citoid finished
  • 10:03 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
  • 10:03 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
  • 10:03 akosiaris@deploy1001: scap-helm citoid finished
  • 10:02 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
  • 10:02 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
  • 10:02 akosiaris: add a 10s bucket to citoid prometheus-statsd exporter mappings
  • 10:02 akosiaris: remove prometheus-statsd-exporter from zotero pods
  • 10:02 akosiaris@deploy1001: scap-helm citoid finished
  • 10:02 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 10:02 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 10:01 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-values-staging.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 10:00 akosiaris@deploy1001: scap-helm zotero finished
  • 10:00 akosiaris@deploy1001: scap-helm zotero cluster staging completed
  • 10:00 akosiaris@deploy1001: scap-helm zotero upgrade --install -f zotero-values-staging.yaml staging stable/zotero [namespace: zotero, clusters: staging]
  • 09:58 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml staging stable/zotero [namespace: zotero, clusters: staging]
  • 09:53 akosiaris@deploy1001: scap-helm zotero finished
  • 09:53 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 09:53 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
  • 09:53 akosiaris@deploy1001: scap-helm zotero finished
  • 09:53 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 09:52 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
  • 09:42 godog: bounce grafana-server on grafana1001
  • 09:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103 (duration: 00m 50s)
  • 09:28 godog: correction, prometheus2004
  • 09:27 godog: temporarily disable read queries to prometheus@k8s on prometheus2003
  • 09:19 jiji@cumin1001: conftool action : set/weight=12; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
  • 09:18 jiji@cumin1001: conftool action : set/weight=15; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
  • 09:17 jijiki: Ramp up cxserver k8s traffic to 50% - T213195
  • 08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 (duration: 00m 50s)
  • 08:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 (duration: 00m 47s)
  • 08:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 49s)
  • 07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 (duration: 00m 49s)
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
  • 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
  • 07:01 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
  • 06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 (duration: 00m 48s)
  • 06:04 marostegui: Upgrade db1091
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 50s)
  • 04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 01:25 ejegg: re-enabled ingenico audit parser
  • 01:25 ejegg: updated fundraising CiviCRM from 41efa14fb0 to a2316be94f

2019-03-14

  • 22:54 ejegg: temporarily disabled Ingenico WX audit parsing
  • 22:05 cdanis: cdanis@icinga2001.wikimedia.org ~ % sudo systemctl restart icinga.service
  • 21:58 cdanis: cdanis@icinga2001.wikimedia.org ~ % sudo systemctl restart nsca.service
  • 21:01 crusnov@deploy1001: Finished deploy [netbox/deploy@090a0c3]: Another minor bugfix release for ganeti-netbox script (duration: 00m 56s)
  • 21:00 crusnov@deploy1001: Started deploy [netbox/deploy@090a0c3]: Another minor bugfix release for ganeti-netbox script
  • 20:26 thcipriani: gerrit live on 2.15.11
  • 20:24 thcipriani: restarting gerrit for 2.15.11
  • 20:23 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt (duration: 00m 02s)
  • 20:23 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt
  • 20:22 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 04s)
  • 20:22 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only
  • 20:17 ejegg: updated CiviCRM from b4e3cf16cc to 41efa14fb0
  • 20:17 thcipriani: gerrit back to 2.15.8
  • 20:15 thcipriani: restart gerrit on cobalt
  • 20:14 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on cobalt (duration: 00m 07s)
  • 20:14 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on cobalt
  • 20:14 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 10s)
  • 20:13 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on gerrit2001 only
  • 20:13 bstorm_: Placed labstore1006 back in rotation for NFS and rsync
  • 20:11 crusnov@deploy1001: Finished deploy [netbox/deploy@c6cf7d6]: Minor bugfix release for ganeti-netbox script (duration: 00m 54s)
  • 20:10 crusnov@deploy1001: Started deploy [netbox/deploy@c6cf7d6]: Minor bugfix release for ganeti-netbox script
  • 20:03 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/extension.json: Hot-deploy I19414dc31 to fix dependencies on mw.Uri (duration: 00m 49s)
  • 19:37 XioNoX: set protocols bgp group Anycast4 multihop ttl 193 on cr1/2-esams - T209989
  • 19:25 XioNoX: merged Juniper BFD Icinga check
  • 19:12 thcipriani: gerrit back up
  • 19:08 thcipriani: restarting gerrit on cobalt for 2.15.11 upgrade
  • 19:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt (duration: 00m 11s)
  • 19:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt
  • 19:05 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 11s)
  • 19:05 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only
  • 19:02 XioNoX: set protocols bgp group Anycast4 multihop ttl 193 on cr1/2-eqiad - T209989
  • 18:53 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/ParsoidBatchAPI/includes/ApiParsoidBatch.php: SWAT Another deprecation fix via I4936d0ce03 (duration: 00m 49s)
  • 18:37 XioNoX: set protocols bgp group Anycast4 multihop ttl 190 on cr1-codfw - T209989
  • 18:31 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T216730 Enable musical notation datatype on Wikidata (duration: 00m 48s)
  • 18:29 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/modules/help/: SWAT Ib13cf88d GrowthExperiments log fix for closes (duration: 00m 49s)
  • 18:22 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT T217436 Add default user config for rollback confirmation (duration: 00m 48s)
  • 18:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T217436 Set up exceptions for rollback confirmation (duration: 00m 49s)
  • 18:08 tzatziki: change email for KStineRowe (WMF) on officewiki, collabwiki, SUL
  • 18:05 mforns@deploy1001: Finished deploy [analytics/aqs/deploy@13203f1]: Deploying AQS for node10 upgrade (duration: 19m 40s)
  • 17:59 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/ParsoidBatchAPI/includes/ApiParsoidBatch.php: Hot-deploy I2842dfea to reduce deprecation spam after T206675 deploy of wmf.21 (duration: 00m 49s)
  • 17:45 mforns@deploy1001: Started deploy [analytics/aqs/deploy@13203f1]: Deploying AQS for node10 upgrade
  • 17:43 mforns: Deploying AQS using scap (node10 upgrade)
  • 17:32 arlolra: Updated Parsoid to f3e2209 (T213950)
  • 17:24 arlolra@deploy1001: Finished deploy [parsoid/deploy@8cf4107]: Updating Parsoid to f3e2209 (duration: 07m 09s)
  • 17:17 arlolra@deploy1001: Started deploy [parsoid/deploy@8cf4107]: Updating Parsoid to f3e2209
  • 17:15 jijiki: Pool mw1280 back - T218006
  • 17:12 jijiki: Depool mw2206 - T215415
  • 16:51 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:51 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:51 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:50 crusnov@deploy1001: Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229 (duration: 00m 50s)
  • 16:49 crusnov@deploy1001: Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229
  • 16:46 crusnov@deploy1001: Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229 (duration: 00m 30s)
  • 16:45 crusnov@deploy1001: Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229
  • 16:32 XioNoX: add default deny to mr1-* junos-host policies - T218234
  • 16:30 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/lib/includes/Store/Sql/TermSqlIndex.php: gerrit:496481 TermSqlIndex, track calls to getTermsOfEntities (duration: 00m 50s)
  • 16:22 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:22 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:22 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:08 arturo: reimaging cloudvirt1015 again
  • 16:04 akosiaris: reboot all sessionstore[12]00[123] servers one final time
  • 16:02 arturo: T216497 drop python-dogpile.cache from jessie-wikimedia/openstack-mitaka-jessie
  • 14:57 marostegui: Start replication on db2070 after testing url_notes
  • 14:53 mutante: analytics-tool1003 - stopping idle screen session
  • 14:43 marostegui: Stop replication on db2070 to test the url_notes (will alert only on IRC)
  • 14:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:21 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:21 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --set main_app.version=v1.0.3-wmf0 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 14:09 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:09 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:09 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 13:54 godog: take a snapshot of data on prometheus2004
  • 13:50 arturo: reimaging cloudvirt1015
  • 13:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1081 into API (duration: 00m 48s)
  • 13:15 arturo: T216497 drop libpulse0 from jessie-wikimedia/openstack-mitaka-jessie
  • 13:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 into API (duration: 00m 49s)
  • 13:10 arturo: T216497 drop python-mysqldb from jessie-wikimedia/openstack-mitaka-jessie
  • 13:10 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.21
  • 12:50 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:49 jiji@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:42 jijiki: Ramp up k8s cxserver traffic to 8% - T213195
  • 12:22 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:21 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:17 jijiki: Send ~4% of cxserver traffic to eqiad k8s - T213195
  • 12:14 zeljkof: EU SWAT finished
  • 12:13 kartik@deploy1001: Synchronized wmf-config: SWAT: gerrit:496418 Revert "Correct the enable context detection configuration" (duration: 00m 56s)
  • 12:12 arturo: T216497 drop some packages from jessie-wikimedia/openstack-mitaka-jessie: qemu-XXX
  • 12:06 arturo: T216497 drop some packages from jessie-wikimedia/openstack-mitaka-jessie: libvirt*, librados2, librbd1, because they induce the resolver to conflict with those included in stretch
  • 12:02 kartik@deploy1001: Synchronized wmf-config: SWAT: Revert gerrit:496412 Fix content detection config (duration: 00m 56s)
  • 11:58 kartik@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 11:45 kartik@deploy1001: Synchronized php-1.33.0-wmf.21/skins/MinervaNeue: SWAT: gerrit:496364 Ensure page-actions icons are `display:block` (T218182) (duration: 00m 57s)
  • 11:15 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:493672 Enable ExternalGuidance to all Wikipedias (T216129) (duration: 00m 57s)
  • 10:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 00m 57s)
  • 10:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 10:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 10:50 ema: cp2002: pool varnish-fe to resume ATS testing T213263
  • 10:44 moritzm: installing libsdl1.2 security updates for jessie
  • 10:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 58s)
  • 09:54 hashar: ci: live hacked job https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/ in attempt to capture 'core' files from hhvm | https://gerrit.wikimedia.org/r/#/c/integration/config/+/496392/ | T216689
  • 09:02 mutante: ms-be2037 - down for a couple of hours, no SAL or ticket, powercycling
  • 08:44 marostegui: Deploy schema change on s4 codfw master (db2051), this will generate lag on codfw
  • 08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1088 (duration: 00m 53s)
  • 08:21 marostegui: Upgrade s3 codfw master (db2043) there will be lag on s3 codfw
  • 08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1088 (duration: 00m 55s)
  • 07:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1088 (duration: 00m 55s)
  • 07:48 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:48 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 07:48 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 07:42 marostegui: Upgrade db1088
  • 07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1088 (duration: 00m 54s)
  • 07:22 kartik@deploy1001: Finished deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386) (duration: 03m 50s)
  • 07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1098 (duration: 00m 55s)
  • 07:18 kartik@deploy1001: Started deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386)
  • 07:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:16 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 07:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 07:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:16 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 07:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 07:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:15 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 07:15 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098 (duration: 00m 55s)
  • 06:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098 (duration: 00m 54s)
  • 06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 55s)
  • 06:50 marostegui@deploy1001: sync-file aborted: More traffic to db1097 (duration: 00m 00s)
  • 06:46 akosiaris@deploy1001: scap-helm cxserver finished
  • 06:46 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 06:46 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 06:40 marostegui: Upgrade mysql on dbstore2002
  • 06:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098:3317 (duration: 00m 55s)
  • 06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1098:3317 (duration: 00m 55s)
  • 06:08 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:04 marostegui: Upgrade MySQL on db1098
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098 (duration: 00m 56s)
  • 04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 01:39 ejegg: updated fundraising CiviCRM from 5c45e4c24d to b4e3cf16cc

2019-03-13

  • 23:48 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/skins/MinervaNeue/: Remove unnecessary parameter from getHistoryPageAction (duration: 00m 56s)
  • 23:45 catrope@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: Fix builder class definition for WBCS (duration: 00m 56s)
  • 23:41 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend/: Fix animation when visual section editing enabled on mobile only (T218167) (duration: 00m 58s)
  • 23:39 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/WikibaseCirrusSearch/: Fix hook return values (duration: 00m 58s)
  • 23:30 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/: Instrumentation fixes (T217802) (duration: 00m 57s)
  • 22:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disabling api-request logging to eventgate-analytics for group0 wikis until we solve T218268 (duration: 00m 56s)
  • 21:11 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:11 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 21:11 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 21:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:10 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 21:09 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
  • 20:35 arlolra@deploy1001: Finished deploy [parsoid/deploy@e2e44bc]: Updating Parsoid to ea80d1b (duration: 06m 38s)
  • 20:28 arlolra@deploy1001: Started deploy [parsoid/deploy@e2e44bc]: Updating Parsoid to ea80d1b
  • 20:25 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262) (duration: 03m 35s)
  • 20:24 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disabling api-request logging to eventgate-analytics for group1 wikis to investigate possible outage (duration: 00m 56s)
  • 20:21 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262)
  • 20:14 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262) (duration: 01m 49s)
  • 20:03 herron: increased index.mapping.total_fields.limit to 1350 on index logstash-2019.03.13
  • 19:46 jijiki: Pooling mw2206 - T215415
  • 19:26 herron: performing rolling restart of eqiad logstash instances
  • 18:51 jijiki: Depool mw1280 and mw2206 due to hardware issues - T215415 T218006
  • 18:44 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging to eventgate-analytics for group1 wikis (duration: 00m 58s)
  • 18:30 robh: thumbor1004 memtest in progress via T215411
  • 18:29 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 18:29 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 18:28 ema: cp2002: depool varnish-fe after 1 hour ATS experiment T213263
  • 18:09 bstorm_: rebooting labstore1006 T217473
  • 18:07 bstorm_: downtime labstore1006 for troubleshooting T217473
  • 17:57 XioNoX: set interface description on fasw-c-codfw:ge-0/0/47
  • 17:43 XioNoX: s/29073/202425/ on AMS-IX
  • 17:34 XioNoX: add missing sandbox1-b-eqiad interface to ospf(3) passive on cr1/2-eqiad
  • 17:19 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 17:19 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 17:18 ema: cp2002: pool varnish-fe for user traffic, routed through ATS backends T213263
  • 17:05 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:05 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 17:05 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 17:01 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:01 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 17:01 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:56 robh: mw2206.codfw.wmnet is being powered down for firmware update, relying on auto depool function from clean shutdown for mw api server via T215415
  • 16:42 robh: mw2206.codfw.wmnet is being powered down for firmware update, relying on auto depool function from clean shutdown for mw api server via T215415
  • 16:36 addshore: SWAT done
  • 16:36 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/includes/api/ApiMain.php: SWAT: T214080 T212529 ApiMain.php api/request logging event changes gerrit:496197 (duration: 00m 57s)
  • 16:32 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:32 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:32 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:19 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:19 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:19 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:16 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:16 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:15 jijiki: Depool thumbor1004 to investigate memory issues - T215411
  • 16:04 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:04 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 16:04 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:04 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 16:04 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:04 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:52 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:52 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml eqiad stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:52 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 15:40 akosiaris: do the first deploy of cxserver in eqiad/codfw T213195
  • 15:39 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:39 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 15:39 akosiaris@deploy1001: scap-helm cxserver install -n production -f cxserver-eqiad-values.yaml stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 15:39 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:39 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 15:39 akosiaris@deploy1001: scap-helm cxserver install -n production -f cxserver-codfw-values.yaml stable/cxserver [namespace: cxserver, clusters: codfw]
  • 14:27 ema: cp2002: depool varnish-fe in preparation for pointing it to ATS T213263
  • 14:13 marostegui: Upgrade db2074 (sanitarium master)
  • 13:42 akosiaris: upgrade kubestage to kubernetes 1.11.8
  • 13:42 akosiaris: upgrade neon to kubernetes 1.11.8
  • 13:28 akosiaris: upgrade kubestage1002 to kubernetes 1.11.8
  • 13:24 godog: take a snapshot of prometheus@k8s data on prometheus2004
  • 13:13 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.21 (duration: 01m 43s)
  • 13:12 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.21
  • 11:34 marostegui: Test snapshot db1117:3325 to dbstore1001 - T210292
  • 10:55 marostegui: Upgrade db2057
  • 10:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1085 (duration: 00m 56s)
  • 09:52 mutante: ms-be1035 - sudo systemctl reset-failed
  • 09:45 ema: cp1071: upgrade trafficserver to 8.0.3~rc0 for testing purposes
  • 09:41 marostegui: Deploy schema change on db1085 with replication, there will be lag on labsdb:s6
  • 09:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1085 (duration: 00m 55s)
  • 09:06 moritzm: installing PHP 7.0 security updates
  • 08:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 (duration: 00m 55s)
  • 08:58 marostegui: Upgrade mysql and kernel on db2050
  • 08:51 ema: cp3030: wipe frontend cache to get rid of large objects T216006
  • 08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3316 (duration: 00m 55s)
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1093 (duration: 00m 55s)
  • 08:09 moritzm: upgrading job runners in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 (duration: 00m 54s)
  • 07:26 moritzm: upgrading remaining app servers in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1096 (duration: 00m 58s)
  • 07:13 marostegui: Test snapshot dbstore1001:3311 to dbstore1001 - T210292
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 55s)
  • 06:58 marostegui: Upgrade MySQL and kernel on db2036
  • 06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1096 (duration: 00m 55s)
  • 06:40 marostegui: Stop MySQL on db1096 for upgrade
  • 06:24 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:21 marostegui: Testing snapshotting on db1117:3321 -> dbstore1001 - T210292
  • 06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096 (duration: 01m 07s)
  • 04:11 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)

2019-03-12

  • 23:33 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend/includes/specials/SpecialMobileOptions.php: SWAT: Fix: undefined locals in SpecialMobileOptions.setJsConfigVars() T218098 (duration: 00m 57s)
  • 20:49 shdubsh: manually upgrade prometheus-icinga-exporter to 0.5 on standby icinga
  • 19:48 eileen: civicrm revision changed from 977b9bfcf1 to 5c45e4c24d, config revision is f930677e97
  • 19:31 herron: restarted citoid on scb1003
  • 19:16 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging to eventgate-analytics for group0 wikis (duration: 01m 01s)
  • 19:14 arturo: T216497 manually delete libpam-systemd and libsystemd0 230-7~bpo8+2 from jessie-wikimedia/openstack-mitaka-jessie
  • 19:09 arturo: T216497 manually delete systemd 230-7~bpo8+2 from jessie-wikimedia/openstack-mitaka-jessie
  • 19:07 robh: rebooting thumbor1004 for memory troubleshooting via T215411
  • 17:11 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Increase APC cache for PropertyInfoLookup from 15 to 20s (duration: 00m 55s)
  • 17:10 addshore@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Increase APC cache for PropertyInfoLookup from 15 to 20s (duration: 00m 57s)
  • 17:02 jbond42: rolling update of debdeploy
  • 16:57 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 53s)
  • 16:43 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Double on server cache for PropertyInfoStore (duration: 00m 55s)
  • 16:42 addshore@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Double on server cache for PropertyInfoStore (duration: 00m 57s)
  • 16:29 moritzm: upgraded buster installation image to daily build from 12th of March (T213527)
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:42 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:41 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:39 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:38 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org,service=pdns_recursor
  • 15:37 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:33 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:33 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
  • 15:28 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:28 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:28 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:26 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_recursor
  • 15:23 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_^Ccursor
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics finished
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:00 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:26 mutante: phab1002 - reboot
  • 13:43 marostegui: Upgrade MySQL and kernel on db2094 (inactive sanitarium)
  • 13:27 marostegui: Deploy schema change on s6 codfw, lag will be generated on s6 codfw
  • 13:24 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.21
  • 12:41 arturo: T215605 include python-mwclient .deb in openstack-mitaka-jessie/jessie-wikimedia in install1002
  • 12:23 jynus: testing snapshotting on db1117:3325 -> dbstore1001 T210292
  • 12:23 zfilipin@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.21 and rebuild l10n cache (duration: 34m 25s)
  • 12:09 moritzm: upgrading mw1238-mw1258 to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 11:59 mutante: analytics-tool1004 - start superset service
  • 11:48 zfilipin@deploy1001: Started scap: testwiki to php-1.33.0-wmf.21 and rebuild l10n cache
  • 11:47 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 [keeping static files] (duration: 01m 40s)
  • 11:45 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 [keeping static files] (duration: 01m 35s)
  • 11:42 arturo: T215605 include python-oath .deb in stretch-wikimedia thirdparty/oath
  • 11:41 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.16 (duration: 12m 41s)
  • 11:39 elukey: raise mysql's max_user_connections to 1000 for the Analytics user on labsdb1012
  • 11:36 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
  • 11:36 ema: cp1077: repool varnish-be after service restart T217893
  • 11:35 arturo: delete wrong stretch-wikimedia `thirdparty` component in install1002
  • 11:12 zeljkof: EU SWAT finished
  • 11:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:495842 Add campaign prefix for EG tag (T216123) (duration: 00m 49s)
  • 11:11 moritzm: upgrading API servers/job runners servers in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
  • 10:32 marostegui: Deploy schema change on db1082, lag will happen on s5 on labs
  • 10:29 gtirloni: re-enabled puppet on serpens and seaborgium
  • 10:19 gtirloni: updated slapd to version 2.4.47 on seaborgium (T217280)
  • 10:17 moritzm: upgrading API servers/job runners servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
  • 10:14 gtirloni: upgrading seaborgium to slapd 2.4.47
  • 09:39 jynus: stop db1114 and restart it empty
  • 09:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 (duration: 00m 48s)
  • 08:57 elukey: restart memcached on mc1019 to apply new settings - T217731
  • 08:50 ema: cp1077 depooled again T217893
  • 08:49 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
  • 08:48 moritzm: upgrading app servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
  • 08:48 ema: restart varnish-be on cp1077 T217893
  • 08:47 moritzm: upgrading app servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 08:46 ema: cp1077 repooled T217893
  • 08:46 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
  • 08:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 for schema change (duration: 00m 48s)
  • 08:34 jynus: deploy core replica events to db1118
  • 08:15 ema: cp1099: ferm.service failed to resolve prometheus1003.eqiad.wmnet. ferm restarted T202966
  • 07:18 marostegui: Deploy schema change on db2052 (s5 codfw master), this will generate lag on codfw T71127 T51199
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113 after schema change and upgrade (duration: 00m 49s)
  • 07:09 marostegui: Upgrade mysql and kernel on db1113
  • 06:40 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113 for schema change and upgrade (duration: 00m 50s)
  • 04:04 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 02:40 ejegg: updated payments-wiki from f1a89d7045 to 7a312e371a

2019-03-11

  • 17:55 addshore@deploy1001: Synchronized wmf-config/interwiki-labs.php: BETA ONLY https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/495723/ (duration: 00m 48s)
  • 17:43 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/495721/ (duration: 00m 49s)
  • 17:23 arturo: T215605 copy python-oath from jessie-wikimedia/thirdparty to stretch-wikimedia/thirdparty in reprepro
  • 17:03 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 17:02 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 16:31 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 16:31 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 15:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1097 (duration: 00m 48s)
  • 15:16 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix syntax for MediaInfo depicts config (beta only) (duration: 00m 49s)
  • 14:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 49s)
  • 14:43 moritzm: upgrading mw canaries to PHP 7.2.16
  • 14:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 48s)
  • 14:25 hashar: contint1001: stopping zuul-merger (it is starving the server of CPU or I/O)
  • 14:21 moritzm: upgrading mwdebug servers to PHP 7.2.16
  • 14:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1097 (duration: 00m 47s)
  • 14:09 moritzm: importing build of PHP 7.2.16 for component/php72 (T216712)
  • 13:58 marostegui: Upgrade mysql on db1097
  • 13:28 arturo: disable active checks in icinga for labtestvirt200[12] (T218023)
  • 13:04 moritzm: upgrading mwdebug2002 to php 7.2.16
  • 12:23 gtirloni: updated slapd to version 2.4.47 on serpens (T217280)
  • 12:05 gtirloni: updating slapd on serpens/codfw to test possible fix for memory leaks
  • 10:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097 for upgrade and schema change (duration: 00m 48s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 09:56 moritzm: installing chromium security updates on remaining proton hosts
  • 09:44 moritzm: installing chromium security updates on proton1001
  • 09:44 elukey: roll restart of aqs on aqs100* to pick up new druid settings
  • 08:02 marostegui: Upgrade pc1010 (spare)
  • 07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 after upgrade (duration: 00m 48s)
  • 07:32 marostegui: Upgrade MySQL and kernel on pc2010 (spare)
  • 07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1099 after upgrade (duration: 00m 48s)
  • 06:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1099 after upgrade (duration: 00m 48s)
  • 06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1099 after upgrade (duration: 00m 52s)
  • 06:38 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:37 marostegui: Power cycle mw1280 - server down
  • 06:35 marostegui: Upgrade mysql and kernel on db1099
  • 06:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 for upgrade (duration: 03m 01s)
  • 06:03 effie: Restarting pdfrender on scb1003
  • 06:02 marostegui: Upgrade MySQL on dbstore1004 (s2, s3, s4)
  • 04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 03:30 kartik@deploy1001: Finished deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878) (duration: 04m 01s)
  • 03:26 kartik@deploy1001: Started deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878)

2019-03-10

  • 22:35 gtirloni: toolforge stretch: increased nscd group TTL from 60 to 300sec (T217280)
  • 07:14 _joe_: restarting pdfrender on scb1004

2019-03-08

  • 19:25 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 50s)
  • 19:21 moritzm: installing php updates on netmon1002
  • 18:20 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 49s)
  • 17:30 robh: decom in progress for rdb100[123478] via T209181
  • 16:48 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@acf2694] (stretch): UBN geoshapes services on maps1004.eqiad.wmnet (T217898) (duration: 00m 22s)
  • 16:47 mbsantos@deploy1001: Started deploy [kartotherian/deploy@acf2694] (stretch): UBN geoshapes services on maps1004.eqiad.wmnet (T217898)
  • 16:23 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@cc302de] (stretch): UBN geoshapes services on maps2004.codfw.wmnet (T217898) (duration: 00m 24s)
  • 16:22 mbsantos@deploy1001: Started deploy [kartotherian/deploy@cc302de] (stretch): UBN geoshapes services on maps2004.codfw.wmnet (T217898)
  • 16:19 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@d71df87] (stretch): UBN geoshapes services (T217898) (duration: 02m 00s)
  • 16:17 mbsantos@deploy1001: Started deploy [kartotherian/deploy@d71df87] (stretch): UBN geoshapes services (T217898)
  • 15:45 papaul: OS install on restbase2019 and restbase2020
  • 15:30 gilles@deploy1001: Finished deploy [performance/coal@8766469]: (no justification provided) (duration: 00m 06s)
  • 15:30 gilles@deploy1001: Started deploy [performance/coal@8766469]: (no justification provided)
  • 14:34 arturo: T215605 add prometheus-rabbitmq-exporter v0.4 to stretch-wikimedia
  • 14:16 gilles@deploy1001: Finished deploy [performance/navtiming@f2d8a5f]: (no justification provided) (duration: 00m 05s)
  • 14:15 gilles@deploy1001: Started deploy [performance/navtiming@f2d8a5f]: (no justification provided)
  • 13:09 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
  • 12:47 akosiaris: depooling cp1077 just in case, high mailbox lag https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cache_type=text&var-server=All&var-layer=backend&panelId=13&fullscreen
  • 12:47 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.*
  • 12:07 jbond42: rolling security updates of sqlite3 on jessie and trusty
  • 11:07 moritzm: uploaded tideways 4.0.7-1+wmf1 for component/php72 (T216712)
  • 10:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080, db1110 (duration: 00m 49s)
  • 10:14 marostegui: Reload haproxy on dbproxy1011 to repool labsdb1009
  • 09:51 mutante: temp disabling puppet on icinga to debug an issue with elastic checks
  • 09:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080, db1110 (duration: 00m 49s)
  • 09:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3311,db1096:3315 (duration: 00m 49s)
  • 08:37 marostegui: Reload haproxy on dbproxy1011 to depool labsdb1009
  • 08:31 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
  • 08:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3311,db1096:3315 (duration: 00m 48s)
  • 08:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1076 (duration: 00m 48s)
  • 07:59 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 40s)
  • 07:58 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:57 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 02s)
  • 07:57 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:52 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 01m 18s)
  • 07:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 (duration: 00m 48s)
  • 07:51 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 after mysql upgrade (duration: 00m 49s)
  • 07:35 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 30s)
  • 07:34 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 after mysql upgrade (duration: 00m 49s)
  • 07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1076 into API after mysql upgrade (duration: 00m 48s)
  • 07:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 after mysql upgrade (duration: 00m 48s)
  • 06:53 marostegui: Stop MySQL on db1076 for upgrade
  • 06:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 for mysql upgrade (duration: 00m 49s)
  • 06:22 marostegui: Deploy schema change on s3 db1077 with replication (lag will happen on s3 labs)
  • 06:21 marostegui: Stop replication on s3 on labsdb1009 and labsdb1011
  • 06:20 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
  • 06:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 (duration: 00m 51s)
  • 00:23 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.20/skins/MinervaNeue/resources/skins.minerva.scripts/toc.js: SWAT: Passing page parameter to TOC toggler T217820 (duration: 00m 50s)
  • 00:16 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Cleanup beta cluster config T213599; Enable advanced mobile contributions mode on beta cluster beta-only (noop) sync (duration: 00m 49s)
  • 00:01 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org,service=pdns_recursor

2019-03-07

  • 23:53 XioNoX: set net.ipv4.ip_local_port_range="32768 60999" on dns2001 and repool server - T209989
  • 23:46 XioNoX: set net.ipv4.ip_local_port_range="49152 65535" on dns2001 - T209989
  • 23:43 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_recursor
  • 23:40 XioNoX: depool dns2001 - T209989
  • 20:44 XioNoX: explicitly disable sampling on non-eqiad routers
  • 20:42 thcipriani: restarting gerrit on cobalt for 2.15.11 rollback
  • 20:42 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on cobalt (production) (duration: 00m 07s)
  • 20:41 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on cobalt (production)
  • 20:40 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on gerrit2001 only (duration: 00m 10s)
  • 20:40 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on gerrit2001 only
  • 20:10 thcipriani: restarting gerrit on cobalt for 2.15.11 upgrade
  • 20:10 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on cobalt (production) (duration: 00m 11s)
  • 20:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on cobalt (production)
  • 20:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 12s)
  • 20:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on gerrit2001 only
  • 19:33 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Enable Priority Hints origin trial on ruwiki (duration: 00m 48s)
  • 19:22 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant 'reupload-shared' to mediawiki uploaders and fix T217523 (duration: 00m 49s)
  • 19:12 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Partial Blocks on Arabic Wikipedia T217283 (duration: 00m 50s)
  • 19:04 arlolra: Updated Parsoid to d4e76d5 (T202905)
  • 18:56 arlolra@deploy1001: Finished deploy [parsoid/deploy@766a920]: Updating Parsoid to d4e76d5 (duration: 05m 01s)
  • 18:51 arlolra@deploy1001: Started deploy [parsoid/deploy@766a920]: Updating Parsoid to d4e76d5
  • 18:39 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,name=maps2004.codfw.wmnet
  • 18:32 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@248b8c4] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet (duration: 01m 25s)
  • 18:30 mbsantos@deploy1001: Started deploy [kartotherian/deploy@248b8c4] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet
  • 18:30 mbsantos@deploy1001: Finished deploy [tilerator/deploy@fac7e5e] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet (duration: 03m 46s)
  • 18:26 mbsantos@deploy1001: Started deploy [tilerator/deploy@fac7e5e] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet
  • 18:25 gehel: cleaning kernel-proposed-updates component on reprepro (install1002)
  • 18:15 XioNoX: disable asw2-c-eqiad <-> asw-c-eqiad link - T208734
  • 17:55 gehel: rolling upgrade of kibana on logstash clusters completed - T216052
  • 17:48 gehel: rolling upgrade of kibana on logstash clusters - T216052
  • 17:44 gehel: rolling upgrade of logstash on logstash clusters completed - T216052
  • 17:36 gehel: rolling upgrade of logstash on logstash clusters - T216052
  • 17:34 gehel@deploy1001: Finished deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052 (duration: 00m 07s)
  • 17:34 gehel@deploy1001: Started deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052
  • 17:34 gehel@deploy1001: Finished deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052 (duration: 00m 08s)
  • 17:33 gehel@deploy1001: Started deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052
  • 17:16 gehel: rolling upgrade of elasticsearch on logstash clusters completed - T216052
  • 17:09 ariel@deploy1001: Finished deploy [dumps/dumps@3e25558]: fix broken page-content job retries (duration: 00m 04s)
  • 17:09 ariel@deploy1001: Started deploy [dumps/dumps@3e25558]: fix broken page-content job retries
  • 16:54 cmjohnson1: powering off cp1099 to move to different rack T202966
  • 15:26 gehel: rolling upgrade of elasticsearch on logstash clusters - T216052
  • 14:54 hashar: 1.33.0-wmf.20 seems all good
  • 14:46 marostegui: Reload haproxy on dbproxy1011 to repool labsdb1009
  • 14:15 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.20
  • 13:47 mutante: phab1002 - removing all php-7.2 packages and letting puppet reinstall them after component change
  • 13:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1075 after schema change and mysql upgrade (duration: 00m 55s)
  • 13:41 marostegui: Stop mysql on labsdb1009 for upgrade (this will trigger an haproxy IRC alert)
  • 13:39 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1009
  • 13:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 after schema change and mysql upgrade (duration: 00m 52s)
  • 12:59 zeljkof: EU SWAT finished
  • 12:56 gtirloni: re-enabled puppet on seaborgium/serpens
  • 12:55 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable musical notation datatype on testwikidatawiki (T216730) (duration: 00m 56s)
  • 12:42 ariel@deploy1001: Finished deploy [dumps/dumps@3a25aa0]: handle failed xml content jobs correctly (fix regression) (duration: 00m 05s)
  • 12:42 ariel@deploy1001: Started deploy [dumps/dumps@3a25aa0]: handle failed xml content jobs correctly (fix regression)
  • 12:41 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create an uploader group on mediawiki.org (T217523) (duration: 00m 55s)
  • 12:34 zfilipin@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: Restrict local uploads on mediawiki.org, take 2 (T217523) (duration: 00m 56s)
  • 12:24 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:492447 Restore bureaucrat rights on hi.wiktionary to default () (duration: 00m 56s)
  • 12:08 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:494477 Enable edittag for ExternalGuidance in CX and VE (T216123) (duration: 00m 57s)
  • 12:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 after schema change and mysql upgrade (duration: 00m 56s)
  • 11:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1075 after schema change and mysql upgrade (duration: 00m 56s)
  • 11:45 gtirloni: temporarily disabled puppet on seaborgium/serpens to try slapd config changes
  • 11:28 gtirloni: updated seaborgium to stretch (T217280)
  • 11:21 mutante: doc.wikimedia.org - back up, manually fixed path to php-fpm.sock to 7.0 - puppet disabled, fix coming
  • 11:18 mutante: doc.wikimedia.org down and being worked on - package downgrade exposed an issue
  • 11:15 marostegui: Stop MySQL on db1075 for upgrade
  • 11:15 mutante: doc1001 - apt-get remove --purge php7.2* (the same packages with 7.0 were previously installed in parallel)
  • 10:58 gtirloni: upgrading seaborgium to Stretch (so it's running the same distro as serpens/codfw)
  • 10:34 moritzm: restarting HHVM/Apache on mediawiki canaries to pick up OpenSSL security update
  • 10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 for schema change and mysql upgrade (duration: 00m 56s)
  • 10:13 moritzm: upgrading mediawiki canaries to component/php72 (T216712)
  • 09:47 moritzm: upgrading mwdebug servers in eqiad to component/php72 (T216712)
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=citoid,cluster=scb,name=scb.*
  • 09:37 akosiaris: ramp up traffic to citoid kubernetes to 100%
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=citoid,cluster=scb,name=scb.*
  • 09:21 moritzm: upgrading mwdebug servers in codfw to component/php72 (T216712)
  • 09:15 elukey: fixed vlan-analytics1-d-eqiad members on asw2-d-eqiad - T205507
  • 09:03 mutante: mw2151 - mkdir /var/run/nutcracker ; chown nutcracker:nutcracker /var/run/nutcracker ; systemctl start nutcracker - runs again - pooling server
  • 08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1122 (duration: 00m 55s)
  • 08:54 mutante: depooled mw2151 - nutcracker failing
  • 08:19 mutante: reloading icinga service
  • 08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1122 (duration: 00m 55s)
  • 07:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1122 into API (duration: 00m 55s)
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1122 (duration: 00m 55s)
  • 07:28 marostegui@deploy1001: sync-file aborted: Repool db1121 (duration: 00m 01s)
  • 07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 56s)
  • 07:12 marostegui: Stop MySQL on db1122 to upgrade
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 for MySQL upgrade (duration: 00m 57s)
  • 06:40 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 06:03 marostegui: Deploy schema change on db1121, this will generate lag on labsdb:s4 - T86342
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 00m 57s)
  • 04:03 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
  • 01:19 twentyafterfour: phabricator update complete
  • 01:17 twentyafterfour: starting phabricator update to tag release/2019-03-07/1 - expect momentary downtime
  • 01:10 twentyafterfour: preparing phabricator upgrade
  • 00:47 aaron@deploy1001: Synchronized php-1.33.0-wmf.20/includes/specials/pagers/ActiveUsersPager.php: f929e2a5069 (duration: 00m 56s)
  • 00:43 aaron@deploy1001: Synchronized php-1.33.0-wmf.20/includes/specials/SpecialActiveusers.php: f929e2a5069 (duration: 00m 56s)
  • 00:28 aaron@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable loading WikibaseCirrusSearch (disabled) on production wikis (duration: 00m 55s)
  • 00:23 aaron@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Run WikibaseCirrusSearch code for search on testwikidatawiki (duration: 00m 56s)

2019-03-06

  • 21:23 XioNoX: test ping-offload with unused IP 208.80.153.225 - T190090
  • 20:30 hashar: 1.33.0-wmf.20 looks fine with group0 and group1
  • 20:14 hashar@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.20 (duration: 01m 43s)
  • 20:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.20
  • 19:51 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/LdapAuthentication/LdapPrimaryAuthenticationProvider.php: Remove calls to no-longer-implemented methods after I2eeaeed1 - T217692 (duration: 00m 58s)
  • 19:14 XioNoX: apply ping-offload redirect to private1-a-codfw - T190090
  • 19:03 gtirloni: increased serpens vCPUs from 4 to 8 (T217280)
  • 18:55 gtirloni: increased seaborgium vCPUs from 4 to 8 (T217280)
  • 18:08 bstorm_: re-enabled puppet after observing the change works well on the partner for labstore2004 and T210818
  • 18:07 joal@deploy1001: Finished deploy [analytics/refinery@fef9181]: Regular analytics weekly deploy train (duration: 31m 02s)
  • 18:04 bstorm_: disabled puppet and downtimed labstore2004 while deploying a change for T210818
  • 17:36 joal@deploy1001: Started deploy [analytics/refinery@fef9181]: Regular analytics weekly deploy train
  • 17:34 sbisson@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Added new throttle rules, removed expired (duration: 00m 55s)
  • 17:33 sbisson@deploy1001: sync-file aborted: SWAT: Added new throttle rules, removed expired (duration: 00m 01s)
  • 17:24 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: wgCopyUploadDomains: Changed domain for mehrnews.com (duration: 00m 56s)
  • 17:17 sbisson@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/GrowthExperiments/extension.json: SWAT: Use schema version where reading is a valid editor_interface (duration: 00m 56s)
  • 17:10 elukey@deploy1001: Finished deploy [analytics/superset/deploy@911ad13]: First deploy to new host (duration: 00m 27s)
  • 17:10 elukey@deploy1001: Started deploy [analytics/superset/deploy@911ad13]: First deploy to new host
  • 17:09 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Welcome survey: send all newcomers to variation A (cs, ko) (duration: 00m 56s)
  • 16:53 jbond42: built prometheus-openldap-exporter for stretch
  • 16:51 ema: upgrade ATS to 8.0.2-1wm1
  • 16:23 moritzm: imported conftool 1.0.2-1+deb10u1 for buster-wikimedia
  • 16:10 krinkle@deploy1001: Synchronized php-1.33.0-wmf.20/includes/api/ApiBase.php: I921777 (duration: 00m 58s)
  • 16:05 moritzm: imported scap for buster-wikimedia (T213527)
  • 14:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 (duration: 00m 56s)
  • 13:35 marostegui: Upgrade MySQL on db1123
  • 13:18 jbond42: rolling security updates for file on jessie
  • 13:02 zeljkof: EU SWAT finished
  • 12:41 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change links in cswiki Help Panel (T217391) (duration: 00m 55s)
  • 12:32 oblivian@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/WikimediaEvents: SWAT: Allow directing a sample of users to PHP 7 backport to wmf.19 T216676 (duration: 00m 57s)
  • 12:22 gtirloni: updated serpens to stretch (T217280)
  • 12:22 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle Exception for Art+Feminism event Eindhoven 8th March (T217676) (duration: 00m 56s)
  • 12:10 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Setting php7 sample rate for anonymous users to 0 (duration: 00m 57s)
  • 11:32 godog: bounce prometheus@k8s on prometheus2004 to test limiting concurrent connections
  • 11:21 gtirloni: updated and rebooted seaborgium (T217280)
  • 11:18 gtirloni: updated and rebooted serpens (T217280)
  • 10:56 marostegui: Deploy schema change on db1123
  • 10:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 (duration: 00m 53s)
  • 10:48 volans: upgraded spicerack to 0.0.20 on cumin[12]001
  • 10:46 volans: uploaded spicerack_0.0.20-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 10:38 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Translate/TranslateUtils.php: Revert "TranslateUtils: Avoid use of deprecated class Revision" - T217689 (duration: 00m 59s)
  • 10:36 hashar: Deploying a hotfix for Translate https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Translate/+/494659/
  • 10:22 ema: lvs100[12],lvs1016: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 09:11 ema: lvs200[123]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 09:05 moritzm: removed debmonitor host entry for ruthenium (T216062)
  • 09:01 mutante: switching noc.wikimedia.org from apache to httpd module (mwmaint2001, then mwmaint1002)
  • 08:48 akosiaris@cumin1001: conftool action : set/weight=12; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
  • 08:48 akosiaris@cumin1001: conftool action : set/weight=15; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
  • 08:48 akosiaris: increase citoid traffic to kubernetes infrastructure to 50% T213194
  • 08:48 akosiaris: increase citoid traffic to kubernetes infrastructure to 50%
  • 08:47 marostegui: Deploy schema change on s3 codfw, this will generate lag on codfw - T86342
  • 08:42 ema: lvs300[12]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090 after MySQL upgrade (duration: 00m 59s)
  • 08:15 marostegui: Stop MySQL on db1090 for mysql upgrade
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090 for MySQL upgrade (duration: 00m 56s)
  • 08:14 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
  • 07:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1105 after MySQL upgrade (duration: 00m 56s)
  • 07:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic db1105 after MySQL upgrade (duration: 00m 56s)
  • 07:34 marostegui: Remove dbstore1002 from tendril and zarcillo T216491
  • 07:09 elukey: raised analytics user's max_user_connection from 10 to 100 on labsdb1012 - T215231
  • 07:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic db1105 after MySQL upgrade (duration: 00m 56s)
  • 06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1105 after MySQL upgrade (duration: 00m 56s)
  • 06:32 marostegui: Stop MySQL on db1105 for MySQL upgrade
  • 06:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105 for MySQL upgrade (duration: 01m 14s)
  • 06:27 marostegui: Add labsdb1012 to tendril and zarcillo - T215231
  • 05:50 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 04:26 eileen: civicrm revision changed from 196493f372 to 4aac68eead, config revision is 8ca90b4c7b
  • 04:00 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
  • 00:55 twentyafterfour: finished US Evening SWAT.
  • 00:41 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494524/ for SWAT refs T217276 (duration: 00m 55s)
  • 00:23 twentyafterfour@deploy1001: Synchronized wmf-config/mobile.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494271/ for SWAT refs T212253 (duration: 00m 56s)
  • 00:12 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/493236/ for SWAT. refs T217080 (duration: 00m 56s)

2019-03-05

  • 23:51 ejegg: updated payments-wiki from 4f2935ad17 to f1a89d7045
  • 21:05 godog: temporarily stop requests to k8s instance on prometheus2004
  • 21:00 herron: restarted apache on grafana1001
  • 20:43 herron: restarted apache on grafana1001
  • 19:56 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/LdapAuthentication/: Stop referring to the now-killed AuthPlugin class - T217692 (duration: 00m 57s)
  • 17:44 godog: bounce uwsgi on graphite1004
  • 17:25 herron: restarting uwsgi-graphite-web on graphite1004
  • 16:54 moritzm: imported logstash 1:5.6.14-1 to thirdparty/elastic56
  • 16:52 herron: restarting uwsgi-graphite-web on graphite1004
  • 16:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:43 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics -f eventgate-analytics-staging-values.yaml [namespace: eventgate-analytics, clusters: staging]
  • 16:20 herron: restarting uwsgi-graphite-web on graphite1004
  • 15:53 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.20
  • 15:35 hashar@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674 (duration: 51m 03s)
  • 14:52 gtirloni: reprepro added bdsync_0.10-1+deb9u1 T209527
  • 14:44 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:42 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:42 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:42 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
  • 14:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:41 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:41 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-codfw-values.yaml [namespace: eventgate-analytics, clusters: codfw]
  • 14:40 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
  • 14:35 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.BRPBtKvzZH" --verbose' returned non-zero exit status 1 (duration: 00m 20s)
  • 14:35 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:34 jijiki: Ramp up citoid traffic from k8s to 25% on codfw - T213194
  • 14:34 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.ngh6XIMz8y" --verbose' returned non-zero exit status 1 (duration: 00m 21s)
  • 14:33 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:33 jiji@cumin1001: conftool action : set/weight=5; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
  • 14:27 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.JrfRQw0oDJ" --verbose' returned non-zero exit status 1 (duration: 00m 21s)
  • 14:27 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:25 hashar@deploy1001: Pruned MediaWiki: 1.33.0-wmf.14 (duration: 09m 47s)
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
  • 14:17 hashar@deploy1001: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "hashar"; reason is "Pruned MediaWiki: 1.33.0-wmf.14" (duration: 00m 00s)
  • 14:14 hashar: Applied wmf/1.33.0-wmf.20 local patches # T206674
  • 14:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 T217591 (duration: 01m 50s)
  • 13:31 hashar: Cutting branch wmf/1.33.0-wmf.20 # T206674
  • 13:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 T217591 (duration: 00m 48s)
  • 13:14 ema: lvs500[12]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 13:07 zeljkof: EU SWAT finished
  • 12:58 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wgArticleCountMethod=any for zhwikiversity (T214946) (duration: 00m 49s)
  • 12:45 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Enable edittag for ExternalGuidance in CX and VE" (duration: 00m 48s)
  • 12:24 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert gerrit:493155 (duration: 00m 49s)
  • 11:59 _joe_: upgrading scap everywhere to 3.9.2-1, T217611
  • 11:52 ema: lvs400[56]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 11:45 _joe_: installing new scap version in codfw
  • 11:44 oblivian@deploy1001: Synchronized README: Test deploy for new scap version (duration: 00m 48s)
  • 11:43 _joe_: installing new scap version on deployment servers, T217611
  • 11:22 _joe_: uploading new scap packages, T217611
  • 10:58 ema: lvs4007/lvs5003: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 10:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 47s)
  • 10:55 gilles@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/NavigationTiming/NavigationTiming.config.php: T187299 Fix wiki oversampling config validation (duration: 00m 48s)
  • 10:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 48s)
  • 10:27 jiji@cumin1001: conftool action : set/weight=4; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
  • 10:24 jijiki: Ramp up citoid traffic from k8s to 25% - T213194
  • 10:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 47s)
  • 10:10 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T187299 Oversample navtiming on ruwiki and eswiki (duration: 00m 47s)
  • 10:07 gilles@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/NavigationTiming: T187299 Backport wiki oversampling config syntax change (duration: 00m 48s)
  • 10:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 50s)
  • 09:56 ema: lvs200[456]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 09:31 marostegui: Stop MySQL on db1103:3312 and db1103:3314 for MySQL upgrade
  • 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 and db1103:3314 for mysql upgrade (duration: 00m 47s)
  • 09:26 ema: lvs100[456]: reboot for L1TF kernel/microcode updates T203011
  • 09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 (duration: 00m 47s)
  • 09:16 godog: kibana refresh field list
  • 08:58 mutante: restarting gerrit to pick up change 493963 - disable jgit gc
  • 08:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 47s)
  • 08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1084 (duration: 00m 48s)
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 in API (duration: 00m 48s)
  • 08:32 marostegui: Optimize echo_event table on x1 codfw master (db2034) this will generate lag on x1 codfw - T217591
  • 08:24 akosiaris: T213194 bump percentage of citoid requests reaching eqiad kubernetes cluster to 9%
  • 08:23 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes100.*
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1084 (duration: 00m 49s)
  • 07:47 marostegui: Upgrade MySQL on db1084
  • 07:18 marostegui: Stop MySQL on db1095 (backups host) to upgrade MySQL
  • 07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 47s)
  • 07:08 marostegui: Start transferring data from labsdb1011 to labsdb1012 - T215231
  • 06:56 marostegui: Reboot labsdb1012
  • 06:55 marostegui: Defragment echo_event tables on dbstore1005:3320 T217591
  • 06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1091 (duration: 00m 48s)
  • 06:43 marostegui: Stop MySQL on db2035 (s2 codfw master) to upgrade MySQL
  • 06:41 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 06:18 marostegui: Stop MySQL on dbstore2001 to upgrade MySQL
  • 06:17 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011
  • 06:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 51s)
  • 03:05 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Handle TitleBlacklist errors correctly (T217382) (duration: 00m 49s)
  • 03:03 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
  • 02:59 ejegg: updated payments-wiki from ca7c280f3e to 4f2935ad17
  • 02:27 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Revert hot fix (duration: 00m 46s)
  • 02:21 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Hot fix for T217615 (duration: 00m 47s)
  • 02:05 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:33 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:21 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:18 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:15 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 49s)
  • 01:13 tzatziki: changing password for "Force de Mots" and "שרית חייט"
  • 00:46 XioNoX: disable unused ports of restbase1016 on asw-a
  • 00:44 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/WikimediaEvents/: Redact title/create params and drop page_title in EditorJourney schema (T213974) (duration: 00m 49s)
  • 00:40 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES goodfaith on itwiki (T211032) (duration: 00m 47s)
  • 00:17 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/GrowthExperiments/includes/HelpPanel.php: Exclude help panel from main page (T215664) (duration: 00m 48s)
  • 00:12 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES on kowiki (T161628) (duration: 00m 49s)

2019-03-04

  • 23:09 eileen: civicrm revision changed from 316e038a69 to 196493f372, config revision is 8ca90b4c7b
  • 22:15 arlolra: Updated Parsoid to 1660395 (T214099, T202905)
  • 22:05 arlolra@deploy1001: Finished deploy [parsoid/deploy@bdc9e66]: Updating Parsoid to 1660395 (duration: 06m 34s)
  • 21:59 arlolra@deploy1001: Started deploy [parsoid/deploy@bdc9e66]: Updating Parsoid to 1660395
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-codfw-values.yaml [namespace: eventgate-analytics, clusters: codfw]
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics -f eventgate-analytics-staging-values.yaml [namespace: eventgate-analytics, clusters: staging]
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 21:49 ejegg: re-enabled Omnimail unsubscribe processing, disabled recipient repair job
  • 21:46 ejegg: updated Fundraising CiviCRM from 616c58cebe to 316e038a69
  • 21:19 XioNoX: add bgp sessions to AS137236 on cr1-eqsin
  • 21:14 XioNoX: re-enable bgp to AS13489 on cr2-eqiad
  • 20:44 reedy@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/Echo/: T217487 (duration: 00m 53s)
  • 20:23 niharika29@deploy1001: Finished deploy [scholarships/scholarships@2ef7463]: Remove outdated translations (duration: 00m 02s)
  • 20:23 niharika29@deploy1001: Started deploy [scholarships/scholarships@2ef7463]: Remove outdated translations
  • 20:17 niharika29@deploy1001: Finished deploy [scholarships/scholarships@2ef7463]: Deploy new version of app with new translations + fix broken privacy policy link (duration: 00m 02s)
  • 20:17 niharika29@deploy1001: Started deploy [scholarships/scholarships@2ef7463]: Deploy new version of app with new translations + fix broken privacy policy link
  • 20:01 sbisson@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Enables maplink for geocoordinate Wikibase statements display on clients (duration: 00m 48s)
  • 20:00 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reader demographics survey (duration: 00m 49s)
  • 19:52 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable help panel for user and user talk NS (duration: 00m 49s)
  • 19:47 sbisson@deploy1001: Synchronized tests/loggingTest.php: SWAT: Add eventbus analytics logging alongside with kafka logging. (part 2) (duration: 00m 48s)
  • 19:46 sbisson@deploy1001: Synchronized wmf-config/: SWAT: Add eventbus analytics logging alongside with kafka logging. (part 1) (duration: 00m 51s)
  • 19:41 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@20badb3]: Updater and Blazegraph group to report metric domain plus GUI updates (duration: 11m 07s)
  • 19:35 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable GrowthExperiments Homepage on testwiki (duration: 00m 49s)
  • 19:30 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@20badb3]: Updater and Blazegraph group to report metric domain plus GUI updates
  • 19:03 bstorm_: dumps.wikimedia.org is now running off labstore1007 T217473
  • 18:25 bstorm_: disabled notifications for high load on labstore1007 while failed over T217473
  • 18:23 vgutierrez: restarting pybal on lvs5002 - T213121
  • 18:16 XioNoX: push lvs5002 changes on cr2-eqsin - T213121
  • 16:54 hashar: contint1001: cleaned all Docker containers, compress /var/log/zuul/ files
  • 16:52 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001.*
  • 16:43 marostegui: Restart MySQL on db1112 for addshore
  • 16:33 jynus: enabling gtid replication on clouddb1002
  • 16:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T217365: Enable VE section editing on mobile for Beta Cluster, part II (duration: 00m 48s)
  • 16:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T217365: Enable VE section editing on mobile for Beta Cluster, part I (duration: 00m 51s)
  • 16:18 moritzm: installing ldb security updates
  • 16:13 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001
  • 16:13 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001
  • 16:13 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
  • 15:55 jijiki: Running puppet on scb* and kubernetes* - T213194
  • 15:44 jijiki: Disabling puppet on scb* and kubernetes* - T213194
  • 15:22 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: no-op: Remove unused legacy EventBus config settings (duration: 00m 49s)
  • 15:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 after changing index on logging table (duration: 00m 51s)
  • 14:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 and db1100 after changing index on logging table (duration: 00m 49s)
  • 14:20 elukey: update puppet compiler's facts
  • 14:20 marostegui: Change indexes on logging table on db1100 (s5) and db1097:3314 (commonswiki) - T217397
  • 14:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097:3314, db1100 to change indexes on logging table (duration: 00m 50s)
  • 13:57 gehel: restarting blazegraph on wdqs eqiad
  • 12:23 moritzm: testing component/php72 on mw2224
  • 11:04 akosiaris@deploy1001: scap-helm citoid finished
  • 11:04 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
  • 11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
  • 11:04 akosiaris@deploy1001: scap-helm citoid finished
  • 11:04 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
  • 11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
  • 11:04 akosiaris@deploy1001: scap-helm citoid finished
  • 11:04 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 10:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More weight to db1089 (duration: 00m 48s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 50s)
  • 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 50s)
  • 09:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 00m 48s)
  • 09:27 ariel@deploy1001: Finished deploy [dumps/dumps@932bf7e]: make misc dumps failure message nicer (duration: 00m 09s)
  • 09:27 ariel@deploy1001: Started deploy [dumps/dumps@932bf7e]: make misc dumps failure message nicer
  • 09:22 godog: temporarily stop prometheus on prometheus2004 to take a snapshot
  • 08:45 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Undo enabling Priority Hints origin trial on ruwiki (duration: 00m 49s)
  • 08:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 (duration: 00m 49s)
  • 08:38 gilles@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 08:29 marostegui: Change logging indexes on db1089 to leave the indexes exactly like the ones on tables.sql - T217397
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 - T217397 (duration: 00m 49s)
  • 07:48 ema: cp3032/cp3042: restart varnish-be due to mbox lag
  • 07:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 for schema change (duration: 00m 49s)
  • 07:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 (duration: 00m 53s)
  • 07:33 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1010
  • 07:17 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 07:13 marostegui: Remove dbstore1002 from tendril and zarcillo - T216491
  • 07:05 marostegui: Upgrade MySQL on db2088 and db2091
  • 06:46 marostegui: Stop MySQL on dbstore1002 for decommission T210478 T172410 T216491 T215589
  • 06:38 marostegui: Stop MySQL on labsdb1010 for mysql upgrade
  • 06:34 gtirloni: downtimed cloudstore1008/9 (T209527)
  • 06:13 marostegui: Upgrade MySQL on db2041 db2049 db2056 db2095
  • 06:06 marostegui: Run analyze table logging on db2038 and db2059 - T71222
  • 06:05 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094:3314 for schema change (duration: 01m 11s)
  • 05:18 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)

2019-03-03

  • off: restarted icinga on icinga2001, stale status file, too many open files
  • 10:44 elukey: restart pdfrender on scb1003

2019-03-02

  • 12:12 gtirloni: labstore1006 started nfsd T217473

2019-03-01

  • 20:45 ejegg: turned off fundraising omnimail process unsubscribes job
  • 19:40 XioNoX: pre-configure asw-a8 ports on asw2-a8-eqiad - T187960
  • 19:32 XioNoX: pre-configure asw-a7 ports on asw2-a7-eqiad - T187960
  • 19:29 XioNoX: pre-configure asw-a6 ports on asw2-a6-eqiad - T187960
  • 19:17 XioNoX: pre-configure asw-a5 ports on asw2-a5-eqiad - T187960
  • 18:53 robh: notebook1003 has unusually high load recently (23) and seemed to lag in reporting to icinga. no hardware failures, pinged about it in #wikimedia-analytics
  • 16:33 jbond42: rolling security update of bind9 packages on jessie and trusty
  • 15:38 ema: trafficserver_8.0.2-1wm1 uploaded to stretch-wikimedia
  • 15:02 akosiaris: restore proton config values
  • 14:33 hashar: Updating all debian-glue Jenkins jobs to properly take into account the BUILD_TIMEOUT parameter # T217403
  • 13:24 moritzm: removed sca* hosts from debmonitor database
  • 12:49 akosiaris: lower max_render_queue_size: to 20 for proton on proton100{1,2}
  • 12:32 akosiaris: restart proton1002, OOM showed up
  • 12:31 akosiaris: restart proton on proton1001, counted 99 chromium processes left running since at least Jan 30
  • 11:47 jbond42: rebooting labsdb1005.codfw.wmnet
  • 11:17 jbond42: rebooting labstore2004.codfw.wmnet
  • 11:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1094 (duration: 00m 50s)
  • 08:52 godog: temporarily stop prometheus instances on prometheus2004 to take a snapshot
  • 07:44 oblivian@deploy1001: Synchronized README: Test deploy for new scap configuration (duration: 00m 48s)
  • 07:39 oblivian@deploy1001: Synchronized README: noop sync to test opcache-manager (duration: 00m 47s)
  • 07:31 oblivian@deploy1001: Synchronized README: Test deploy for new scap configuration (duration: 00m 46s)
  • 07:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 after mysql upgrade (duration: 00m 47s)
  • 07:23 _joe_: installed php 7.2 compatible packages on deploy1001,2001
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 after mysql upgrade (duration: 00m 47s)
  • 06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 after mysql upgrade (duration: 00m 46s)
  • 06:48 marostegui: Deploy schema change on s4 codfw, lag will appear on s4 codfw - T86342
  • 06:43 marostegui: Stop MySQL on db1094 for mysql upgrade
  • 06:40 _joe_: upgrading php extensions on deploy* to versions compatible with php7.2
  • 05:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 51s)
  • 00:12 XioNoX: pre-configure asw-a3 ports on asw2-a3-eqiad - T187960
  • 00:09 thcipriani@deploy1001: Synchronized README: noop sync to test opcache-manager in scap 3.9.1-1 (duration: 00m 48s)


Archives

See Server admin log/Archives.