Server Admin Log

2019-06-16

  • 14:20 Urbanecm: running mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='AKA MBG' /home/urbanecm/T225886
  • 08:21 elukey: roll restart of druid brokers on druid100[4-6], stuck after regular data drop maintenance

2019-06-15

  • 20:38 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots (duration: 21m 42s)
  • 20:17 smalyshev@deploy1001: Started deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots
  • 20:16 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots (duration: 00m 54s)
  • 20:15 smalyshev@deploy1001: Started deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots
  • 19:14 SMalyshev: repooled wdqs1004
  • 17:35 elukey: restart hadoop-yarn-resourcemanager on an-masters as attempt to fix yarn.w.o
  • 07:44 SMalyshev: depooled wdqs1004 to catch it up

2019-06-14

  • 23:23 ejegg: updated payments-wiki from 75abd71cc1 to 79d1822644
  • 23:19 SMalyshev: repooled wdqs1003
  • 23:13 SMalyshev: repooled wdqs2003
  • 23:10 _joe_: set cpufreq governor for mw1348 to performance
  • 19:56 SMalyshev: depooled wdqs2003 to catch up
  • 19:17 SMalyshev: depooled wdqs1003 to catch up
  • 15:56 gehel: repooling wdqs1003, not catching up anyway (high edit load)
  • 15:24 godog: test setting 'performance' governor on ms-be2035 - T210723
  • 14:35 godog: powercycle mw1294, down and no console
  • 13:26 gehel: depooling wdqs1003 to allow it to catch up on lag
  • 13:22 joal@deploy1001: Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
  • 12:38 godog: test setting 'performance' governor on ms-be2032 - T210723
  • 11:36 godog: test setting 'performance' governor on ms-be2034 - T210723
  • 10:22 marostegui: Optimize tables on pc2008 - T210725
  • 10:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1077 after recovering from a crash (duration: 00m 49s)
  • 10:14 godog: test setting 'performance' governor on ms-be2031 - T210723
  • 09:44 godog: test setting 'performance' governor on ms-be2037 - T210723
  • 09:43 godog: test setting 'performance' governor on ms-be2033 - T210723
  • 09:28 godog: test setting 'performance' governor on ms-be2038 - T210723
  • 09:26 godog: test setting 'performance' governor on ms-be2016 - T210723
  • 03:57 SMalyshev: repooled wdqs1005
  • 00:11 SMalyshev: depooled wdqs1005 - let it catch up
  • 00:10 SMalyshev: repooled wdqs1006 - caught up

2019-06-13

  • 23:25 SMalyshev: depooled wdqs1006 to let it catch up quicker
  • 18:10 fdans@deploy1001: Finished deploy [analytics/refinery@67b34fe]: retrying deployment of analytics refinery (duration: 00m 19s)
  • 18:10 fdans@deploy1001: Started deploy [analytics/refinery@67b34fe]: retrying deployment of analytics refinery
  • 18:01 fdans@deploy1001: Finished deploy [analytics/refinery@67b34fe]: deploying refinery source 0.0.92 into refinery (duration: 16m 45s)
  • 17:44 fdans@deploy1001: Started deploy [analytics/refinery@67b34fe]: deploying refinery source 0.0.92 into refinery
  • 17:34 bstorm_: T203254 set cpu scaling governor to performance on labstore1004 and labstore1005
  • 16:02 gehel: restart blazegraph on wdqs public cluster completed
  • 15:58 gehel: restart blazegraph on wdqs public cluster
  • 15:36 gehel: restarting blazegraph on wdqs-internal / eqiad (just in case)
  • 08:09 jynus: reloading proxies for wikireplicas to rebalance load
  • 07:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 after recovering from a crash (duration: 00m 50s)
  • 00:45 paravoid: setting the CPU governor to performance for ms-be1036 (a while ago)

2019-06-12

  • 18:15 krinkle@deploy1001: Synchronized php-1.34.0-wmf.8/thumb.php: T225197 / 06b631fae5 (duration: 00m 47s)
  • 18:13 krinkle@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/ArticlePlaceholder/includes/: T207235 / a42aa15 (duration: 00m 49s)
  • 16:06 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 15:49 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 15:37 legoktm: re-enabled bawolff's gerrit account
  • 15:14 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-restart (exit_code=97)
  • 14:38 marostegui: Start replication on all threads on labsdb1010 - T222978
  • 14:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 after recovering from a crash (duration: 00m 47s)
  • 13:19 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 11:55 godog: swift eqiad-prod: put back ms-be1033 - T223518
  • 10:52 godog: force-upgrade mtail to 3.0.0~rc24.1-1 on wezen - T225604
  • 10:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 after recovering from a crash (duration: 00m 47s)
  • 10:18 akosiaris@deploy1001: scap-helm zotero finished
  • 10:18 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 10:17 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 10:17 akosiaris@deploy1001: scap-helm zotero upgrade --dry-run --debug production stable/zotero [namespace: zotero, clusters: eqiad,codfw]
  • 10:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 after a crash (duration: 00m 48s)
  • 09:51 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 08:59 hashar: Gracefully stopping Zuul (kill -SIGUSR1) to prepare for the restart of the CI Jenkins T225322
  • 08:41 onimisionipe: pool map2003. reimage and setup is complete - T224395
  • 08:31 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-restart
  • 06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 after a crash (duration: 00m 49s)

2019-06-11

  • 19:24 tzatziki: Removing four (4) files for legal compliance
  • 15:41 gehel: shutting down elastic1029 for investigation - T214283
  • 12:54 godog: swift eqiad-prod: put back ms-be1033 - T223518
  • 11:52 gehel@cumin2001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 10:54 godog: wipe fs on ms-be1033 data partitions - T223518
  • 09:56 gehel@cumin2001: START - Cookbook sre.postgresql.postgres-init
  • 09:20 godog: free up space wrongly allocated onto / with sdc1 umounted on ms-be2018
  • 08:26 gehel: repooling maps200[124]

2019-06-10

  • 19:39 thcipriani: restarting jenkins
  • 19:11 akosiaris: refresh all zotero pods in all clusters
  • 19:11 akosiaris@deploy1001: scap-helm zotero finished
  • 19:11 akosiaris@deploy1001: scap-helm zotero cluster staging completed
  • 19:11 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml staging stable/zotero [namespace: zotero, clusters: staging]
  • 19:11 akosiaris@deploy1001: scap-helm zotero finished
  • 19:10 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 19:10 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
  • 19:10 akosiaris@deploy1001: scap-helm zotero finished
  • 19:10 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 19:10 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
  • 17:55 ottomata: rolling restart of AQS service using scap deploy for new mediawiki_history_snapshot
  • 17:55 otto@deploy1001: Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
  • 16:24 marostegui: Power reset db1077 from the idrac T225391
  • 13:18 mvolz@deploy1001: scap-helm citoid finished
  • 13:18 mvolz@deploy1001: scap-helm citoid cluster codfw completed
  • 13:18 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-codfw-values.yaml stable/citoid [namespace: citoid, clusters: codfw]
  • 13:13 mvolz@deploy1001: scap-helm citoid finished
  • 13:13 mvolz@deploy1001: scap-helm citoid cluster eqiad completed
  • 13:13 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-eqiad-values.yaml stable/citoid [namespace: citoid, clusters: eqiad]
  • 13:04 mvolz@deploy1001: scap-helm citoid finished
  • 13:04 mvolz@deploy1001: scap-helm citoid cluster staging completed
  • 13:04 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 05:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 - host crashed (duration: 00m 52s)

2019-06-09

  • 08:30 vgutierrez: rebooting lvs4007 after NIC driver crash

2019-06-08

  • 11:58 godog: stop swift processes on ms-be1033 - T223518
  • 10:46 reedy@deploy1001: Synchronized wmf-config/throttle.php: T225344 (duration: 00m 51s)

2019-06-07

  • 18:56 herron: performing rolling reboots of logstash codfw frontends for security updates
  • 18:22 cstone: Update payments-wiki revision changed from c6c7bbf71e to 75abd71cc1
  • 15:34 godog: bounce rsyslog on wezen - T199406
  • 15:09 elukey: reboot thorium for kernel upgrades
  • 14:00 ema: pool cp3039 w/ ATS backend T222937
  • 13:15 ema: depool cp3039 and reimage as upload_ats T222937
  • 13:04 arturo: aborrero@cumin1001:~ $ sudo cumin "P{R:Systemd::Timer::Job}" "puppet agent --enable && run-puppet-agent" (patch already merged)
  • 13:03 arturo: aborrero@cumin1001:~$ sudo cumin "P{R:Systemd::Timer::Job}" "puppet agent --disable 'arturo merging systemd timer nrpe change'" (19 hosts affected) merging: https://gerrit.wikimedia.org/r/c/operations/puppet/+/514988
  • 11:45 ema: pool cp3043 w/ ATS backend T222937
  • 10:51 jbond42: upload libcpp-hocon0.1.6_0.1.6-1~bpo9+1_amd64.deb to wikimedia-stretch component/facter3
  • 10:45 jbond42: upload libleatherman-data_1.4.0+dfsg-1~bpo9+1_all.deb to wikimedia-stretch component/facter3
  • 10:43 ema: depool cp3043 and reimage as upload_ats T222937
  • 10:09 _joe_: restarting php-fpm on the codfw hosts to pick up the recent changes in opcache
  • 09:59 jbond42: upload libleatherman1.4.0_1.4.0+dfsg-1~bpo9+1_amd64.deb to wikimedia-stretch component/facter3
  • 09:49 jbond42: upload libleatherman1.4.0_1.4.0+dfsg-1~bpo8+1_amd64.deb to wikimedia-jessie component/facter3
  • 09:16 mobrovac@deploy1001: scap-helm mathoid finished
  • 09:16 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
  • 09:16 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
  • 09:16 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
  • 09:00 marostegui: Upgrade x1 codfw hosts in preparation for its failover T220170
  • 08:46 elukey: start the reboot of the Analytics Hadoop's worker nodes for kernel+openjdk upgrades
  • 08:24 marostegui: Upgrade s2 codfw to 10.1.39 in preparation for its codfw failover - T221533
  • 08:19 XioNoX: remove BGP session to AS55658 on cr1-eqsin (left the IXP)
  • 08:12 vgutierrez: upgrading certbot in wikitech-static
  • 07:29 marostegui: Drop unused temporary test tables on db1111 and db1112
  • 05:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2051 from s4 to s2 T221533 (duration: 00m 49s)
  • 00:00 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove unused preference T47877-buster (duration: 00m 47s)
  • 00:00 bstorm_: T224850 repooled labsdb1009 after completing view updates

2019-06-06

  • 23:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Specify the fluidsynth paths for TMH MIDI conversion T135597 (duration: 00m 47s)
  • 23:56 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Remove T225183 (duration: 00m 48s)
  • 23:03 jeh: T224850 depooled labsdb1009
  • 22:42 bstorm_: T224850 repooled labsdb1011
  • 21:01 bstorm_: T224850 depooled labsdb1011
  • 20:58 jforrester@deploy1001: Synchronized wmf-config/reverse-proxy.php: Stop setting wgSquidServersNoPurge, MW now uses wgCdnServersNoPurge (duration: 00m 47s)
  • 20:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgSquidMaxage, MW now uses wgCdnMaxAge (duration: 00m 46s)
  • 20:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgUseSquid or using wgSquidServersNoPurge, duplicate existing values (duration: 00m 48s)
  • 20:49 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Drop backwards-compatibility for dataSquidMaxage (duration: 00m 48s)
  • 19:47 herron: performing rolling reboot of eqiad logstash hw for MDS security updates
  • 18:58 jbond42: reimage sarin to stretch
  • 18:39 jbond42: mw1249 - sudo systemctl restart php7.2-fpm.service
  • 18:38 papaul: shutting down backup2001 for 10G nic troubleshooting
  • 18:24 bstorm_: T224850 repooled labsdb1010 after completing view run
  • 18:04 jijiki: Continuing rolling restarts of php-fpm in eqiad
  • 17:30 elukey: restart mcrouter on mw2271 (codfw proxy) to pick up new config changes
  • 15:56 bstorm_: T224850 depooled labsdb1010 for view updates
  • 15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:05 moritzm: rolling reboot of sessionstore hosts in eqiad for kernel security update
  • 15:02 _joe_: rolling restart of php-fpm on {appservers,api} in eqiad, in groups of 4, staggered by 10 minutes, to pick up the new opcache settings
  • 14:57 bstorm_: T224850 update views on labsdb1012
  • 14:43 moritzm: updating qemu packages on ganeti hosts to deploy support for md_clear/MDS for Ganeti instances
  • 14:43 elukey: restart mcrouter on mw2255 (codfw proxy) to pick up new config changes
  • 14:22 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: fix logspam (duration: 00m 48s)
  • 14:18 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
  • 13:54 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: fix logspam (duration: 00m 47s)
  • 13:44 moritzm: rolling reboot of sessionstore hosts in codfw for kernel security update
  • 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:36 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
  • 13:35 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.8
  • 13:35 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart-wdqs (exit_code=99)
  • 13:35 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
  • 13:34 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
  • 13:33 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
  • 13:32 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
  • 13:31 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
  • 12:44 jbond42: reimage neodymium
  • 12:23 _joe_: running puppet, restarting php-fpm on the canaries to pick up the new opcache size
  • 12:11 ema: cp1075: repool with varnish 5.1.3-1wm10 T224694
  • 12:10 elukey: restart mcrouter on mw2235
  • 12:05 Lucas_WMDE: EU SWAT done
  • 12:04 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Revert "Specify $wgWBRepoSettings['conceptBaseUri']" (gerrit:514700) (duration: 00m 56s)
  • 12:00 ema: cp1075: upgrade varnish to 5.1.3-1wm10 T224694
  • 11:55 lucaswerkmeister-wmde@deploy1001: scap failed: average error rate on 8/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 11:48 Urbanecm: running mwscript namespaceDupes.php --wiki=thwikisource --fix (T216322)
  • 11:47 Urbanecm: running mwscript namespaceDupes.php --wiki=thwikibooks --fix for T216322
  • 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add new namespaces for several Thai projects (T216322) (gerrit:514678) (duration: 00m 54s)
  • 11:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove unused config variable wgWikibaseEnableSenses (gerrit:514534) (duration: 00m 55s)
  • 11:23 gehel@cumin2001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 11:22 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/CirrusSearch/: SWAT: Fix event validation error for cirrussearch-request event (gerrit:514566) (duration: 01m 06s)
  • 10:55 elukey: restart mcrouter on mw2163 (codfw mcrouter proxy)
  • 10:43 mobrovac@deploy1001: scap-helm mathoid finished
  • 10:43 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
  • 10:43 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
  • 10:43 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
  • 10:30 ema: varnish 5.1.3-1wm10 uploaded to stretch-wikimedia T224694
  • 10:19 elukey: rolling restart of mcrouter on mw1* hosts to pick up config change (batch of 5 hosts, depool/run-puppet/pool)
  • 10:12 elukey: disable puppet on mw1* and mw[2163,2235,2255,2271] as prep step for mcrouter config deploy
  • 10:10 fsero: rolled back last deployment of mathoid to revision 16
  • 09:59 mobrovac@deploy1001: scap-helm mathoid finished
  • 09:59 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
  • 09:59 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
  • 09:59 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
  • 09:32 moritzm: rebooting mwdebug2002 for some tests
  • 09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:28 moritzm: updating qemu on ganeti2004 for some tests
  • 09:24 gehel@cumin2001: START - Cookbook sre.postgresql.postgres-init
  • 08:38 marostegui: Stop MySQL on db1117:3322 - this will trigger haproxy alerts - T222682
  • 07:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 after upgrade T224852 (duration: 00m 53s)
  • 07:20 marostegui: Stop MySQL on db1121 for upgrade, this will generate lag on labs hosts for s6 - T224852
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2046 to s6 master as db2039 will be decommissioned T221533 (duration: 00m 55s)
  • 06:31 marostegui: Start topology changes on s6 codfw to promote db2046 as master - T221533
  • 06:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 for upgrade T224852 (duration: 00m 55s)
  • 06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 after getting its BBU replaced (duration: 00m 54s)
  • 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 01m 01s)
  • 05:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 00m 55s)
  • 05:41 marostegui: Upgrade MySQL on s6 codfw hosts in preparation for s6 codfw master failover - T221533
  • 05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 00m 55s)
  • 05:18 marostegui: Remove db2042 from tendril and zarcillo T225090
  • 05:18 marostegui: Remove db2042 from tendril and zarcillo
  • 05:14 marostegui: Stop MySQL on db2042 to copy its content to dbprov2001 as a temporary backup - T225090
  • 05:11 marostegui: Disable notifications db2042 - T225090
  • 05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 after getting its BBU replaced T225060 (duration: 00m 56s)

2019-06-05

  • 22:15 chaomodus: restarting gerrit on cobalt due to it being down (seems like Java out of heap space)
  • 20:43 mforns@deploy1001: Finished deploy [analytics/refinery@0660e70]: deploying analytics/refinery up to 0660e70 (duration: 19m 30s)
  • 20:39 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Turn off some FR config T225138 (duration: 00m 54s)
  • 20:25 akosiaris@deploy1001: scap-helm blubberoid finished
  • 20:25 akosiaris@deploy1001: scap-helm blubberoid cluster codfw completed
  • 20:25 akosiaris@deploy1001: scap-helm blubberoid cluster eqiad completed
  • 20:25 akosiaris@deploy1001: scap-helm blubberoid upgrade -f blubberoid-values.yaml production stable/blubberoid [namespace: blubberoid, clusters: eqiad,codfw]
  • 20:23 mforns@deploy1001: Started deploy [analytics/refinery@0660e70]: deploying analytics/refinery up to 0660e70
  • 19:57 hashar: contint1001: docker container prune -f && docker image prune -f # reclaimed 166 MB and 3.4 GB
  • 19:48 marostegui: Check data consistency on db1091 against db1135 - T225060
  • 19:45 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: T225115 (duration: 00m 54s)
  • 17:36 marostegui: Start replication db1091 - T225060
  • 17:32 marostegui: Start MySQL with replication stopped on db1091 - T225060
  • 16:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert user-blocks-change to use eventbus and old schema - T211248 (duration: 00m 54s)
  • 16:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: use eventgate-main for 2 events on all wikis - T211248 (duration: 00m 55s)
  • 16:11 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceStreamConfig and switch 2 topics in group0 T222822 (duration: 00m 56s)
  • 16:11 XioNoX: remove BGP to AS38082 on cr4-ulsfo (left the IXP)
  • 15:46 reedy@deploy1001: Scap failed!: Call to mwscript eval.php returned: None
  • 15:44 reedy@deploy1001: Finished scap: Rebuild .8 i18n for FlaggedRevs (duration: 41m 14s)
  • 15:36 moritzm: installing exim4 security updates
  • 15:03 reedy@deploy1001: Started scap: Rebuild .8 i18n for FlaggedRevs
  • 14:24 marostegui: Poweroff db1091 for BBU replacement - T225060
  • 13:57 elukey: restart mcrouter on MediaWiki app/api canaries to pick up new config change (timeouts before marking a memcached shard as TKO from 3 to 10) - T203786
  • 13:56 jijiki: enabling puppet and pooling on mw* canaries
  • 13:17 jynus: start es2,es3 backup on codfw
  • 13:17 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.8
  • 13:03 hashar: restarting Jenkins
  • 12:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1135 (duration: 00m 54s)
  • 12:46 Lucas_WMDE: EU SWAT finished
  • 12:32 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/WikimediaMessages/: SWAT: Fix wikidata copyright message (T224536) (gerrit:514460) (duration: 00m 56s)
  • 11:43 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable the new history page in the advanced mobile contributions mode (T219895) (gerrit:514449) (duration: 00m 56s)
  • 11:27 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: Remove project namespace from flaggedrevs on ruwikisource (T225037) (gerrit:514413) (duration: 00m 54s)
  • 10:57 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/FlaggedRevs: Add ext.flaggedRevs.icons to modules registeration (gerrit:514456) (duration: 00m 57s)
  • 10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1135 (duration: 00m 55s)
  • 10:09 godog: mount sdb3 on ms-be1022 - T225079
  • 09:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1135 with very low weight on s4 (duration: 00m 55s)
  • 09:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool without traffic db1135 into s4 T225060 (duration: 00m 55s)
  • 09:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool without traffic db1135 into s4 T225060 (duration: 00m 56s)
  • 08:42 onimisionipe: removing maps2001 from cassandra cluster. It is going to be reimaged - T224395
  • 08:40 _joe_: rolling restart of php7 on the api servers, to test a different strategy of restarting compared to the appservers.
  • 08:21 _joe_: performing a rolling restart of the php appservers via cumin to test speed and safety of the operations proposed in T224857
  • 08:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:12 moritzm: rebooting pybal-test2001 for tests with new qemu
  • 08:12 ema: pool cp3035 w/ ATS backend T222937
  • 08:12 marostegui: Reboot db1091 T225060
  • 08:05 moritzm: installing qemu security updates on Ganeti hosts
  • 07:45 marostegui: Transfer dbprov1001.eqiad.wmnet:snapshot.s4.2019-06-04--21-37-03.tar.gz to db1135 to provision it on s4 T225060
  • 07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1091 status (duration: 00m 56s)
  • 07:22 ema: depool cp3035 and reimage as upload_ats T222937
  • 07:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 - host went down (duration: 00m 55s)
  • 06:45 marostegui: Restart MySQL on db2110 to get the binlog format changed to STATEMENT - T220170
  • 06:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2090 to s4 codfw master T220170 (duration: 00m 54s)
  • 06:25 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Mimic s4 codfw weights to eqiad T220170 (duration: 00m 55s)
  • 06:17 marostegui: Start topology changes on s4 codfw to replace current master db2051 with db2090 - T220170
  • 06:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1084 into API (duration: 00m 54s)
  • 05:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 after upgrade T224852 (duration: 00m 55s)
  • 05:49 marostegui: Upgrade MySQL on db1084 T224852
  • 05:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 for upgrade T224852 (duration: 01m 06s)
  • 05:31 marostegui: Stop MySQL on db1125 (sanitarium) s2,s4,s6,s7 to upgrade mysql - T224852
  • 05:29 marostegui: Keep compressing tables on labsdb1012 - T222978
  • 05:22 marostegui: Change replication topology on m3 codfw to promote db2065 as codfw master instead of db2042 - T221533
  • 05:07 marostegui: Upgrade Mysql on labsdb1012 - T224852
  • 04:09 onimisionipe: starting postgres slave init on maps2001 - T224395

2019-06-04

  • 23:03 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change log level to debug for PageTriage (duration: 01m 03s)
  • 22:06 eileen: civicrm revision changed from 506ebe2f2a to 5c02e62d6e, config revision is 63438eea43
  • 21:08 jbond42: finished rolling reboots of mw1* servers
  • 21:07 jbond42: finished rolling reboots of mw1* servers
  • 20:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:48 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:48 XioNoX: replace logstash.svc.eqiad.wmnet syslog target with syslog.codfw.wmnet on cr4-ulsfo - T224128
  • 19:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:41 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:41 jbond42: reboot mwdebug1002
  • 19:36 jbond42: reboot mwdebug1001
  • 19:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:10 herron: correction — performing rolling reboots of codfw logstash hardware hosts for MDS security updates
  • 18:10 herron: performing rolling reboots of eqiad logstash hardware hosts for MDS security updates
  • 18:06 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:04 bblack: pool cp3045 - T222937
  • 17:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:33 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:58 legoktm: deleted some gerrit changes
  • 16:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:32 marostegui: Compress some more tables on labsdb1012 before upgrading the host tomorrow T222978
  • 16:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:14 bblack: repool cp3035 (still varnish-be, but freshly installed!)
  • 16:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:12 jbond42: starting rolling reboots of mw1*
  • 16:09 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3045.esams.wmnet
  • 16:08 bblack: depool cp3045 for reimage - T222937
  • 15:56 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: JADE - T212182 (duration: 00m 53s)
  • 15:55 reedy@deploy1001: Synchronized wmf-config/extension-list: JADE - T212182 (duration: 00m 53s)
  • 15:52 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/Jade: Consistency (duration: 01m 08s)
  • 15:50 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: Configure eventgate-main EventService. No-op in prod. T211248 (duration: 01m 19s)
  • 15:41 bblack: reboot cp3035 post-reimage
  • 15:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Use eventgate-main in beta. No-op in prod. T211248 (duration: 00m 49s)
  • 15:18 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.8
  • 15:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:13 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:13 moritzm: draining ganeti1003 for eventual reboot to MDS-enabled Linux kernel
  • 15:13 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.8 and rebuild l10n cache (duration: 29m 46s)
  • 15:04 moritzm: failover Ganeti master in eqiad to ganeti1001
  • 14:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:51 bblack: depool cp3035 for ATS reimage - T222937
  • 14:43 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.8 and rebuild l10n cache
  • 14:41 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.5 [keeping static files] (duration: 01m 38s)
  • 14:39 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 [keeping static files] (duration: 01m 34s)
  • 14:36 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.3 (duration: 11m 02s)
  • 13:53 jbond42: restart mtail on lithium
  • 13:46 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:46 fsero@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:30 jbond42: starting rolling reboots of mw1*
  • 13:12 moritzm: draining ganeti1008 for eventual reboot to MDS-enabled Linux kernel
  • 12:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:22 Urbanecm: ran mwscript deleteBatch.php --wiki=sawikisource -r 'T214553: deleting useless red
  • 12:13 akosiaris: restart pybal on lvs2003, lvs1015 for sessionstore LVS configuration. T220401
  • 12:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2046 (duration: 00m 46s)
  • 12:04 akosiaris: restart pybal on lvs2006 for sessionstore LVS configuration. T220401
  • 11:40 akosiaris: restart pybal on lvs1015 for sessionstore LVS configuration. T220401
  • 11:39 krinkle@deploy1001: Synchronized php-1.34.0-wmf.7/includes/: T221577 / 1286d131c01886 (duration: 01m 04s)
  • 11:39 jijiki: enabling puppet on mc1*
  • 11:38 Urbanecm: run mwscript namespaceDupes.php --wiki=kuwiktionary --fix (T224327)
  • 11:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:514239 Custom namespaces for ku.wiktionary (T224327) (duration: 00m 46s)
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:507931 Add localized project logo for sahwikiquote (2/2, T222065) (duration: 00m 47s)
  • 11:34 urbanecm@deploy1001: Synchronized static/images/project-logos/: gerrit:507931 Add localized project logo for sahwikiquote (1/2, T222065) (duration: 00m 47s)
  • 11:31 jijiki: enabling puppet on mc2*
  • 11:29 Urbanecm: running mwscript namespaceDupes.php --wiki=sawikisource --add-prefix=T214553 --fix (T214553)
  • 11:28 Urbanecm: run mwscript namespaceDupes.php --wiki=thwiki --fix (T216322)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:486221 Add Author namespace in Sanskrit Wikisource (T214553) (duration: 00m 46s)
  • 11:24 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:495918 Create new protection levels for dewiktionary (2/2, T216885) (duration: 00m 47s)
  • 11:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:495918 Create new protection levels for dewiktionary (1/2, T216885) (duration: 00m 47s)
  • 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:494016 Add editcontentmodel right to the templateeditor group on testwiki (T217499) (duration: 00m 47s)
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:491054 Add new namespaces for th.wiki (T216322) (duration: 00m 47s)
  • 11:09 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/: T221577 / 1286d131c01886 (duration: 01m 07s)
  • 11:02 moritzm: draining ganeti1007 for eventual reboot to MDS-enabled Linux kernel
  • 11:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:44 jbond42: mw1* restarts will be delayed until 11:15
  • 10:42 jbond42: will start rolling reboots of mw1* servers at 10:50
  • 09:27 moritzm: draining ganeti1006 for eventual reboot to MDS-enabled Linux kernel
  • 09:25 jijiki: disable puppet on mc* hosts to merge 511963 and 511973
  • 09:01 moritzm: draining ganeti1005 for eventual reboot to MDS-enabled Linux kernel
  • 08:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:32 elukey: remove memcached nutcracker config from mw1* hosts (not used). Changes will be picked up when nutcracker is restarted (after reboots, etc.) - T214275
  • 08:23 moritzm: draining ganeti1004 for eventual reboot to MDS-enabled Linux kernel
  • 08:04 marostegui: Stop MySQL on db2046 to clone db2058 - T221533
  • 08:04 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2046 (duration: 00m 47s)
  • 08:03 elukey: restart hive-server2 on an-coord1001 to pick up new GC/Heap settings
  • 07:35 mobrovac@deploy1001: Finished deploy [restbase/deploy@abcb534]: Use only Proton for PDF rendering - T210651 (duration: 19m 16s)
  • 07:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:21 moritzm: draining ganeti1002 for eventual reboot to MDS-enabled Linux kernel
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2058 from s4 to s6 (duration: 00m 47s)
  • 07:16 mobrovac@deploy1001: Started deploy [restbase/deploy@abcb534]: Use only Proton for PDF rendering - T210651
  • 06:57 elukey: restart hive metastore on an-coord1001 to apply new GC/heap settings
  • 06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 after upgrade (duration: 00m 48s)
  • 06:21 elukey: restart pdfrender on scb1002 (flapping)
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after upgrade (duration: 00m 47s)
  • 05:54 marostegui: Stop MySQL on db2078:m3 - T221533
  • 05:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 after upgrade (duration: 00m 47s)
  • 05:40 marostegui: Stop MySQL on db1091 for MySQL upgrade T224852
  • 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 for upgrade (duration: 00m 48s)
  • 05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097 after upgrade (duration: 00m 46s)
  • 05:19 marostegui: Stop MySQL on db1097 for upgrade
  • 05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097 for upgrade (duration: 00m 47s)
  • 04:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1081 from API (duration: 00m 49s)
  • 01:10 bstorm_: T223406 depooled/repooled labsdb1009 for view updates
  • 00:09 bstorm_: T223406 repooled labsdb1011 after completing view updates

2019-06-03

  • 22:20 bstorm_: T223406 depooled labsdb1011
  • 22:09 bstorm_: T223406 repooled labsdb1010 after completing view updates
  • 21:29 XioNoX: drop all ICMP frag on all routers - T224186
  • 19:57 XioNoX: stop sampling from cr2-eqiad
  • 18:48 XioNoX: Add RPKI validators to all routers - T220669
  • 18:35 hashar: switch most Quibble jobs to node 10 T222406 - https://gerrit.wikimedia.org/r/#/c/integration/config/+/514034/ T222406
  • 18:35 XioNoX: drop all ICMP frag on cr1/2-eqiad - T224186
  • 18:17 XioNoX: add routinator 0.4.0 to APT repo - T220669
  • 17:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@9e3035c]: Blazegraph version wmf.4 (duration: 11m 29s)
  • 17:05 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@9e3035c]: Blazegraph version wmf.4
  • 16:40 onimisionipe: started osm-import on maps2004 - T224395
  • 16:30 bstorm_: T223406 depooled labsdb1010 for view updates
  • 15:39 bstorm_: T223406 labsdb1012 updated views for actor table changes
  • 14:46 akosiaris: deploy kask in sessionstore kubernetes namespace in eqiad, codfw T220401
  • 14:34 arturo: T221769 reimaging cloudservices1003 to stretch
  • 14:20 vgutierrez: upgrading acme-chief to version 0.17 in acme-chief production instances - T220518
  • 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:53 moritzm: draining ganeti1001 for eventual reboot to MDS-enabled Linux kernel
  • 13:44 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Drop caption edit counter unlock delay to 0 (duration: 00m 49s)
  • 13:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1138 into s4 API (duration: 00m 48s)
  • 13:19 marostegui: Move db2078:3321 under db2062 T220170
  • 13:03 arturo: add prometheus-pdns-rec-exporter v0.7 to stretch-wikimedia (T224877)
  • 12:56 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on remaining wikis (T188327) (duration: 00m 48s)
  • 12:24 arturo: add prometheus-pdns-exporter v0.4 to stretch-wikimedia (T224877)
  • 11:28 gehel: reboot relforge for microcode + jvm upgrade
  • 11:17 jijiki: Restarting php7.2-fpm in eqiad in batches of 2 for 513949
  • 11:15 Urbanecm: EU SWAT done
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:513740 Add Wikiprojekti namespace to wgExtraSignatureNamespaces for fiwiki (T224215) (duration: 00m 47s)
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:503680 Add 5 active namespaces for VisualEditor on en.wikiversity (T220881) (duration: 00m 48s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:513720 Add "Zerrenda" (list) namespace to VisualEditor on euwiki (T224801) (duration: 00m 48s)
  • 10:52 moritzm: upgrading maps servers to new Java security release
  • 10:47 moritzm: upgrading WDQS servers to new Java security release
  • 10:42 vgutierrez: upgrading prometheus-trafficserver-exporter in upload_ats ulsfo instances
  • 10:41 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: gerrit:513972 Bumping portals to master (T128546) (duration: 00m 47s)
  • 10:40 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:513972 Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:36 jijiki: Restarting php7.2-fpm in codfw in batches of 2 for 513949
  • 10:34 moritzm: upgrading Elastic servers to new Java security release
  • 10:26 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5046f3c]: Update the recommendation API service (duration: 03m 15s)
  • 10:23 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5046f3c]: Update the recommendation API service
  • 10:03 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=kartotherian
  • 10:02 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=kartotherian
  • 09:48 onimisionipe: depooled maps codfw due to lag and disk issues - T224395
  • 09:46 moritzm: upgrading Druid/Kafka-Jumbo servers to new Java security release (will be picked up by forthcoming MDS reboots)
  • 09:43 moritzm: upgrading AQS servers to new Java security release (will be picked up by forthcoming MDS reboots)
  • 09:33 moritzm: upgrading Hadoop servers to new Java security release (will be picked up by forthcoming MDS reboots)
  • 08:18 ema: cp1077: restart varnish-be
  • 08:17 elukey: manually removed phab_clean_tmp from www-data's crontab on phab1001 to reduce cronspam
  • 08:16 ema: cp1075: restart varnish-be
  • 08:03 marostegui: Stop MySQL on db1064 T223217
  • 08:01 marostegui: Remove db1064 from tendril and zarcillo T223217
  • 07:58 elukey: refresh field list for logstash (via kibana Management -> Index patterns -> etc.)
  • 07:48 marostegui: Repool db1103 after upgrade T224852
  • 07:29 marostegui: Stop MySQL on db1103 (s2 and s4) for upgrade T224852
  • 07:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 for upgrade (duration: 00m 47s)
  • 07:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1081 into API after upgrade (duration: 00m 48s)
  • 06:50 elukey: roll restart varnishkafka (via puppet) for a config change - T224236
  • 06:46 kartik@deploy1001: scap-helm cxserver finished
  • 06:46 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 06:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 06:45 kartik@deploy1001: scap-helm cxserver finished
  • 06:45 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 06:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 06:44 kartik@deploy1001: scap-helm cxserver finished
  • 06:44 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 06:44 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 06:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 into API after upgrade (duration: 00m 49s)
  • 06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 after upgrade (duration: 00m 46s)
  • 06:04 marostegui: Stop MySQL on db1081 for upgrade - T224852
  • 06:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 for upgrade (duration: 00m 47s)
  • 05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1019 T213422 (duration: 00m 46s)
  • 05:45 marostegui: Upgrade mariadb on dbstore1004 - T224852
  • 05:17 marostegui: Upgrade MariaDB on codfw hosts in preparation for s4 master failover T217396
  • 05:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1019 T213422 (duration: 00m 46s)
  • 05:05 marostegui: Remove db2037 from tendril and zarcillo T224720
  • 05:04 marostegui: Stop MySQL on db2037 for decommission T224720
  • 04:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 T213422 (duration: 00m 51s)

2019-06-02

  • 20:28 onimisionipe: pooled wdqs1007. It caught up on lag
  • 15:24 onimisionipe: depooled wdqs1007 to catch up on lag
  • 15:22 onimisionipe: depool wdqs internal cluster to allow them to catch up on lag. depool one at a time
  • 03:09 andrewbogott: restarting pdns-recursor on cloudservices1003 and cloudservices1004 (but not at the same time)

2019-06-01

  • 22:49 krinkle@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/3D/modules/mmv.3d.js: T224812 / bd4fbfddbe1a0 (duration: 01m 07s)

2019-05-31

  • 21:47 aaron@deploy1001: Synchronized wmf-config/db-eqiad.php: Set "secret" field in $wgLBFactoryConf for ChronologyProtector HMACs (duration: 00m 47s)
  • 21:46 aaron@deploy1001: Synchronized wmf-config/db-codfw.php: Set "secret" field in $wgLBFactoryConf for ChronologyProtector HMACs (duration: 00m 50s)
  • 21:10 bblack: cp3034: repool - T222937
  • 20:04 bblack: cp3034: depool for reimage - T222937
  • 18:44 marostegui: Start MySQL on es1019 - T213422
  • 18:34 jgleeson: payments-wiki updated from a76658f0a3 to c6c7bbf71e
  • 17:29 andrewbogott: added jeh to the 'ops' group in ldap
  • 16:20 ariel@deploy1001: Finished deploy [dumps/dumps@fd6100a]: remove orderrevs config option, unneeded now (duration: 00m 03s)
  • 16:20 ariel@deploy1001: Started deploy [dumps/dumps@fd6100a]: remove orderrevs config option, unneeded now
  • 15:05 bblack: cp3039: restart varnish-be for mbox lag (likely induced by 3049's depool for ATS conversion!)
  • 15:00 Krinkle: krinkle@deploy1001: pulling down 6f91b41 for php-1.34-wmf.7/extensions/ORES (without deploy), commit seems test-only
  • 14:59 Krinkle: krinkle@deploy1001: git status in php-1.34-wmf.7/ is dirty (extensions/ORES)
  • 14:52 bblack: pool cp3049 back into service - T222937
  • 14:32 onimisionipe: depool maps2004 (again) - T224395
  • 14:32 elukey: powercycle notebook1003 - host stuck due to user processes, no ssh available, OOM didn't trigger
  • 14:20 _joe_: rolling restart of php-fpm across production to pick up the shorter revalidate frequency for T224491
  • 14:10 bblack: reboot cp3049 - T222937
  • 13:16 bblack: depool cp3049 for reimage - T222937
  • 11:46 jynus: stop and upgrade db2084
  • 11:09 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 after maintenance (duration: 00m 48s)
  • 10:54 jynus: depool labsdb1010 for maintenance
  • 10:47 arturo: merging multiple commits to labs/private.git. We now require `puppet-merge --labsprivate` and people may not yet be aware of that
  • 09:28 jynus: stop and upgrade db2073
  • 09:11 jynus: stop and upgrade db2095 (s2, s4, s6, s7)
  • 08:33 jynus: upgrade and restart db2065
  • 08:16 jynus: depool labsdb1011 for maintenance
  • 07:54 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099 with low weight (duration: 00m 49s)
  • 07:43 _joe_: restarting php-fpm on canaries
  • 07:24 _joe_: repooling mw1348
  • 07:24 jynus: upgrade and restart labsdb1009
  • 07:15 _joe_: draining mw1348 from traffic
  • 07:14 jynus: depool labsdb1009 for maintenance
  • 06:55 jynus: upgrade and restart db2058
  • 06:33 _joe_: repooled mw1348
  • 06:21 jijiki: depool mw1348
  • 06:16 _joe_: restarting php-fpm on mw1348
  • 00:08 jgleeson: Updating civicrm from bb4acf3d8a to e028bfcd63

2019-05-30

  • 23:36 XioNoX: remove BGP sessions to starhub on cr4-ulsfo (left the IXP)
  • 22:59 marxarelli: deleted 95 docker images from contint1001, freeing ~ 8G on / cc: T219850
  • 22:59 XioNoX: add terms to drop specific icmp frag packets from cr1/2-eqiad - T224186
  • 22:53 marxarelli: deleting stale docker images from contint1001, cc: T207707 T219850
  • 22:25 mutante: phab2001 / phab1003 - why is 'git status' in /srv/phab/phabricator unclean with lots of file deletions but also not identical
  • 22:24 mutante: phab2001 - scap pull - but it fails with directory /srv/mediawiki not found that's so wrong
  • 22:20 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/WikimediaEvents/: Avoid division by zero warnings T224686 (duration: 00m 49s)
  • 22:19 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage/: Fix broken feed - T224693 (duration: 00m 51s)
  • 21:27 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on test2wiki db, based on PageTriageTagsPatch-recreated.sql. T224693, T189929
  • 21:12 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on testwiki db, based on PageTriageTagsPatch-recreated.sql. T224693, T189929
  • 21:11 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on enwiki, based on PageTriageTagsPatch-recreated.sql. T224693, T189929
  • 21:10 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage: Bump wgPageTriageCacheVersion T224693 (duration: 00m 51s)
  • 21:07 XioNoX: add RPKI sessions on cr4-ulsfo - T220669
  • 20:39 twentyafterfour: phabricator: restart ssh-phab.service
  • 19:49 mutante: sodium (mirrors) - sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
  • 18:49 Urbanecm: Morning SWAT finished
  • 18:47 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/: gerrit:513300 QuestionPoster: Correctly set timestamp when question is posted (T223338) (duration: 00m 51s)
  • 18:26 mutante: phab1003 - switch 'vcs' user to 'NP' to match phab1001 setup and then /srv/phab/phabricator# ./bin/config set diffusion.ssh-user vcs (T224677)
  • 18:24 XioNoX: bounce eqord-ulsfo interface to try to fix BFD sessions
  • 18:12 Krinkle: Running `php7adm /opcache-free` on mw1348 and mw1321, T224491
  • 18:12 Krinkle: Running `php7adm /opcache-free` on mw1348 and mw1321
  • 18:11 Krinkle: mw1348 (recent api/php72 100% experiment) shows signs of corruption
  • 18:11 Krinkle: mw1321 php7.2 shows signs of corruption for over 2 hours – https://phabricator.wikimedia.org/T224491#5224464
  • 18:03 krinkle@deploy1001: Synchronized wmf-config/arclamp.php: (no justification provided) (duration: 00m 53s)
  • 16:24 bblack: re-pool cp3047 into service as ats-be - T222937
  • 16:04 mutante: phab1001 - removing 2620:0:861:103:10:64:32:186/128 from eth0
  • 16:03 mutante: phab1001 - removing 10.64.32.186/32 from eth0
  • 16:01 mutante: phab1001 - removing git-ssh.wm.org IP from interface - phab1003 - activating IPv6 listen address for git-ssh
  • 15:36 jynus: stop es1019 for maintenance T213422
  • 15:26 cmjohnson1: shutting down db1099 to swap DIMM T221502
  • 15:20 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with full weight; depool es1019 (duration: 00m 52s)
  • 15:19 herron: performing rolling reboots of eqiad kafka main cluster hosts for security updates
  • 15:06 onimisionipe: pooled maps2004 - osm import is complete - T224395
  • 14:44 andrewbogott: reimaging cloudvirtan1001 for T224566
  • 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:42 andrewbogott: reimaging cloudvirtan1001
  • 14:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:22 bblack: rebooting cp3047 (post-reimage/puppetization for T222937)
  • 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:00 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:57 jijiki: enable puppet on mw* in eqiad
  • 13:44 volans: rm /root/.ssh/known_hosts on cumin[12]001
  • 13:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:36 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.7
  • 13:28 jijiki: Enabling puppet on mw*.codfw.wmnet
  • 13:22 zfilipin@deploy1001: Synchronized php-1.34.0-wmf.7/resources/src/jquery/jquery.suggestions.js: SWAT: gerrit:513237 jquery.suggestions: Do not show suggestions on prefilled values (T224524) (duration: 00m 58s)
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1015.eqiad.wmnet
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1014.eqiad.wmnet
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1013.eqiad.wmnet
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1012.eqiad.wmnet
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1011.eqiad.wmnet
  • 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1010.eqiad.wmnet
  • 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1009.eqiad.wmnet
  • 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1008.eqiad.wmnet
  • 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1007.eqiad.wmnet
  • 13:08 bblack: cp3047 puppet-disable + depool for reimage to ATS - T222937
  • 13:03 marostegui: Stop MySQL on db1099 for onsite maintenance - T221502
  • 13:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 T221502 (duration: 00m 56s)
  • 13:00 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/tests/phpunit/includes/: T222628 (duration: 01m 06s)
  • 12:58 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/includes/Linker.php: T222628 (duration: 01m 04s)
  • 12:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:34 akosiaris: reboot ganeti2003 for kernel upgrades
  • 11:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:14 _joe_: freed opcache on mw1281
  • 11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:05 Urbanecm: EU SWAT finished
  • 11:04 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: gerrit:Enable abusefilter blocking ability in plwiki (T224617) (duration: 00m 58s)
  • 11:00 jijiki: Disable puppet on mw* servers to merge 507939 - T219150
  • 10:42 jynus: upgrade and restart db1117 (temporary proxy fail for passive host, reduced redundancy for m*)
  • 10:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:22 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:19 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:15 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:07 jynus: upgrade and restart test-s4 hosts (db1111, db1112)
  • 09:42 jynus: stop and upgrade db1102
  • 09:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:31 _joe_: depooling mw1261 for benchmarking for T224491
  • 09:26 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 00m 55s)
  • 08:54 jynus: stop and restart db1089 for upgrade
  • 08:50 onimisionipe: maps2001 postgres initialization - T224395
  • 08:44 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 for maintenance (duration: 00m 57s)
  • 08:32 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2087 for maintenance (duration: 01m 00s)
  • 08:10 mobrovac: drop old Parsoid tables from cassandra -- T223998
  • 07:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - T218218 T215956 (duration: 19m 28s)
  • 07:33 _joe_: upgraded service-checker on icinga1001,2
  • 07:21 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - T218218 T215956
  • 00:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2091 - T224393 (duration: 00m 56s)
  • 00:24 mutante: re-enabling puppet on phab1001 now that it does not have the phab role anymore (T221389)
  • 00:17 mutante: rsyncing /srv/repos again. pulling on phab2001 from phab1003 (T221389)

2019-05-29

  • 23:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wikibase sameAs A/B test config, part II (duration: 00m 56s)
  • 23:36 jforrester@deploy1001: sync-file aborted: Remove wikibase sameAs A/B test config, part I (duration: 00m 00s)
  • 23:35 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Remove wikibase sameAs A/B test config, part I (duration: 00m 56s)
  • 23:26 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/AbuseFilter/includes/parser/AbuseFilterTokenizer.php: SWAT AbuseFilter: Tokenizer caching back to APC I8c6a4a95e (duration: 00m 54s)
  • 23:19 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: Replace FR constants with numbers Ia52f644948 (duration: 00m 56s)
  • 23:17 jforrester@deploy1001: Synchronized multiversion/MWScript.php: Mark refreshMessageBlobs.php as a global script (duration: 00m 56s)
  • 23:15 mutante: repooled phab2001-vcs , fixes pybal / lvs alerts
  • 23:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
  • 23:10 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable wgSpecialSearchFormOptions on production Wikidata T55652 (duration: 00m 57s)
  • 23:01 mutante: phab2001 - same issue with tin.eqiad.wmnet still showing up when first trying to git clone
  • 22:52 mutante: miscweb2001 - a2dismod mpm_event ; systemctl restart apache2 to fix php7.0 dependency issue
  • 22:50 mutante: miscweb2001 - when first trying to git pull iegreview - still tries to resolve 'tin.eqiad.wmnet' which is long gone. fix is still to manually edit /srv/deployment/iegreview/iegreview-cache/cache/.git/config
  • 22:46 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Hot-deploy T224634 to fix CirrusSearch for extension registration (duration: 00m 57s)
  • 21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 21:47 mutante: installing OS on miscweb2001 VM failed at grub install step :( T224323
  • 21:47 mutante: sign puppet cert request for phab2001 after reinstall (for some reason it needed me to connect to console and hit enter, reimage script itself was stuck)
  • 20:54 mutante: creating new ganeti VM miscweb2001.codfw.wmnet with same specs as krypton.eqiad.wmnet (T224323)
  • 20:35 arlolra: Updated Parsoid to 8546c79 (T219927, T211125)
  • 20:35 ejegg: updated payments-wiki from 332aaa96e2 to 45b73e7749
  • 20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@6caac43]: Updating Parsoid to 8546c79 (duration: 07m 46s)
  • 20:20 arlolra@deploy1001: Started deploy [parsoid/deploy@6caac43]: Updating Parsoid to 8546c79
  • 20:10 bblack: pool cp3044 (esams cache_upload ats-be) - T222937
  • 19:46 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 00m 57s)
  • 19:45 XioNoX: enable cr1-codfw:et-0/2/1 - T224511
  • 19:45 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 01m 01s)
  • 19:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 19:32 mutante: phab2001 - reinstalling with stretch - upgrade from jessie (T190568)
  • 19:09 XioNoX: enable cr1-codfw:et-0/2/0 - T224511
  • 18:37 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet
  • 17:44 XioNoX: enable cr1-codfw:et-0/0/1 - T224511
  • 17:13 XioNoX: enable cr1-codfw:et-0/0/0 - T224511
  • 17:02 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:501926 Change arwiki default user preferences, part 3/3 (T220186) (duration: 00m 56s)
  • 17:00 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: gerrit:501926 Change arwiki default user preferences, part 2/3 (T220186) (duration: 00m 56s)
  • 16:59 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:501926 Change arwiki default user preferences, part 1/3 (T220186) (duration: 00m 56s)
  • 16:48 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:512942 Revert: Hardcode korean help desk config (duration: 00m 56s)
  • 16:45 sbisson@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: gerrit:512941 Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 00m 56s)
  • 16:42 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: gerrit:512940 Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 01m 00s)
  • 16:32 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel/QuestionRecord.php: SWAT: gerrit:512950 Revert: Fix phan job: ignore line using JsonSerializable (duration: 00m 57s)
  • 16:08 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 15:55 jynus: upgrade and restart db2087
  • 15:11 moritzm: draining ganeti2008 for eventual reboot to pick up MDS-enabled kernel
  • 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:06 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on group 1 (T188327) (duration: 00m 57s)
  • 14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:54 moritzm: draining ganeti2007 for eventual reboot to pick up MDS-enabled kernel
  • 14:51 XioNoX: `request chassis fpc online slot 0` on cr1-codfw - T224511
  • 14:48 XioNoX: `request chassis fpc offline slot 0` on cr1-codfw - T224511
  • 14:47 XioNoX: disable et- interfaces on cr1-codfw - T224511
  • 14:45 andrewbogott: reimaging cloudcontrol1003 T221770
  • 14:34 moritzm: draining ganeti2006 for eventual reboot to pick up MDS-enabled kernel
  • 14:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:32 andrewbogott: powering off cloudcontrol1003 as one last check to see what explodes before I reimage it
  • 14:30 _joe_: installing the new service checker on restbase in eqiad
  • 14:29 _joe_: installing new service checker version on restbase in codfw
  • 14:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:58 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 13:58 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 13:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 urandom: decommissioning restbase1015-c -- T223976
  • 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:19 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.7 (duration: 00m 58s)
  • 13:18 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.7
  • 13:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:12 Urbanecm: mwscript emptyUserGroup.php --wiki=fawiki 'uploader' finished (T221441)
  • 13:06 andrewbogott: stopping openstack services on cloudcontrol1003 in anticipation of a re-image
  • 13:03 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 13:02 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 13:02 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:01 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:00 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:42 Zppix: [12:27:02] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:41 Zppix: [12:27:02] jbond@cumin1001 START - Cookbook sre.hosts.downtime
  • 12:40 Zppix: [12:23:06] <jijiki> Rolling restart pdfrender on scb*
  • 12:39 Zppix: [12:20:49] jbond@cumin1001 START - Cookbook sre.hosts.downtime
  • 12:38 Zppix: [12:11:55] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:38 Zppix: [12:11:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
  • 12:37 Zppix: [12:01:54] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:36 Zppix: [12:01:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
  • 12:36 Zppix: [12:00:21] marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db2037 from config as it will be decommissioned T221533 (duration: 00m 56s)
  • 12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:34 Zppix: [11:59:19] marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db2037 from config as it will be decommissioned T221533
  • 12:33 Zppix: [11:58:16] <arturo> T221770 icinga downtime cloudcontrol1003.wikimedia.org for upcoming rebuild as stretch
  • 12:32 Zppix: [11:57:57] aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:32 Zppix: [11:57:55] aborrero@cumin1001 START - Cookbook sre.hosts.downtime
  • 12:31 Zppix: [11:55:54] <Urbanecm> EU SWAT finished, maintenance script emptyUserGroup.php still running in separate tmux session
  • 12:31 Zppix: [11:55:11] urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: gerrit:511849 Set wgLocaltimezone for euwiki to Europe/Berlin (T224091) (duration: 00m 57s)
  • 12:30 Zppix: [11:55:10] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:29 Zppix: [11:55:09] jbond@cumin1001 START - Cookbook sre.hosts.downtime
  • 11:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:471260 RSS: Update URLs to the old Wikimedia Foundation blog to point to the new site (T208458) (duration: 00m 57s)
  • 11:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:46 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:46 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 11:45 Urbanecm: Started mwscript emptyUserGroup.php --wiki=fawiki 'uploader' (T221441)
  • 11:44 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: gerrit:505228 Remove uploader user group from fawiki and merge it with autoconfirmed, part 2 (T221441) (duration: 00m 55s)
  • 11:43 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:505228 Remove uploader user group from fawiki and merge it with autoconfirmed, part 1 (T221441) (duration: 00m 55s)
  • 11:40 Urbanecm: Purged angwikibooks HD logos
  • 11:38 urbanecm@deploy1001: Synchronized static/images/project-logos/: gerrit:512433 Add HD logo for angwikibooks, logo files (T150618) (duration: 00m 56s)
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512478 Enable transwiki import between sqwiki and sqwikiquote (T221234) (duration: 00m 56s)
  • 11:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:30 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:509130 Enable Advanced Mobile Contributions Overflow menu (T223883) (duration: 00m 57s)
  • 11:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512488 Remove bureaucrat protection level for all Serbian projects (T217005) (duration: 00m 57s)
  • 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512487 Fix Serbian projects wgRestrictionLevels (T217005) (duration: 00m 57s)
  • 11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:506892 Add namespace aliases on zhwiktionary (T222024) (duration: 00m 57s)
  • 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:59 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 10:57 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2087 for maintenance (duration: 01m 11s)
  • 10:57 Urbanecm: deleteBatch.php for srwikinews finished (T212346)
  • 10:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:33 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3 (duration: 03m 36s)
  • 10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3
  • 09:51 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 09:45 _joe_: uploading a new service-checker version to jessie-wikimedia
  • 09:18 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 08:51 moritzm: draining ganeti2002 for eventual reboot to pick up MDS-enabled kernel
  • 08:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:31 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:31 moritzm: draining ganeti2001 for eventual reboot to pick up MDS-enabled kernel
  • 07:42 mobrovac: decommission restbase1015-b -- T223976
  • 07:40 godog: ms-be2043 start sdd rebuild - T222654
  • 07:03 jijiki: restarting pdfrender on scb1003

2019-05-28

  • 23:19 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/ApiTimedText.php: T224522 Fix fatal in ApiTimedText following redirect pages (duration: 00m 56s)
  • 23:17 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: T224367 Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 57s)
  • 23:17 bstorm_: T221339 completed view updates on labsdb1009 without depooling
  • 23:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: T224367 Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 56s)
  • 23:14 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/ApiTimedText.php: T224522 Fix fatal in ApiTimedText following redirect pages (duration: 00m 58s)
  • 23:11 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: FlaggedRevisions: Copy in rest of the config, for static registration I77d70519f Id0cd2e18c (duration: 00m 56s)
  • 23:10 bstorm_: T221339 repooled labsdb1011
  • 23:06 jforrester@deploy1001: Synchronized wmf-config/throttle.php: Remove expired throttle rules I4ba3d489 (duration: 00m 55s)
  • 23:06 bstorm_: T221339 depooled labsdb1011 and updated views
  • 23:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T55652 Enable wgSpecialSearchFormOptions on testwikidata (duration: 00m 56s)
  • 22:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Fix order of edit tabs for multi-tabs on SET wikis T223793 (duration: 00m 57s)
  • 22:28 cstone_: Re-enabled fundraising thank you mail job
  • 22:25 mutante: cp3034 - sudo -i varnish-backend-restart
  • 22:18 cstone_: Updated fundraising civicrm from 21afd001b6 to bb4acf3d8a
  • 22:14 mutante: cp3035 - varnish-backend-restart
  • 22:13 bstorm_: repooled labsdb1010
  • 22:09 mutante: cp3034 - restart varnish backend
  • 22:09 XioNoX: restart varnish backend on cp3039
  • 22:02 cstone_: Disabled fundraising thank you mail job
  • 21:46 bstorm_: depool labsdb1010 for view updates
  • 21:38 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update (duration: 14m 37s)
  • 21:35 urandom: decommissioning restbase1015-a -- T223976
  • 21:24 smalyshev@deploy1001: Started deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update
  • 21:23 ebernhardson: restart elasticsearch on cloudelastic1001 to test sanely sized readahead on /dev/dm-0
  • 21:11 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 20:58 mutante: phab1003 / phab2001 - removing 'apache restart' from root's crontab (gerrit:512977) (T187790)
  • 20:28 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Update caption edit target counts (duration: 00m 57s)
  • 19:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 19:15 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1064 from config as it will be decommissioned T223217 (duration: 00m 55s)
  • 19:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1064 from config as it will be decommissioned T223217 (duration: 00m 56s)
  • 19:02 marostegui: Reboot db2091 for full OS and MySQL upgrade - T224393
  • 18:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMediaInfoEnableFilePageDepicts, no longer read (duration: 00m 57s)
  • 18:51 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Add forwards-compatibility for dataCdnMaxAge (duration: 01m 00s)
  • 18:11 marostegui: Start mysql for s2 and s4 on db2091 T224393
  • 17:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:48 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:42 moritzm: rebooting yubiauth* servers for kernel update
  • 17:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:35 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@0735c45]: Update mobileapps to ab67b78 (duration: 05m 56s)
  • 17:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:29 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@0735c45]: Update mobileapps to ab67b78
  • 17:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:41 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:35 hoo: Ran scap pull on mw1240 (curl -H 'Host: www.wikidata.org' … mw1240.eqiad.wmnet/wiki/Special:SetEntitySchemaLabelDescriptionAliases/E10/en returned 404)
  • 16:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 Lucas_WMDE: lucaswerkmeister-wmde@mw1271:~$ scap pull
  • 16:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:15 moritzm: rearmed keyholder on deploy2001 following reboot
  • 16:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:09 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:04 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:04 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:56 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:54 papaul: shutting down db2091 for firmware upgrade
  • 15:53 godog: put back wrongly-replaced sdf on ms-be2043 - T222654
  • 15:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:42 Lucas_WMDE: Extension:EntitySchema deployment finished successfully
  • 15:38 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=wikidatawiki
  • 15:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512909 Enable extension EntitySchema in production (duration: 00m 56s)
  • 15:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:34 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: gerrit:512911 Steal maintenance script user (duration: 00m 58s)
  • 15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:17 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
  • 15:17 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: gerrit:512912 Steal maintenance script user – forgot `git submodule update` before previous sync (duration: 00m 57s)
  • 15:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: gerrit:512912 Steal maintenance script user (duration: 00m 59s)
  • 15:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:01 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 14:57 jbond42: reboot ms-be2016
  • 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:36 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 14:30 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.7
  • 14:10 herron: beginning rolling reboots of codfw kafka-main cluster for security updates
  • 14:10 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache (duration: 34m 22s)
  • 14:04 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 13:50 _joe_: hhvm restarted on mwdebug1001
  • 13:48 _joe_: stopping hhvm on mwdebug1001 for testing
  • 13:39 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 13:35 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
  • 13:32 gilles@deploy1001: Finished deploy [performance/asoranking@60369cc]: T224388 (duration: 00m 03s)
  • 13:31 gilles@deploy1001: Started deploy [performance/asoranking@60369cc]: T224388
  • 13:31 gilles@deploy1001: deploy aborted: T224388 (duration: 00m 01s)
  • 13:31 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: T224388
  • 13:24 urandom: decommissioning restbase1014-c -- T223976
  • 13:23 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 12:55 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:51 gilles@deploy1001: Finished deploy [performance/asoranking@1c60db1]: T224388 (duration: 00m 04s)
  • 12:50 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: T224388
  • 12:40 gilles@deploy1001: Finished deploy [performance/asoranking@157c25f]: T224388 (duration: 00m 06s)
  • 12:40 gilles@deploy1001: Started deploy [performance/asoranking@157c25f]: T224388
  • 12:13 raynor: EU SWAT done
  • 12:11 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:512743 Disable the rdf2latex Collection portlet format (T224433) (duration: 00m 55s)
  • 12:00 raynor: EU SWAT re-opened
  • 11:58 Lucas_WMDE: EU SWAT done
  • 11:54 Lucas_WMDE: ^ error, no change to wiki
  • 11:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
  • 11:52 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: SWAT: gerrit:512689 Add maintenance script to create preexisting Schemas + gerrit:512717 Small maintenance script adjustments (duration: 00m 54s)
  • 11:48 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema: SWAT: gerrit:512677 Skip configured IDs (duration: 00m 57s)
  • 11:43 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:511753 Add a list of IDs to skip in production (duration: 00m 54s)
  • 11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config: SWAT: gerrit:510204 Add feature flag config for breaking Wikibase API change (T223300) (duration: 00m 54s)
  • 11:31 Urbanecm: Ran namespaceDupes.php for urwikibooks, urwikiquote, urwiktionary and aswikisource
  • 11:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:512426 Use underscores instead of spaces in wgMetaNamespace(Talk) for several projects (T223039) (duration: 00m 54s)
  • 11:25 arturo: merging change to the puppet sudo module https://gerrit.wikimedia.org/r/c/operations/puppet/+/508311
  • 11:18 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: gerrit:512422 Add abusefilter-modify-restricted to abusefilter group on plwiki (T224308) (duration: 02m 36s)
  • 10:54 zfilipin@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_4182265560" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 03m 00s)
  • 10:51 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
  • 10:48 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.3 [keeping static files] (duration: 01m 32s)
  • 10:45 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 [keeping static files] (duration: 06m 06s)
  • 09:32 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Allow MW to honour the X-Request-Id header if set - T201409 (duration: 01m 12s)
  • 09:28 moritzm: installing php5 security updates
  • 09:00 moritzm: installing ffmpeg security updates
  • 08:58 gehel: rebooting wdqs nodes for kernel upgrade
  • 08:54 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148 (duration: 01m 21s)
  • 08:52 jiji@deploy1001: Started deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148
  • 08:52 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf3 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
  • 08:47 vgutierrez: uploaded acme-chief 0.17 to apt.wikimedia.org (buster) - T220518 T213820
  • 08:40 volans: T224448 sudo cumin -b 15 -p 95 'R:git::clone' 'run-puppet-agent -q --failed-only'
  • 08:29 volans: restarting gerrit due to stuck threads - T224448
  • 07:17 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf1 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
  • 07:02 mobrovac: decommission restbase1014-b -- T223976
  • 06:40 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 20% of anonymous users to PHP7.2 - T219150 (duration: 00m 51s)
  • 00:38 urandom: decommissioning restbase1014-a -- T223976

2019-05-27

  • 23:19 thcipriani: gerrit back after restarting due to T224448
  • 23:10 thcipriani: restarting gerrit due to active threads being stuck behind a sendemail thread.
  • 22:52 gilles@deploy1001: Finished deploy [performance/asoranking@bacfc37]: T224388 (duration: 00m 05s)
  • 22:52 gilles@deploy1001: Started deploy [performance/asoranking@bacfc37]: T224388
  • 22:19 gilles@deploy1001: Finished deploy [performance/asoranking@d0c156e]: T224388 (duration: 00m 05s)
  • 22:19 gilles@deploy1001: Started deploy [performance/asoranking@d0c156e]: T224388
  • 20:19 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 06s)
  • 20:19 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
  • 18:41 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/libs/rdbms: 66556bf37e8 / T223310, T223978 (duration: 00m 50s)
  • 18:06 krinkle@deploy1001: Synchronized errorpages/: 4ffcbfc2ba3 (duration: 00m 48s)
  • 17:56 andrewbogott: re-imaging cloudservices1004 in order to make sure our apt magic is working properly
  • 17:37 andrewbogott: refreshing puppet-compiler facts
  • 16:40 volans: removed unreferenced files in /etc/dhcp/ on install[12]002
  • 16:34 mobrovac: decommission restbase1013-c - T223976
  • 15:40 akosiaris: initialize termbox namespace on eqiad/codfw/staging kubernetes clusters T220402
  • 15:36 akosiaris: initialize sessionstore namespace on eqiad/codfw/staging kubernetes clusters T220401
  • 13:03 godog: swift eqiad-prod: ms-be1033 weight to 0 - T223518
  • 11:33 onimisionipe: starting osm initial import on maps2004 - T224395
  • 10:35 mobrovac: decommission restbase1013-b - T223976
  • 10:31 onimisionipe: rebooting maps2004 - cassandra unit failed and got stuck
  • 09:59 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148 (duration: 01m 09s)
  • 09:58 jiji@deploy1001: Started deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148
  • 09:52 _joe_: disabling puppet on mw1261, running some tests for T223180
  • 08:52 arturo: 1 day downtime systemd check for cloudcontrol1003
  • 08:27 jiji@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2091 - T224393 (duration: 00m 49s)
  • 08:03 gehel: depool maps2004 - T224395
  • 07:05 gehel: running nodetool repair on maps2004 - T224395
  • 04:23 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 28s)
  • 04:23 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
  • 02:59 urandom: decommissioning restbase1013-a -- T223976

2019-05-26

  • 20:39 urandom: decommissioning restbase1012-c -- T223976
  • 14:09 urandom: decommissioning restbase1012-b -- T223976
  • 13:37 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/debug: T187147 / 2be7aa4bc4af36 (duration: 00m 51s)
  • 08:01 mobrovac: decommission restbase1012-a - T223976

2019-05-25

  • 22:41 urandom: decommissioning restbase1011-c -- T223976
  • 22:00 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/Linker.php: T222628 / c735a545df3a (duration: 00m 51s)
  • 19:12 andrewbogott: reimaging cloudservices1004 with Stretch
  • 13:46 urandom: decommissioning restbase1011-b -- T223976
  • 12:28 godog: bounce thumbor on thumbor1002
  • 12:21 godog: bounce thumbor on thumbor1002
  • 11:48 _joe_: restarted thumbor-instances on thumbor1001
  • 09:20 mobrovac: decommission restbase1011-b - T223976
  • 04:56 ariel@deploy1001: Finished deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants (duration: 00m 07s)
  • 04:56 ariel@deploy1001: Started deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants
  • 00:30 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: Hot-deploy T224319 for VisualEditor switching and auto-restore (duration: 00m 50s)

2019-05-24

  • 21:56 urandom: decommissioning restbase1011-a -- T223976
  • 16:34 XioNoX: add routinator package to reprepro/APT - T220669
  • 15:44 urandom: decommissioning restbase1010-c -- T223976
  • 15:30 XioNoX: disable bgp to telia on cr1-codfw for X-connect investigation - T222967
  • 15:01 jbond42: upload python{,3}-statsd.3.2.1-2 to jessie-wikimedia
  • 14:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/libs/objectcache/: d262078b1 / T220470 (duration: 01m 06s)
  • 11:45 hoo: Updated the Wikidata property suggester with data from the 2019-05-13 JSON dump and applied the T132839 workarounds
  • 11:32 jbond42: [actually] rebooting prometheus1004 now
  • 11:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 jbond42: rebooting prometheus1004
  • 10:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 jbond42: rebooting prometheus2003
  • 10:25 jbond42: rebooting prometheus2004
  • 10:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:09 mobrovac: decommission restbase1010-b - T223976
  • 07:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:32 moritzm: rebooting labweb* for kernel security update
  • 07:05 mobrovac: restbase-dev1006 force-stop the cassandra instances, fsync exception during decomm - T224260
  • 06:47 moritzm: bounced ferm on mw2286, wasn't correctly started after reboot
  • 06:45 mobrovac: restbase-dev1006 decommission cass-b - T224260
  • 06:43 _joe_: disable notifications in icinga for restbase-dev1006 T224260
  • 06:40 mobrovac: restbase-dev1006 decommission cass-a - T224260
  • 06:39 mobrovac: restbase-dev1006 stop restbase - T224260
  • 06:38 mobrovac: restbase-dev1006 puppet disabled - T224260
  • 06:26 mobrovac@deploy1001: Finished deploy [restbase/deploy@b153f5d] (dev-cluster): Remove Parsoid fallback and rate-limit stashing (duration: 05m 41s)
  • 06:20 mobrovac@deploy1001: Started deploy [restbase/deploy@b153f5d] (dev-cluster): Remove Parsoid fallback and rate-limit stashing
  • 06:20 mobrovac@deploy1001: Finished deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - T215956 T224055 (duration: 21m 30s)
  • 06:17 marostegui: Stop MySQL on db2078:m1 to clone db2062 - T220170
  • 06:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to new hosts T220170 (duration: 00m 48s)
  • 05:58 mobrovac@deploy1001: Started deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - T215956 T224055
  • 05:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2062 from config T220170 (duration: 00m 48s)
  • 05:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2062 from config T220170 (duration: 00m 49s)
  • 05:30 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
  • 00:32 XioNoX: remove lvs1001-5 bgp sessions from cr1/2-eqiad - T224223
  • 00:27 XioNoX: remove term protect-old-lvs-servers from cr1/2-eqiad - T224223
  • 00:20 urandom: decommissioning restbase1010-a -- T223976
  • 00:04 ebernhardson@deploy1001: Finished scap: php-1.34.0-wmf.6/extensions/CirrusSearch/includes/ T223738 Consider searching out of limits an error (duration: 21m 32s)

2019-05-23

  • 23:43 ebernhardson@deploy1001: Started scap: php-1.34.0-wmf.6/extensions/CirrusSearch/includes/ T223738 Consider searching out of limits an error
  • 23:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup VII–X, InitialiseSettings (duration: 00m 48s)
  • 23:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup VII–X, CommonSettings (duration: 00m 47s)
  • 23:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup VI, InitialiseSettings (duration: 00m 47s)
  • 22:59 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup VI, CommonSettings (duration: 00m 48s)
  • 22:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup V, InitialiseSettings (duration: 00m 47s)
  • 22:56 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup V, CommonSettings (duration: 00m 47s)
  • 22:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup IV, InitialiseSettings (duration: 00m 47s)
  • 22:51 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup IV, CommonSettings (duration: 00m 48s)
  • 22:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup III, InitialiseSettings (duration: 00m 47s)
  • 22:47 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup III, CommonSettings (duration: 00m 48s)
  • 22:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup II, InitialiseSettings (duration: 00m 48s)
  • 22:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup II, CommonSettings (duration: 00m 48s)
  • 22:39 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup I, InitialiseSettings (duration: 00m 47s)
  • 22:37 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup I, CommonSettings (duration: 00m 48s)
  • 22:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wmgUseClusterSquid, never varied, no longer used (duration: 00m 48s)
  • 22:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop reading wmgUseClusterSquid, never varied (duration: 00m 47s)
  • 22:25 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T104148 Duplicate …Squid variables into …Cdn ahead of MW renaming, part 3 (duration: 00m 47s)
  • 22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T104148 Duplicate …Squid variables into …Cdn ahead of MW renaming, part 2 (duration: 00m 48s)
  • 22:23 jforrester@deploy1001: Synchronized wmf-config/reverse-proxy.php: T104148 Duplicate …Squid variables into …Cdn ahead of MW renaming, part 1 (duration: 00m 48s)
  • 22:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223793 Drop wmgVisualEditorSingleEditTabSecondaryEditor and wmgVisualEditorSecondaryTabs from InitialiseSettings (duration: 00m 48s)
  • 22:17 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T223793 Read wmgVisualEditorIsSecondaryEditor in CommonSettings (duration: 00m 48s)
  • 22:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223793 Add wmgVisualEditorIsSecondaryEditor to InitialiseSettings (duration: 00m 49s)
  • 19:48 ejegg: updated payments-wiki from 786d76e212 to 332aaa96e2
  • 18:54 urandom: decommissioning restbase1009-c -- T223976
  • 16:13 twentyafterfour: restarting phd on phab1003 to pick up new php module config
  • 15:57 moritzm: rebooting furud/flerovium for kernel updates
  • 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:33 ottomata: rolling restart of swift-proxy to apply creation of analytics_admin account
  • 15:31 hashar@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Hardcode korean help desk config - T224224 (duration: 00m 48s)
  • 15:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:31 jbond42: reboot thumbor2004
  • 15:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:02 jbond42: reboot thumbor2003
  • 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:57 jbond42: reboot thumbor2002
  • 14:51 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:51 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:50 jbond42: reboot thumbor2001
  • 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:43 jbond42: reboot thumbor1004
  • 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:36 jbond42: reboot thumbor1003
  • 14:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:28 jbond42: reboot thumbor1002
  • 14:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1001.eqiad.wmnet
  • 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:21 jbond@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
  • 13:56 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Echo: SWAT: [[gerrit:512070|Don't add CommentStoreComment as plaintext params]] (duration: 00m 50s)
  • 13:55 urandom: decommissioning restbase1009-b -- T223976
  • 13:41 bblack: stopped pybal on lvs1001-6 - T224223
  • 13:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.6
  • 13:00 godog: swift eqiad-prod: ms-be1033 weight to 1500 - T223518
  • 12:04 moritzm: powercycling mw2268 (stuck after reboot)
  • 11:50 jbond42: will shortly start rolling reboots of thumbor servers
  • 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:34 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:34 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 moritzm: rebooting auth1002 for kernel update
  • 11:21 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:21 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:51 Amir1: Deploying EntitySchema to testwikidatawiki is done
  • 10:50 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki/php-1.34.0-wmf.5$ mwscript sql.php --wiki=wikidatawiki extensions/EntitySchema/sql/EntitySchema.sql (T216955)
  • 10:50 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511844|deploy WikibaseSchema to test (T216956)]] (duration: 00m 56s)
  • 10:44 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki/php-1.34.0-wmf.5$ mwscript sql.php --wiki=testwikidatawiki extensions/EntitySchema/sql/EntitySchema.sql (T216956)
  • 10:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1080 (duration: 00m 57s)
  • 10:15 _joe_: restarted php7.2-fpm on mw1261 to assess the effect of a larger APCu shm size T223180
  • 10:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:00 moritzm: rebooting remaining mw servers in codfw (sans mcrouter proxies for now)
  • 10:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:51 hashar@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection: Rename wfAjaxCollectionGetItemList() T224093 (duration: 00m 57s)
  • 09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 into API (duration: 00m 55s)
  • 09:22 godog: bounce rsyslog on lithium - listener stuck - T199406
  • 09:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:10 moritzm: rebooting scb servers in eqiad
  • 09:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 (duration: 00m 55s)
  • 08:29 marostegui: Upgrade MySQL and kernel on db1080
  • 08:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 55s)
  • 08:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:26 moritzm: rebooting scb servers in codfw
  • 07:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 (duration: 00m 56s)
  • 07:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:33 moritzm: rebooting swift frontends in eqiad
  • 07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1136 (duration: 00m 53s)
  • 07:11 marostegui: Stop MySQL on db1117:3323 to clone db1128 T222682
  • 06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1136 (duration: 00m 55s)
  • 06:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2065 from config as it will be moved to m3 to replace db2042 (duration: 00m 55s)
  • 06:28 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2065 from config as it will be moved to m3 to replace db2042 (duration: 00m 56s)
  • 06:14 mobrovac: start ruwiki dumps to fill the new parsoid tables - T215956
  • 05:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2070 as m5 codfw master - T221533 (duration: 00m 54s)
  • 05:29 marostegui: Promote db2070 to m5 codfw master instead of db2037 - T221533
  • 05:20 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db2107 status - will be the new master (duration: 00m 54s)
  • 05:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1136 into s7 T222682 (duration: 00m 55s)
  • 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1136 into s7 T222682 (duration: 00m 55s)
  • 04:57 mobrovac: decommission restbase1009-a - T223976
  • 04:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 55s)
  • 04:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 (duration: 00m 58s)
  • 04:24 mobrovac: start nl, pt, pl wiki dumps to fill the new parsoid tables - T215956
  • 03:50 twentyafterfour: m3 database activity levels look like they have returned to normal
  • 03:48 twentyafterfour: puppet runs cleanly on phab1003
  • 03:39 mutante: phab1003 - disabling puppet; /etc/php/7.2/fpm/conf.d# ln -s /etc/php/7.2/mods-available/ldap.ini 20-ldap.ini ; systemctl restart php7.2-fpm
  • 03:27 twentyafterfour: restarted php-fpm on phab1003
  • 02:56 mutante: phab1001 - removing community_metrics and project_changes cron jobs to avoid duplicate mails
  • 02:51 mutante: phab1003 - chown -R phd /srv/repos/
  • 02:41 twentyafterfour: downtimed the systemd state on phab1001 for 1 year
  • 02:35 mutante: phabricator - going read-write again
  • 02:24 twentyafterfour: manually started aphlict on phab1003
  • 02:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
  • 02:04 mutante: puppetmaster1001 - sudo -i conftool-merge
  • 01:52 twentyafterfour: phabricator is now served by phab1003 though still in read-only mode for a bit longer
  • 01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
  • 01:49 mutante: puppetmaster1001 - conftool-merge
  • 01:41 eileen: civicrm revision changed from e6e846708f to 21afd001b6, config revision is 87e78d3eac
  • 01:37 mutante: depooled phab1001-vcs from git-ssh via conftool
  • 01:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab1001-vcs.eqiad.wmnet
  • 01:33 mutante: run puppet on mx1001/mx2001 - switch mail route for phab to phab1003
  • 01:30 mutante: switched from phab1001 to phab1003 - applied on cp1008 varnish canary first
  • 01:28 twentyafterfour: stopping phd on phab1001
  • 01:18 mutante: phabricator going readonly momentarily
  • 01:09 twentyafterfour: extended phab downtime in icinga, actual downtime hasn't started yet, prep work taking longer than expected
  • 00:52 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@e040c6c]: Deploy GUI update (duration: 09m 54s)
  • 00:45 mutante: phab1003 - rsyncing /srv/repos from phab1001
  • 00:42 smalyshev@deploy1001: Started deploy [wdqs/wdqs@e040c6c]: Deploy GUI update
  • 00:33 ejegg: updated payments-wiki from fa005a0640 to 786d76e212

2019-05-22

  • 23:30 twentyafterfour: scheduling downtime for phabricator from 0:00 to 1:00 utc
  • 23:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511889/ (duration: 00m 55s)
  • 22:18 mdholloway: mobileapps rolled back deployment (again) due to occasional references endpoint timeouts
  • 22:17 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724, take 2 (duration: 07m 19s)
  • 22:15 foks: reset user email and password for Nv8200pa
  • 22:09 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724, take 2
  • 22:09 mdholloway: mobileapps rolled back deployment due to endpoint check failure (not the same one as before); retrying momentarily
  • 22:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724 (duration: 03m 25s)
  • 22:08 foks: reset user email and password for DarkKyoushu
  • 22:05 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724
  • 21:51 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/includes/resourceloader/MessageBlobStore.php: T222539 / 734b3d84f7 (duration: 00m 56s)
  • 21:47 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/resourceloader/MessageBlobStore.php: T222539 / 3cb01cc73ce9 (duration: 00m 56s)
  • 21:41 urandom: decommissioning restbase1008-c -- T223976
  • 20:46 mdholloway: mobileapps rolled back deployment due to endpoint check failures
  • 20:43 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298, take 2 (duration: 04m 19s)
  • 20:39 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298, take 2
  • 20:38 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298 (duration: 02m 41s)
  • 20:35 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298
  • 19:26 jforrester@deploy1001: Finished scap: Re-build i18n and re-scap everything for i18n issues for T224116 T224124 T220731 (duration: 32m 55s)
  • 18:53 jforrester@deploy1001: Started scap: Re-build i18n and re-scap everything for i18n issues for T224116 T224124 T220731
  • 18:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/FlaggedRevs: Hot-deploy reverting FlaggedRevs config for T224116 T224124 (duration: 00m 58s)
  • 18:17 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/UrlShortener/modules/ext.urlShortener.special.js: Fix i18n/command mix-up Ic99cf063a (duration: 01m 00s)
  • 17:38 bblack: repool cp3046 as esams cache_upload ats-be node - T222937
  • 17:06 urandom: decommissioning restbase1008-b -- T223976
  • 16:17 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 to 1.34.0-wmf.5 T224116 T224124 # T220731
  • 15:11 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1002.wikimedia.org
  • 15:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:08 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org
  • 15:07 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
  • 15:04 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:04 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
  • 15:00 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
  • 14:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:58 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
  • 14:57 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
  • 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:54 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org
  • 14:49 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=nescio.wikimedia.org
  • 14:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:45 jbond@cumin1001: conftool action : set/pooled=no; selector: name=nescio.wikimedia.org
  • 14:42 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=maerlant.wikimedia.org
  • 14:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:35 jbond@cumin1001: conftool action : set/pooled=no; selector: name=maerlant.wikimedia.org
  • 14:17 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4002.wikimedia.org
  • 14:14 hashar: 1.34.0-wmf.6 deployed to group1 with the exception of cawikinews due to T224116
  • 14:14 mobrovac: start it, es wiki dumps (fr and de completed) to fill the new parsoid tables - T215956
  • 14:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns4002.wikimedia.org
  • 14:09 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4001.wikimedia.org
  • 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:02 marostegui: Stop MySQL on db2078 for upgrade
  • 13:58 bblack: depool cp3046 for reimage to ats-be - T222937
  • 13:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:57 moritzm: rebooting swift frontends in codfw
  • 13:46 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5002.wikimedia.org
  • 13:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:43 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5002.wikimedia.org
  • 13:42 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5001.wikimedia.org
  • 13:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:35 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5001.wikimedia.org
  • 13:27 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/templates/: T224092 (duration: 00m 58s)
  • 13:13 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.6 (duration: 00m 54s)
  • 13:06 urandom: decommissioning restbase1008-a -- T223976
  • 12:39 marostegui: Stop replication on db2048 (s1 codfw master) to rebuild revision table - this will generate lag on codfw - T224017
  • 12:35 bblack: cp3035: restarting varnish backend
  • 12:34 marostegui: Stop replication on db1080 to rebuild revision table - T224017
  • 12:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 to rebuild revision table T224017 (duration: 00m 55s)
  • 11:30 Amir1: EU SWAT is done
  • 11:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:503342|Remove constraint-suggestions beta feature (T220609)]] (duration: 00m 57s)
  • 11:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:509878|Add configuration for EntitySchema ShExSimpleUrl (T223120)]] (duration: 00m 56s)
  • 11:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511674|[SDC] Enable depicts qualifiers on testcommons]] (duration: 00m 57s)
  • 10:01 vgutierrez: restarting varnish-backend on cp3039
  • 09:52 mobrovac: start the en, fr and de wiki dumps again to populate the new parsoid table - T215956
  • 09:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - T215956 (duration: 27m 07s)
  • 09:42 marostegui: Stop MySQL on db2078:m5 to clone db2070 - T221533
  • 09:16 mobrovac@deploy1001: Started deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - T215956
  • 08:52 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2070 from s1 to m5 (duration: 00m 55s)
  • 08:51 marostegui@deploy1001: sync-file aborted: Move db2070 from s1 to m5 (duration: 00m 03s)
  • 08:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 56s)
  • 08:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1086 into API (duration: 00m 56s)
  • 08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 (duration: 00m 55s)
  • 07:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Tackle s8 codfw weights T220170 (duration: 00m 55s)
  • 07:36 mobrovac: decommission restbase1007-c - T223976
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Tackle s4 codfw weights T220170 (duration: 01m 06s)
  • 07:23 marostegui: Restart MySQL on db2090 to change binlog format T220170
  • 06:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2040 from config T224079 (duration: 00m 55s)
  • 06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2040 from config T224079 (duration: 00m 56s)
  • 06:13 marostegui: Remove db2040 from zarcillo and tendril - T224079
  • 06:01 marostegui: Stop MySQL on db2040 - T224079
  • 05:42 marostegui: Stop MySQL on db1086 to clone db1136
  • 05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 55s)
  • 05:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2118 and db2120 into s7 T222772 (duration: 00m 55s)
  • 05:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2118 and db2120 into s7 T222772 (duration: 00m 55s)
  • 05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1118 from s1 api and pool db1134 instead T224017 (duration: 00m 57s)
  • 04:41 gilles: purging ruwiki and eswiki to make them get the new origin trial tokens
  • 04:39 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Renew origin trial tokens (duration: 00m 57s)
  • 03:22 legoktm: removed 2fa for T224075
  • 01:46 aaron@deploy1001: Synchronized php-1.34.0-wmf.5/includes/specials/SpecialWatchlist.php: 68eeaa5 (duration: 00m 57s)
  • 01:22 aaron@deploy1001: Synchronized php-1.34.0-wmf.6/includes/specials/SpecialWatchlist.php: 447bf50 (duration: 00m 57s)

2019-05-21

  • 23:47 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511668/ (duration: 00m 57s)
  • 23:34 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511667/ (duration: 00m 56s)
  • 22:56 mutante: ms-be2034 - degraded systemd state was cleared and originally caused by "failed Session 72587 of user debmonitor"
  • 22:56 mutante: ms-be2034 - sudo systemctl reset-failed
  • 22:51 urandom: decommissioning restbase1007-b -- T223976
  • 21:35 ejegg: updated payments-wiki from d5ef5ad067 to fa005a0640
  • 21:21 mutante: re-enabling puppet on mc1* hosts
  • 20:43 mutante: re-enabling puppet on all hosts using memcached class - except mc1*
  • 20:31 mutante: mc2019 - stopping memcached and letting puppet restart it to confirm no issues after switching to systemd::service
  • 20:20 mutante: disabling puppet on all servers using class memcached (57)
  • 20:06 tzatziki: removing (another) two files for legal compliance
  • 19:43 tzatziki: removing two files for legal compliance
  • 19:12 thcipriani: gerrit back on 2.15.13
  • 19:09 thcipriani: restart gerrit for 2.15.13 update
  • 19:08 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (cobalt, restart incoming) (duration: 00m 20s)
  • 19:08 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (cobalt, restart incoming)
  • 19:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (gerrit 2001 only) (duration: 00m 11s)
  • 19:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (gerrit 2001 only)
  • 18:50 bblack: repooling cp1085 frontends (weren't meant to be depooled)
  • 18:38 bblack: re-pooling eqiad front edge traffic (onto new LVSes from T184293 )
  • 18:36 XioNoX: update lvs static routes on cr1/2-eqiad - T184293
  • 18:06 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 (turning on HA queues)
  • 17:59 bblack: rebooting lvs1016 in attempt to clear interface config issues - T224027
  • 17:51 XioNoX: add BGP sessions to AS202053 in esams
  • 17:31 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1016, bringing back pybal in "secondary" role for all 3 traffic classes (high-traffic1, high-traffic2, low-traffic), no traffic shift expected (again, after merging last-minute fixup https://gerrit.wikimedia.org/r/c/operations/puppet/+/511759 )
  • 17:25 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1016, bringing back pybal in "secondary" role for all 3 traffic classes (high-traffic1, high-traffic2, low-traffic), no traffic shift expected
  • 17:24 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1006, basically no-op
  • 17:21 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1015, bringing back pybal in primary role, shifting traffic to lvs1015
  • 17:20 bblack: eqiad LVS: low-traffic (all internal services): disable pybal on lvs1016 + lvs1015, shifting traffic to lvs1006
  • 17:18 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/includes/CollectionHooks.php: Fix paths (duration: 00m 56s)
  • 17:17 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1005, basically no-op
  • 17:15 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1002, bringing back pybal in backup role, no traffic shift
  • 17:13 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1014, bringing back pybal in primary role, shifting traffic to lvs1014
  • 17:11 bblack: eqiad LVS: high-traffic2 (upload): disable pybal on lvs1014 + lvs1002, shifting traffic to lvs1005
  • 17:09 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1004, basically no-op
  • 17:07 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1001, bringing back pybal in backup role, no traffic shift
  • 17:06 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1013, bringing back pybal in primary role, shifting traffic to lvs1013
  • 17:04 bblack: eqiad LVS: high-traffic1 (text): disable pybal on lvs1013 + lvs1001, shifting traffic to lvs1004
  • 16:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:55 jbond42: rebooting wtp1046-1048
  • 16:55 bblack: starting Eqiad LVS re-arrangement shortly - T184293 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/511717 (eqiad front edge is still depooled from public traffic)
  • 16:50 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:50 jbond42: rebooting wtp1043-1045
  • 16:46 mutante: rebooting phab1003 (non-prod)
  • 16:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:44 jbond42: rebooting wtp1040-1042
  • 16:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:39 jbond42: rebooting wtp1037-1039
  • 16:26 mobrovac: truncate "others_T_parsoid".data
  • 16:25 mobrovac: restbase truncate "commons_T_parsoid".data
  • 16:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:24 jbond42: rebooting wtp1033-1034
  • 16:18 mobrovac: restbase truncate "enwiki_T_parsoid".data
  • 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:16 jbond42: rebooting wtp1031-1032
  • 16:10 mobrovac: restbase truncate "wikipedia_T_parsoid".data
  • 16:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:09 jbond42: rebooting wtp1029-1030
  • 16:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:01 jbond42: rebooting wtp1027-1028
  • 15:56 urandom: decommissioning restbase1007-a -- T208087
  • 15:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:54 jbond42: rebooting wtp1025-1026
  • 15:45 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found, rb1007 (duration: 02m 43s)
  • 15:42 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found, rb1007
  • 15:42 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found (duration: 02m 40s)
  • 15:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:40 jbond42: rebooting wtp2019-2020
  • 15:39 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found
  • 15:38 mobrovac@deploy1001: Finished deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found, take #2 (duration: 00m 45s)
  • 15:38 mobrovac@deploy1001: Started deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found, take #2
  • 15:37 mobrovac@deploy1001: Finished deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - T215956 (duration: 07m 10s)
  • 15:37 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Moving to 10% of users on php7 T219150 (duration: 00m 57s)
  • 15:32 XioNoX: enable BGP to telia on cr1-codfw - T222967
  • 15:30 mobrovac@deploy1001: Started deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - T215956
  • 15:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:23 jbond42: rebooting wtp2017-2018
  • 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:13 jbond42: rebooting wtp2015-2016
  • 15:10 XioNoX: disable BGP to telia on cr1-codfw - T222967
  • 15:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:05 jbond42: rebooting wtp2013-2014
  • 15:02 crusnov@deploy1001: Finished deploy [netbox/deploy@3091b51]: deploy new version of ganeti-netbox sync - T220422 (duration: 00m 55s)
  • 15:01 crusnov@deploy1001: Started deploy [netbox/deploy@3091b51]: deploy new version of ganeti-netbox sync - T220422
  • 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:57 jbond42: rebooting wtp2011-2012
  • 14:57 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.6
  • 14:50 jbond42: rebooting wtp2009-2010
  • 14:50 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:44 jbond42: rebooting wtp2007-2008
  • 14:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 jbond42: rebooting wtp2005-2006
  • 14:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:31 jbond42: rebooting wtp2003-2004
  • 14:27 hashar@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.6 and rebuild l10n cache # T220731 (duration: 48m 09s)
  • 14:26 volans: restarting wikibugs
  • 14:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:25 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:13 jbond42: rebooting wtp2001-2002
  • 13:50 bblack: rebooting lvs1013,14,15 for verification
  • 13:39 hashar@deploy1001: Started scap: testwiki to php-1.34.0-wmf.6 and rebuild l10n cache # T220731
  • 13:37 hashar@deploy1001: Pruned MediaWiki: 1.34.0-wmf.1 (duration: 02m 12s)
  • 13:36 hashar: scap clean --verbose --delete 1.34.0-wmf.1 # T220731
  • 13:29 hashar: scap clean --verbose --delete 1.33.0-wmf.25 # T220731
  • 13:25 godog: swift eqiad-prod: start depool ms-be1033 - T223518
  • 13:24 hashar: Applied security patches to 1.34.0-wmf.6 # T220731
  • 13:24 hashar: Applied security patches to 1.34.0-wmf.6
  • 13:23 bblack: rebooting lvs1013 (possibly a few times, debugging startup issues)
  • 13:20 hashar: scap prep 1.34.0-wmf.6 # T220731
  • 13:11 hashar: Updated plugins on https://releases-jenkins.wikimedia.org/
  • 13:09 hashar: Restarting Jenkins T224002
  • 12:45 hashar: Cutting branch wmf/1.34.0-wmf.6 # T220731
  • 12:22 volans: restarting Icinga on icinga1001 to pick up new open files limits
  • 12:08 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148 (duration: 00m 54s)
  • 12:07 jiji@deploy1001: Started deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148
  • 11:59 mobrovac: started dewiki dumps - T215956
  • 11:58 mobrovac: started frwiki dumps - T215956
  • 11:46 mobrovac: started enwiki dumps - T215956
  • 11:27 Amir1: EU SWAT is done
  • 11:27 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Revert "Switch off php7 for investigation of production instabilities"|gerrit:511658 Revert "Switch off php7 for investigation of production instabilities" (duration: 00m 50s)
  • 11:20 volans: restarting Icinga on icinga2001 (passive server) to pick up new open file limits
  • 11:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:17 jbond42: reboot wtp1025.eqiad.wmnet
  • 11:10 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Define wmgUseEntitySchema (T221651)|gerrit:505816 Define wmgUseEntitySchema (T221651), part II (duration: 00m 49s)
  • 11:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - T215956 (duration: 25m 50s)
  • 11:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Define wmgUseEntitySchema (T221651)|gerrit:505816 Define wmgUseEntitySchema (T221651), part I (duration: 00m 50s)
  • 11:07 godog: swift codfw-prod: remove ms-be201[345] - T221068
  • 10:59 _joe_: rolling restart of php7.2-fpm across the fleet to pick up a config change
  • 10:44 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - T215956
  • 10:39 jijiki: updating prometheus-mcrouter-exporter on mw* servers
  • 10:26 godog: pool new restbase hosts - T219404
  • 10:20 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1019.eqiad.wmnet
  • 09:49 moritzm: updated buster netboot image to daily image from 20190521
  • 09:26 moritzm: reimaging graphite2001 to buster for some d-i tests
  • 08:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2104 as candidate master and as API (duration: 00m 51s)
  • 08:56 marostegui: Stop MySQL on db2041 as it will be decommissioned T223950
  • 06:59 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Turning off php7 sampling for investigation in T223952 (duration: 00m 53s)
  • 06:55 elukey: reboot of stat100[4,5,6,7] and notebook100[3,4] for kernel upgrades
  • 06:31 marostegui: Stop mariadb on db2104 to convert it to s2 candidate master
  • 06:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2104 (duration: 00m 51s)
  • 05:50 marostegui: Remove db2041 from tendril and zarcillo - T223950
  • 05:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2041 for decommissioning T223950 (duration: 00m 51s)
  • 05:42 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2041 for decommissioning T223950 (duration: 00m 51s)
  • 05:16 marostegui: Stop MySQL on db2040
  • 05:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2040 (duration: 00m 50s)
  • 05:14 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2114 into s6 - T222772 (duration: 00m 50s)
  • 05:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2114 into s6 - T222772 (duration: 00m 51s)
  • 03:36 urandom: bootstrapping restbase1027-c -- T219404
  • 00:47 urandom: bootstrapping restbase1027-b -- T219404
  • 00:05 aaron@deploy1001: Synchronized php-1.34.0-wmf.5/includes/libs/objectcache/APCUBagOStuff.php: 982299d (duration: 00m 54s)

2019-05-20

  • 21:07 ejegg: updated payments-wiki from 8397ccf9cc to d5ef5ad067
  • 19:20 mobrovac: bootstrap restbase1027-a - T219404
  • 18:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/includes/Linker.php: T222857 / Iecc2140fabd3 (duration: 00m 54s)
  • 16:43 onimisionipe: rolling reboot of maps eqiad to pick up kernel upgrades
  • 16:38 mobrovac: bootstrap restbase1026-c - T219404
  • 15:26 onimisionipe: rebooting codfw maps to pick up kernel upgrades
  • 15:26 marostegui: Stop replication on labsdb1011 to start compressing tables - T222978
  • 15:13 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on group 0 (T188327) (duration: 00m 55s)
  • 14:54 bblack: rebooting lvs1013, lvs1014, lvs1015 (not in active service, yet)
  • 14:43 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148 (duration: 00m 55s)
  • 14:42 jiji@deploy1001: Started deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148
  • 14:21 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011
  • 14:14 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1010
  • 13:58 mobrovac: bootstrap restbase1026-b - T219404
  • 12:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 50s)
  • 11:44 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:44 fsero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:28 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:28 fsero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:21 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:21 fsero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:17 mobrovac: bootstrap restbase1026-a - T219404
  • 11:16 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:15 fsero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:01 arturo: icinga downtime toolschecker for 3h for T223332
  • 10:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546)|gerrit:511398 Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:42 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546)|gerrit:511398 Bumping portals to master (T128546) (duration: 00m 50s)
  • 10:27 moritzm: rebooting contint1001 for kernel update
  • 10:25 hashar: contint1001: docker image prune -f | Total reclaimed space: 7.115GB | T207707
  • 10:20 hashar: Stopped Zuul gracefully
  • 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:18 fsero: puppet reenabled certs renewed - T221346
  • 10:08 fsero: rolling over certs into mcrouter proxies codfw - T221346
  • 10:03 fsero: rolling over certs into mcrouter proxies eqiad - T221346
  • 09:42 marostegui: Remove db2036 from tendril and zarcillo - T223885
  • 09:39 marostegui: Stop MySQL on db2036 T223885
  • 09:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2036, going to be decommissioned T223885 (duration: 00m 49s)
  • 09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2036, going to be decommissioned T223885 (duration: 00m 49s)
  • 09:36 fsero: rolling over new certs to all mcrouter hosts except proxies - T221346
  • 09:26 fsero: continue to rolling over new certs - T221346
  • 09:01 fsero: disabling puppet on mcrouter hosts for regenerating certs - T221346
  • 08:49 moritzm: installing atftpd security updates
  • 08:43 mobrovac: bootstrap restbase1025-c - T219404
  • 08:38 moritzm: installing samba security updates
  • 08:36 moritzm: installing ghostscript security updates on jessie
  • 08:25 moritzm: installing cups-filter security updates on jessie (prerequisite for ghostscript security update)
  • 07:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 48s)
  • 07:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2046 (duration: 00m 50s)
  • 06:25 elukey: rebuild and upload memkeys 20181031-1 to stretch-wikimedia
  • 06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 49s)
  • 06:20 elukey: upgrade memkeys to version 20181031-1 on all the mc* hosts (was deployed only on a few of them) - T208376
  • 06:11 mobrovac: bootstrap restbase1025-b - T219404
  • 06:00 elukey: powercycle analytics1071 - soft lockups error messages in the dmesg
  • 05:51 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
  • 05:42 marostegui: Reload haproxy on dbproxy1010 and dbproxy1011 to repool labsdb1009 and restore original weights
  • 05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1126 into s8, db1134 into s1 T222682 (duration: 00m 49s)
  • 05:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1126 into s8, db1134 into s1 T222682 (duration: 00m 49s)
  • 05:12 marostegui: Stop MySQL on db2046
  • 05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2046 (duration: 00m 50s)
  • 05:07 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2038 (duration: 00m 49s)
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2038 (duration: 00m 55s)
  • 02:42 cdanis: cdanis@cp1075.eqiad.wmnet ~ % sudo -i varnish-backend-restart

2019-05-19

  • 20:16 ariel@deploy1001: Finished deploy [dumps/dumps@4febe0c]: for abstract dumps, skip any processing of pages not in main namespace (duration: 00m 03s)
  • 20:16 ariel@deploy1001: Started deploy [dumps/dumps@4febe0c]: for abstract dumps, skip any processing of pages not in main namespace
  • 17:51 mobrovac: bootstrap restbase1025-a - T219404
  • 13:26 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: T223734: Depool cloudelastic100[12] (duration: 00m 49s)
  • 12:37 reedy@deploy1001: Synchronized wmf-config/interwiki-labs.php: update (duration: 00m 57s)
  • 10:32 reedy@deploy1001: Synchronized wikiversions-labs.json: T223770 (duration: 00m 48s)
  • 10:31 reedy@deploy1001: Synchronized dblists/all-labs.dblist: T223770 (duration: 00m 51s)
  • 10:12 mobrovac: bootstrap restbase1024-c - T219404
  • 09:59 ebernhardson: eqiad psi elasticsearch high disk watermark to 89% to allow unallocated shard to initialize
  • 09:56 ebernhardson: eqiad psi elasticsearch low disk watermark to 79% to allow unallocated shard to initialize
  • 08:13 jijiki: varnish-backend-restart on cp1087
  • 06:56 mobrovac: bootstrap restbase1024-b - T219404
  • 05:09 marostegui: varnish-backend-restart on cp1081

2019-05-18

  • 23:53 bblack: rebooting lvs1015 for interface changes
  • 22:44 bblack: imaging lvs1013-lvs1015
  • 21:01 bblack: depooling eqiad public front edge in authdns
  • 19:18 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/Collection/templates/CollectionSuggestTemplate.php: T223742 / 89bd434 (duration: 00m 49s)
  • 19:16 mobrovac: bootstrap restbase1024-a - T219404
  • 18:50 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: T222146 / 9385b2dd66 (duration: 00m 50s)
  • 16:53 mobrovac: bootstrap restbase1023-c - T219404
  • 15:57 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/TimedMediaHandler/includes/handlers/WebMHandler/WebMHandler.php: T223445 / a9df59c59d7a30 (duration: 00m 51s)
  • 14:59 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: whitespace is srs (duration: 00m 49s)
  • 14:56 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Copy in default config (duration: 01m 04s)
  • 13:51 urandom: bootstrapping restbase1023-b - T219404
  • 05:41 mobrovac: bootstrap rb1023-a - T219404
  • 02:37 urandom: bootstrapping restbase1022-c - T219404

2019-05-17

  • 23:55 urandom: bootstrapping restbase1022-b - T219404
  • 23:11 foks: removing one file for legal compliance
  • 15:20 hashar@deploy1001: Synchronized php-1.34.0-wmf.5/includes/api/ApiUpload.php: Revert "Always validate uploads over api" - T223448 (T222994 T223446) (duration: 01m 00s)
  • 15:18 hashar: Deploying hotfix https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/510924/ . Should restore upload of large files on commons and other wikis #T223448 (poke T222994 T223446 )
  • 14:51 mobrovac: bootstrap restbase1022-a - T219404
  • 14:43 fsero: reenabling puppet on mcrouter hosts for T221346, checks in place; if there is any alert for cert expiration and mcrouter, this is the source :)
  • 14:17 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098 & db1131 after maintenance (duration: 00m 49s)
  • 14:09 fsero: second round of setting up cert check, disabling puppet on mcrouter hosts T221346
  • 12:58 mobrovac: bootstrap restbase1021-c - T219404
  • 10:59 mobrovac: bootstrap restbase1021-b - T219404
  • 09:27 godog: swift remove ms-be101[345] from rings - T220590
  • 09:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1083 (duration: 00m 48s)
  • 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 08:24 fsero: reenabling puppet after reverting T221346
  • 08:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1083 (duration: 00m 59s)
  • 07:57 fsero: disabling puppet on mcrouter hosts for T221346
  • 07:12 marostegui: Compress s7 on labsdb1012 T222978
  • 06:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2111 and db2113 into s5 T222772 (duration: 00m 49s)
  • 06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2111 and db2113 into s5 T222772 (duration: 00m 50s)
  • 05:19 marostegui: Stop MySQL on db1083 to clone db1134
  • 05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 (duration: 00m 50s)
  • 05:00 mobrovac: bootstrap 1021-a - T219404

2019-05-16

  • 21:02 Jeff_Green: authdns-update to switch payments.wikimedia.org back to eqiad cluster
  • 19:24 onimisionipe: pooling elastic2038 - shards are properly balanced across nodes
  • 18:31 onimisionipe: depooling elastic2038 to investigate more
  • 17:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:26 jbond42: reboot ores1007-1009
  • 17:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:15 jbond42: reboot ores1005-1006
  • 17:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:10 jbond42: reboot ores1003-1004
  • 17:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:05 jbond42: reboot ores1001-1002
  • 17:00 jbond42: reboot orespoolcounter[12]002
  • 16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:53 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:53 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:53 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:52 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:51 jbond42: reboot orespoolcounter[12]001
  • 16:44 jbond42: reboot ores2008-2009
  • 16:38 jbond42: will first reboot ores2006-2007
  • 16:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:36 jbond42: reboot ores2006-2009
  • 16:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:28 jbond42: reboot ores2003-2005
  • 16:22 XioNoX: add BGP session to Hetzner in AMS-IX
  • 16:19 akosiaris: switch all etcd* kubestagetcd* servers from "drbd" ganeti disk template to "plain" ganeti disk template
  • 16:17 jbond42: reboot ores2001-2002
  • 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:59 akosiaris: build service-checker OCI container 0.0.2 with 0.1.5 service-checker version T220401
  • 15:49 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/CirrusSearch/includes/InterwikiSearcher.php: Hot-deploy CirrusSearch interwiki no result UBN T223449 (duration: 00m 49s)
  • 15:45 marostegui: Drop the following databases from tendril to recreate them with the right user: db1127,db1129,db1130, db1131, db1137,db1138
  • 15:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/includes/specials/pagers/ContribsPager.php: Hot-deploy Contribs getNamespaceInfo UBN fix T223440 (duration: 00m 53s)
  • 15:25 aborrero@puppetmaster1001: conftool action : set/pooled=yes; selector: name=labweb1001.wikimedia.org,service=labweb
  • 15:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:02 jbond42: rebooting aqs1009
  • 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:54 jbond42: rebooting aqs1008
  • 14:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:45 jbond42: rebooting aqs1007
  • 14:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:34 jbond42: rebooting aqs1006
  • 14:28 jbond42: rebooting aqs1005
  • 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:18 moritzm: powercycling mw2199, stuck during reboot
  • 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:07 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:07 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:57 marostegui: and recreate the following hosts in tendril: db2103,db2104,db2105,db2106,db2107,db2108,db2109,db2110,db2111,db2112,db2113,db2115,db2116,db2117,db2119 T222772
  • 13:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:39 cmjohnson1: replacing pdu in rack B5 eqiad
  • 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.5
  • 13:00 arturo: labweb1001 depooled
  • 12:59 mobrovac: bootstrap restbase1020-c - T219404
  • 12:21 godog: stop swift and rsync on ms-be10[16,17,18,32,33] for eqiad B5 pdu replacement - T223126
  • 12:03 jynus: stop and shutdown db1098,db1131,db1139 T223126
  • 11:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:54 moritzm: rebooting mw app servers in codfw for kernel update
  • 11:32 hoo@deploy1001: Synchronized wmf-config/extension-list: Add EntitySchema to extension-list (T221650) (duration: 00m 56s)
  • 11:22 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098 & db1131 for maintenance (duration: 00m 57s)
  • 11:00 arturo: T223148 downtime cloudvirt[1014,1028].eqiad.wmnet and labweb1001.wikimedia.org for 8 hours
  • 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:50 godog: bootstrap restbase1020-b - T219404
  • 10:27 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148 (duration: 01m 07s)
  • 10:26 jiji@deploy1001: Started deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148
  • 08:52 akosiaris: upgrade mathoid to statsd_exporter 0.9 T220709
  • 08:48 akosiaris@deploy1001: scap-helm mathoid finished
  • 08:48 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 08:48 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 08:48 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 08:47 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
  • 08:37 godog: bootstrap restbase1020-a - T219404
  • 08:32 elukey: depool/restart-nutcracker-pool mw1293/1313 - T214275
  • 08:22 elukey: depool/restart-nutcracker-pool mw1238 - T214275
  • 08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1104 (duration: 00m 56s)
  • 07:57 moritzm: installing linux 4.9.168-1+deb9u2~deb8u1 kernel on jessie hosts (no reboots, just installing the new package)
  • 07:45 moritzm: removed intel-microcode 3.20180807a from jessie-wikimedia (superseded by newer version in security.debian.org, which doesn't get picked up by apt due to the higher apt priority of jessie-wikimedia)
  • 07:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 into API (duration: 00m 56s)
  • 07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 (duration: 00m 57s)
  • 06:59 moritzm: installing intel-microcode updates
  • 05:34 elukey: roll restart of nutcracker on mw2* to pick up new config changes (no more memcached config) - T214275
  • 05:33 marostegui: Stop MySQL on db1104 to clone db1126
  • 05:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 00m 56s)
  • 05:18 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2106, db2110, db2119 into s4 - T222772 (duration: 00m 56s)
  • 05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2106, db2110, db2119 into s4 - T222772 (duration: 00m 58s)
  • 02:27 onimisionipe: pooling elastic2038 after unbanning - T217398

2019-05-15

  • 22:16 mutante: phab1003 - start ssh-phab service after adding service IPs
  • 22:01 eileen: civicrm update - lost the commit versions but 5.13.4 release
  • 21:47 mutante: phab1003 - ip -6 addr del 2620:0:861:ed1a::3:16/128 dev lo - remove extra service IP for phab's separate sshd, duplicated with phab1001 (T190568)
  • 21:24 jforrester@deploy1001: Synchronized wmf-config/MetaContactPages.php: Add movecomsignup contact page on meta T218363 (duration: 00m 56s)
  • 21:23 eileen: civicrm revision changed from 7d3ef1f2ae to c69c6e2e6a, config revision is a099f13a55
  • 21:00 fdans@deploy1001: Finished deploy [analytics/refinery@ffa4931]: deploying analytics refinery (duration: 15m 31s)
  • 20:45 tgr@deploy1001: Finished deploy [proton/deploy@9373c42]: Add gistcdn.githack.com to host blacklist (T213362) (duration: 02m 41s)
  • 20:45 fdans@deploy1001: Started deploy [analytics/refinery@ffa4931]: deploying analytics refinery
  • 20:42 tgr@deploy1001: Started deploy [proton/deploy@9373c42]: Add gistcdn.githack.com to host blacklist (T213362)
  • 20:20 robh: rebooting cloudvirt1015 into dell hardware tests per T220853
  • 20:18 arlolra@deploy1001: Finished deploy [parsoid/deploy@8f28977]: Updating Parsoid to 6658cad (duration: 06m 23s)
  • 20:12 arlolra@deploy1001: Started deploy [parsoid/deploy@8f28977]: Updating Parsoid to 6658cad
  • 19:42 hashar: group 1 promoted to 1.34.0-wmf.5 apparently without any issue # T220730
  • 19:03 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.5 (duration: 00m 58s)
  • 19:02 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.5
  • 18:38 andyrussg@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/CentralNotice/: Revert CentralNotice (duration: 01m 00s)
  • 17:32 thcipriani: deploy1001:sudo -u www-data /usr/local/bin/foreachwiki extensions/WikimediaMaintenance/refreshMessageBlobs.php
  • 17:19 onimisionipe: unban elastic2038 from shard allocation - T217398
  • 17:19 XenoRyet: updated civicrm from 4b6d569383 to 7d3ef1f2ae
  • 17:09 elukey: powerup elastic2038 (was down for maintenance)
  • 17:01 godog: bootstrap restbase1019-c - T219404
  • 16:58 bstorm_: T212972 updated all views on labsdb1012
  • 16:50 elukey: restart Hadoop HDFS namenodes on an-master100[1,2] to pick up new settings
  • 16:40 urandom: bootstrap restbase1019-c - T219404
  • 16:28 elukey: restart nutcracker on mw2240 to pick up the new config (no more memcached settings)
  • 16:26 bstorm_: T212972 updated all views on labsdb1009
  • 16:17 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223166 (duration: 00m 56s)
  • 16:16 reedy@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/WikimediaEvents/: T219128 (duration: 01m 13s)
  • 16:14 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/WikimediaEvents/: T219128 (duration: 01m 06s)
  • 16:03 jynus: disable puppet on all production databases
  • 15:21 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: T222980 (duration: 00m 57s)
  • 14:28 andrewbogott: repooling labweb1002
  • 14:16 andrewbogott: depooling labweb1002 to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509916/
  • 14:15 godog: bootstrap restbase1019-b - T219404
  • 13:21 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on testwikis and mediawikiwiki (T188327) (duration: 00m 57s)
  • 12:22 Lucas_WMDE: EU SWAT done
  • 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: SWAT: VisualEditorHooks: Use isVisualAvailable() when changing tabs/editsections (gerrit:510217) + DesktopArticleTarget.init: Allow veaction=edit to override namespace settings (T221892) (gerrit:510218) (duration: 01m 15s)
  • 12:20 akosiaris: depool esams, network issues
  • 11:47 akosiaris@deploy1001: scap-helm mathoid finished
  • 11:47 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
  • 11:46 akosiaris@deploy1001: scap-helm mathoid upgrade --wait -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 11:41 akosiaris@deploy1001: scap-helm citoid finished
  • 11:41 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
  • 11:41 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
  • 11:32 akosiaris@deploy1001: scap-helm citoid finished
  • 11:32 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
  • 11:31 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
  • 11:31 godog: bootstrap restbase1019-a - T219404
  • 11:29 akosiaris: upgrade to statsd_exporter 0.9 for citoid T220709
  • 11:27 akosiaris@deploy1001: scap-helm citoid finished
  • 11:27 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 11:27 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 10:31 elukey: superset.wikimedia.org moved to analytics-tool1004 (Buster + python 3.7 + Superset 0.32 upgrade)
  • 10:27 moritzm: installing linux 4.9.168-1+deb9u2 kernel on stretch hosts (no reboots, just installing the new package)
  • 10:04 elukey@deploy1001: Finished deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency (duration: 00m 26s)
  • 10:04 elukey@deploy1001: Started deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency
  • 09:33 hashar: Disable CI castor cache system since the instance is being migrated. Some / most CI jobs might have failed for the last 20 minutes or so T223148
  • 08:45 elukey@deploy1001: Finished deploy [analytics/superset/deploy@31c2c30]: Superset 0.32 (duration: 00m 26s)
  • 08:44 elukey@deploy1001: Started deploy [analytics/superset/deploy@31c2c30]: Superset 0.32
  • 08:36 elukey: stop superset on analytics-tool1003 as prep step for the migration to the new host - T212243
  • 08:31 moritzm: rebooting mw2164
  • 07:33 elukey: restart nutcracker on mw2245 to pick up config changes (removal of memcached config)
  • 07:29 elukey: powercycle an-worker1094 (OEM event occurred, checking if temporary)
  • 07:21 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove the php7 beta feature T219128 (duration: 00m 59s)
  • 06:24 elukey: force remount of /mnt/hdfs on stat1007 - fuse hdfs stuck
  • 01:40 eileen: process control updated - omnigroupmember.load re-enabled
  • 01:39 eileen: civicrm revision changed from 5024c968ed to 4b6d569383, config revision is a099f13a55

2019-05-14

  • 20:44 herron@deploy1001: Finished deploy [logstash/plugins@7fb8843]: adding logstash-filter-truncate plugin (duration: 00m 07s)
  • 20:43 herron@deploy1001: Started deploy [logstash/plugins@7fb8843]: adding logstash-filter-truncate plugin
  • 20:41 herron@deploy1001: Finished deploy [logstash/plugins@7fb8843]: (no justification provided) (duration: 00m 01s)
  • 20:41 herron@deploy1001: Started deploy [logstash/plugins@7fb8843]: (no justification provided)
  • 20:13 chaomodus: restarting gerrit on cobalt to pick up metrics export changes
  • 19:37 herron: adding logstash filter truncate plugin to prod logstash collectors
  • 19:28 gehel: shutting down elastic2038 for memory replacement - T217398
  • 19:25 gehel: ban elastic2038 from elasticsearch cluster for memory replacement - T217398
  • 18:21 mutante: mwmaint1002 - deleting /root/home-mwmaint2001 to save space - confirmed we have bacula backups of home on mwmaint2001
  • 17:55 mutante: elastic2029 - enable puppet agent - was disabled without reason and nobody seems to have logged in recently
  • 17:54 mutante: elastic2038 - restart nagios-nrpe-server - attempt to fix "CHECK_NRPE STATE UNKNOWN" for a single check
  • 17:32 mutante: contint1001 - mkdir /srv/zuul-logs ; mv /var/log/zuul/debug.log* /srv/zuul-logs/ to prevent CI running out of disk again (T207707)
  • 17:22 mbsantos@deploy1001: Finished deploy [proton/deploy@881b22b]: Update chromium-render to 8cc96e7 make timeout handler more robust (T217724) (duration: 02m 23s)
  • 17:20 mbsantos@deploy1001: Started deploy [proton/deploy@881b22b]: Update chromium-render to 8cc96e7 make timeout handler more robust (T217724)
  • 16:30 jynus: stop replication and start table recompression on labsdb1009 T222978
  • 16:22 godog: statsd_exporter 0.9 upgrade on thumbor - T220709
  • 16:04 gilles@deploy1001: Finished deploy [performance/coal@5a32eb2]: T221401 (duration: 00m 06s)
  • 16:04 gilles@deploy1001: Started deploy [performance/coal@5a32eb2]: T221401
  • 15:56 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/includes/ApiVisualEditor.php: Hot-deploy VE unset variable fix T223281 (duration: 00m 55s)
  • 15:51 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/VisualEditor/includes/ApiVisualEditor.php: Hot-deploy VE unset variable fix T223281 (duration: 00m 57s)
  • 15:49 crusnov@deploy1001: Finished deploy [netbox/deploy@81059c6]: Deploy new reqs for reports (duration: 00m 55s)
  • 15:49 crusnov@deploy1001: Started deploy [netbox/deploy@81059c6]: Deploy new reqs for reports
  • 15:43 jynus: reload haproxy config @ dbproxy1010, dbproxy1011
  • 15:38 XioNoX: re-activate bgp to telia on cr1-codfw - T222967
  • 15:33 XioNoX: deactivate bgp to telia on cr1-codfw - T222967
  • 15:19 papaul: shutting down elastic2038 for memory replacement
  • 15:14 hashar: mw1263: scap pull
  • 14:53 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.5
  • 14:50 moritzm: rebooting mw1263 for kernel update
  • 14:47 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache (duration: 62m 47s)
  • 14:07 _joe_: apt-get clean on mwmaint1002
  • 13:44 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache
  • 13:44 godog: rearm keyholder on deploy and cumin hosts
  • 13:27 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache (duration: 14m 39s)
  • 13:12 hashar: train delay, I forgot to sync 1.34.0-wmf.5
  • 13:12 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache
  • 12:37 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: Hot-deploy T223023 fix I1b35b28e42 for mobile VE edit section switches (duration: 00m 54s)
  • 12:10 moritzm: rebooting mw2164 for kernel update
  • 11:33 hashar@deploy1001: Pruned MediaWiki: 1.33.0-wmf.24 (duration: 03m 20s)
  • 11:30 hashar: Deleting 1.33.0-wmf.24 from deploy1001 # T220730
  • 11:28 kart_: EU-Mid day SWAT Done.
  • 11:25 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Decrease idwiki MT threshold for publishing (T222782) (gerrit:508818) (duration: 00m 51s)
  • 11:23 hashar@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.23 (duration: 14m 31s)
  • 11:23 jbond42: cumin1001 ~ % sudo cumin A:all '/usr/local/sbin/run-puppet-agent --failed-only'
  • 11:18 jbond42: enable puppet (issue fixed: https://gerrit.wikimedia.org/r/c/operations/puppet/+/510131)
  • 11:12 ema: pool cp3036 reimaged to ATS T222937
  • 11:09 hashar: Deleting 1.33.0-wmf.23 from deploy1001 # T220730
  • 11:09 jbond42: disable puppet
  • 10:58 hashar: scap prep 1.34.0-wmf.5 # T220730
  • 10:16 hashar: Cutting branches for 1.34.0-wmf.5
  • 10:01 ema: depool cp3036 and reimage as upload_ats T222937
  • 09:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2034 from config T219493 (duration: 00m 49s)
  • 09:53 marostegui@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 09:52 marostegui: Remove db2034 from tendril and zarcillo - T219493
  • 09:51 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2034 from config T219493 (duration: 00m 50s)
  • 09:34 jynus: restart apache on ununpentium
  • 09:29 marostegui: Parsercache deployment window FINISHED
  • 09:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Deploy second parsercache key change everywhere after deploying it in batches first T210725 (duration: 00m 50s)
  • 09:15 godog: statsd_exporter 0.9 upgrade on ores - T220709
  • 09:02 godog: statsd_exporter 0.9 upgrade on logstash - T220709
  • 08:53 jynus: failing connections over dbproxy1006 to dbproxy1001
  • 07:48 moritzm: installing bind security updates for stretch (only client-side tools/libraries in use)
  • 06:45 ema: cp-ats: upgrade trafficserver to 8.0.3-1wm2
  • 06:20 ema: cp4021: upgrade trafficserver to 8.0.3-1wm2
  • 06:15 ema: upload trafficserver 8.0.3-1wm2 to stretch-wikimedia
  • 06:02 marostegui: Deploy parsercache change to eqiad canaries - T210725
  • 06:01 marostegui: Lock wmf-config deployment on deploy1001 to slowly change parsercache key on eqiad - T210725
  • 06:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Change parsercache on codfw T210725 (duration: 00m 54s)
  • 01:55 mutante: re-scheduled nginx / HTTP availability icinga checks
  • 01:42 mutante: cumin -b 6 'R:git::clone' 'run-puppet-agent -q --failed-only'
  • 01:37 mutante: restarting Gerrit to apply 2 config changes - disable DNS reverse lookup (gerrit:508127) & list projects from index (gerrit:508892) - removes blockers for 2.16 upgrade (T200739)
  • 00:32 mutante: restarting wikibugs because it left some channels

2019-05-13

  • 20:29 ejegg: updated payments-wiki from 6e0172bac3 to 8397ccf9cc
  • 20:24 halfak@deploy1001: Finished deploy [ores/deploy@c17a1a2]: T202202 (duration: 04m 16s)
  • 20:20 halfak@deploy1001: Started deploy [ores/deploy@c17a1a2]: T202202
  • 20:19 ariel@deploy1001: Finished deploy [dumps/dumps@941d374]: lbzip2 decompression for 7z file production for big wikis (duration: 00m 03s)
  • 20:19 ariel@deploy1001: Started deploy [dumps/dumps@941d374]: lbzip2 decompression for 7z file production for big wikis
  • 20:04 halfak@deploy1001: Started deploy [ores/deploy@c17a1a2]: T202202
  • 18:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync: re-enabling all eventgate-analytics monolog events - T222962 (duration: 00m 49s)
  • 18:28 ejegg: updated SmashPig standalone deploy 22b6982 Try turning off WSDL caching for Adyen
  • 18:25 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: T222954 (duration: 00m 49s)
  • 18:19 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-enabling all eventgate-analytics monolog events - T222962 (duration: 00m 50s)
  • 18:17 ottomata: re-enabling all eventgate-analytics monolog events - T222962
  • 18:12 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223006 T222740 T222044 (duration: 00m 49s)
  • 18:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 18:07 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: eqiad]
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 18:04 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 18:03 fsero: deleting eventgate-analytics-production releases on codfw
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/staging-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: staging]
  • 17:57 fsero: deleting eventgate-analytics and eventgate-analytics-staging releases on staging
  • 17:41 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: retry - disabling all eventgate-analytics monolog events for eventgate chart migration - T222962 (duration: 00m 50s)
  • 17:11 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: disabling all eventgate-analytics monolog events for eventgate chart migration - T222962 (duration: 00m 50s)
  • 17:10 ottomata: disabling all eventgate-analytics monolog events for eventgate chart migration - T222962
  • 16:14 Amir1: removing tokipona language terms from items using maintenance script (T200432)
  • 16:00 andrewbogott: reimaging cloudvirt1024 (for the last time I hope)
  • 14:33 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: no-op in prod - Configure eventgate services in beta (duration: 00m 49s)
  • 14:32 otto@deploy1001: Synchronized wmf-config/LabsServices.php: no-op in prod - Configure eventgate services in beta (duration: 00m 49s)
  • 14:05 moritzm: uploaded puppet 4.8.2-5+wmf1 to component/puppetdb4 for apt.wikimedia.org/stretch-wikimedia (T219803)
  • 14:00 elukey: roll restart of aqs on aqs1* to pick up new druid settings
  • 13:50 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -b8 'ms-fe2*' 'run-puppet-agent'
  • 13:46 moritzm: updating puppet on deployment-puppetmaster03 to 4.8.2-5+wmf1 (T219803)
  • 13:39 akosiaris: bump eventgate-analytics chart to 0.0.36. Renames nodejs GC stats to microseconds and bumps the biggest bucket to 100ms. T220709
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 13:36 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on all wikis (T188327) (duration: 00m 50s)
  • 13:30 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -b8 'ms-be2*' 'run-puppet-agent'
  • 13:29 cdanis: swift codfw-prod: deploy I1035824d
  • 13:25 moritzm: uploaded puppetdb 4.4.0-1~wmf2 to component/puppetdb4 for apt.wikimedia.org/stretch-wikimedia (T219803)
  • 13:07 akosiaris: bump cxserver chart to 0.0.7. Renames nodejs GC stats to microseconds and bumps the biggest bucket to 100ms. T220709
  • 13:06 akosiaris@deploy1001: scap-helm cxserver finished
  • 13:06 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 13:06 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 13:06 akosiaris@deploy1001: scap-helm cxserver finished
  • 13:06 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 13:06 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 13:06 akosiaris@deploy1001: scap-helm cxserver finished
  • 13:06 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 13:05 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 13:04 arturo: install libjs-jquery from stretch in cloudnet servers T222862
  • 13:03 arturo: enable puppet in cloudvirt1024 to refresh some apt config T222862
  • 12:50 moritzm: updating puppetdb on deployment-puppetdb02 to 4.4.0-1~wmf2 (T219803)
  • 12:36 cdanis: root@ms-be2013.codfw.wmnet ~ # umount /srv/swift-storage/sda1 && mount /srv/swift-storage/sda1 && umount /srv/swift-storage/sdb1 && mount /srv/swift-storage/sdb1
  • 12:36 krinkle@deploy1001: Synchronized php-1.34.0-wmf.4/resources/src/startup/startup.js: I76a2c8d52fa (duration: 00m 51s)
  • 12:33 cdanis: root@ms-be2013.codfw.wmnet ~ # mount /srv/swift-storage/sdf1
  • 12:25 cdanis: cdanis@ms-be2015.codfw.wmnet ~ % sudo umount /srv/swift-storage/sdl1 && sudo mount /srv/swift-storage/sdl1
  • 12:25 cdanis: cdanis@ms-be2015.codfw.wmnet ~ % sudo umount /srv/swift-storage/sdf1 && sudo mount /srv/swift-storage/sdf1
  • 12:18 cdanis: cdanis@ms-be2015.codfw.wmnet /var/log % sudo mount /srv/swift-storage/sda1
  • 12:08 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/Wikibase/lib/includes/Formatters/CachingKartographerEmbeddingHandler.php: T223085 (duration: 00m 50s)
  • 11:59 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/composer.json: T215746 (duration: 00m 49s)
  • 11:58 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/vendor/: T215746 (duration: 01m 30s)
  • 11:43 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: T222639 (duration: 00m 52s)
  • 11:04 ema: cp-ats rolling restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509456/
  • 10:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/includes/http/HttpRequestFactory.php: T222935 Hot-deploy fix for HttpRequestFactory (duration: 00m 50s)
  • 10:38 jbond42: update puppet5 and facter3 in eqiad
  • 10:17 vgutierrez: rebooting cloudvirt1024 - T209707
  • 09:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 T217396 (duration: 00m 49s)
  • 09:33 hashar: Upgrading Zuul 2.5.1-wmf7 -> 2.5.1-wmf9 T105474
  • 07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully pool db1130 (s5) and db1138 (s4) T222682 (duration: 00m 50s)
  • 07:08 elukey: slow roll restart of celery on ores* nodes to allow cores to be generated upon segfault - T222866
  • 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) T222682 (duration: 00m 50s)
  • 06:53 moritzm: installing ghostscript security updates
  • 06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) T222682 (duration: 00m 49s)
  • 06:09 marostegui: Compress s2, s6 and s7 on labsdb1012 - T222978
  • 05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) T222682 (duration: 00m 49s)
  • 05:41 marostegui: Optimize tables on pc2007
  • 05:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1130 into s5 and db1138 into s4 T222682 (duration: 00m 49s)
  • 05:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1130 into s5 and db1138 into s4 T222682 (duration: 00m 51s)

2019-05-12

  • 15:32 elukey: rollback python-kafka on eventlog1002 to 1.4.1-1~stretch1 - T222941
  • 12:14 elukey: restart eventlogging on eventlog1002 - all processors stuck due to kafka python (T222941)
  • 05:31 marostegui: Disable notifications for db1116:s8 Slave LAG check as this is a snapshot source

2019-05-11

  • 18:26 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 57s)
  • 06:37 elukey: restart eventlogging on eventlog1002 - huge kafka consumer lag accumulated (T222941)
  • 02:01 mutante: actinium - low disk space - apt-get clean - gzip /var/log/squid3/access.log.1

2019-05-10

  • 18:58 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 18:51 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 18:49 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin '*' 'enable-puppet "Puppet breakages on all hosts -- cdanis"'
  • 18:39 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin '*' 'disable-puppet "Puppet breakages on all hosts -- cdanis"'
  • 16:50 reedy@deploy1001: Synchronized dblists/: Update size related dblists (duration: 00m 49s)
  • 16:31 ebernhardson: drop archive indices from cloudelastic
  • 16:11 ariel@deploy1001: Finished deploy [dumps/dumps@70e8498]: look for dumpstatus json file per wiki run (duration: 00m 05s)
  • 16:11 ariel@deploy1001: Started deploy [dumps/dumps@70e8498]: look for dumpstatus json file per wiki run
  • 16:05 ejegg: moved adyen smashpig job runner to frdev1001
  • 15:25 _joe_: wiped opcache clean on all api, appservers
  • 15:05 cdanis: cdanis@mw1239.eqiad.wmnet ~ % sudo php7adm /opcache-free
  • 15:05 Krinkle: fix opcache krinkle@mw1268:~$ scap pull
  • 15:04 cdanis: cdanis@mw1268.eqiad.wmnet ~ % sudo php7adm /opcache-free
  • 15:03 Krinkle: ran 'scap pull' on mw1239.eqiad.wmnet to fix opcache corruption
  • 14:56 jbond42: uploaded zuul_2.5.10-wmf9 to jessie-wikimedia
  • 14:54 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: T99740 / d9dbecad9c7b (duration: 00m 51s)
  • 14:33 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 14:32 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:32 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f lala.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 13:30 ema: pool cp3038 w/ ATS backend T222937
  • 12:19 ema: depool cp3038 and reimage as upload_ats T222937
  • 11:52 jbond42: (un)load edac kernel modules on elastic1029 to test resetting counters
  • 11:04 jbond42: restart refinery-eventlogging-saltrotate on an-coord1001
  • 10:30 moritzm: installing symfony security updates
  • 09:17 jynus: disabling replication lag alerts for backup source hosts on s1, s4, s8 T206203
  • 07:14 moritzm: uploaded linux-meta 1.21 for jessie-wikimedia (pointing to the new -9 ABI introduced with the 4.9.168 kernel)
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1100 into API (duration: 00m 50s)
  • 06:55 ema: swift-fe: rolling restart to enable ensure_max_age T222937
  • 06:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 into API (duration: 00m 50s)
  • 06:27 ema: ms-fe1005: pool with ensure_max_age T222937
  • 06:26 ariel@deploy1001: Finished deploy [dumps/dumps@6f9a5a4]: remove sleep between incr dumps of wikis (duration: 00m 05s)
  • 06:26 ariel@deploy1001: Started deploy [dumps/dumps@6f9a5a4]: remove sleep between incr dumps of wikis
  • 06:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 (duration: 00m 50s)
  • 06:18 ema: ms-fe1005: depool and test ensure_max_age T222937
  • 06:09 _joe_: depooling mw1261 for tests
  • 05:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2105 db2109 into s3 T222772 (duration: 00m 49s)
  • 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2105 db2109 into s3 T222772 (duration: 00m 52s)
  • 05:40 elukey: execute kafka preferred-replica-election on kafka-jumbo1001 as attempt to rebalance traffic (1002 seems handling way more than others since some days)
  • 05:32 elukey: restart eventlogging daemons on eventlog1002 - kafka consumer errors in the logs, some lag built over time
  • 05:08 marostegui: Stop MySQL on db1100
  • 05:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 (duration: 00m 50s)
  • 04:56 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2112 (duration: 00m 51s)
  • 00:15 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@e13facb]: Downgrade LDF server back for T222471 (duration: 00m 37s)
  • 00:14 smalyshev@deploy1001: Started deploy [wdqs/wdqs@e13facb]: Downgrade LDF server back for T222471

2019-05-09

  • 23:52 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625: Dont write to private wikis on cloudelastic (duration: 00m 50s)
  • 23:48 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/: T220819 Uniquely identify connections in connection pool (duration: 00m 58s)
  • 23:43 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/: T220625 Limit the clusters archive index is written to (duration: 00m 59s)
  • 23:41 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/Wikibase/view/resources/jquery/wikibase/jquery.wikibase.entityselector.js: T172937 T222346 Revert Close entityselector after selecting exact match (duration: 00m 51s)
  • 23:24 chaomodus: spicerack upgraded to 0.0.25 on cumin1001 and cumin2001
  • 22:58 volans: uploaded spicerack_0.0.25-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 22:57 bawolff: Manually cleared extdistributor cache T188692
  • 22:50 mutante: labweb1001/labweb1002 - remove "runJob" cron job from www-data's crontab, it is already also a systemd timer and puppet was meant to remove it (T222917)
  • 21:27 foks: change user email for Melamrawy (WMF)@global
  • 21:23 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikipediaAppCaptionEditCounter (T222211) (duration: 00m 52s)
  • 19:56 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.4
  • 19:28 XioNoX: renumber mr1-esams<->cr2-knams link to 91.198.174.224/31 - T211254
  • 19:24 XioNoX: renumber mr1-esams<->cr1-esams link to 91.198.174.240/31 - T211254
  • 18:22 XioNoX: simplify filter analytics-in4 term mysql-dbstore on cr1/2-eqiad
  • 16:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Restore original weight on db1084 (duration: 00m 59s)
  • 16:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1081 (duration: 01m 13s)
  • 15:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1081 (duration: 01m 01s)
  • 15:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 01m 00s)
  • 15:37 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2112 (duration: 00m 59s)
  • 15:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 00m 56s)
  • 15:20 marostegui: Stop mysql on db2112 for onsite work
  • 15:16 otto@deploy1001: scap-helm eventgate-main finished
  • 15:16 otto@deploy1001: scap-helm eventgate-main cluster eqiad completed
  • 15:16 otto@deploy1001: scap-helm eventgate-main install -n main -f main/eqiad-values.yaml stable/eventgate [namespace: eventgate-main, clusters: eqiad]
  • 15:13 otto@deploy1001: scap-helm eventgate-main finished
  • 15:13 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
  • 15:13 otto@deploy1001: scap-helm eventgate-main install -n main -f main/codfw-values.yaml stable/eventgate [namespace: eventgate-main, clusters: codfw]
  • 15:12 papaul: shutting down db2114 for main board replacement
  • 14:53 otto@deploy1001: scap-helm eventgate-main finished
  • 14:52 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 14:52 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 14:48 moritzm: removing unused uwsgi packages from scb* hosts
  • 14:13 otto@deploy1001: scap-helm eventgate-main finished
  • 14:13 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 14:13 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 13:34 bblack: recdns: wiping dyna.wikimedia.org from pdns-recursors
  • 13:13 fsero: running authdns-update for new docker-registry T221101
  • 12:49 fsero: switching traffic from old-registry to new registries registry[12]00[12] - T221101
  • 12:01 _joe_: reenabling puppet across the fleet
  • 11:57 jbond42: all puppetmasters and puppetdbs should be restored
  • 11:55 jbond42: clean up old source files sudo cumin A:puppetmaster 'rm /etc/apt/sources.list.d/component-facter3.list /etc/apt/sources.list.d/component-puppet5.list'
  • 11:49 volans: updated netbox statuses for decommissioning and spare hosts according to T222352
  • 11:23 jbond42: running sudo apt-get install puppet-master=4.8.2-5~bpo8+1 puppet-master-passenger=4.8.2-5~bpo8+1 on labtestpuppetmaster2001
  • 11:19 jbond42: running sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5 puppet-master puppet-master-passenger on labpuppetmaster1001
  • 11:18 jbond42: starting puppetdb on puppetdb2001
  • 11:15 jbond42: run sudo apt-get install puppetdb on puppetdb2001
  • 11:14 jbond42: ran the following on puppetdb2001 sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5
  • 11:14 jbond42: ran the following on puppetmaster200{1,2} sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5 puppet-master puppet-master-passenger
  • 11:04 _joe_: disabling puppet across the fleet
  • 11:02 volans: stopped ircecho to avoid spam
  • 10:43 marostegui: Stop MySQL on db1081
  • 10:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 57s)
  • 10:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give API traffic to db1129 (new host on s2) (duration: 00m 57s)
  • 10:15 _joe_: restarting low-traffic pybals in codfw, eqiad
  • 10:05 akosiaris: restart proton on proton1001. Host Out of memory T214975
  • 09:57 ariel@deploy1001: Finished deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (retry) (duration: 00m 06s)
  • 09:57 ariel@deploy1001: Started deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (retry)
  • 09:54 ariel@deploy1001: Finished deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (duration: 00m 06s)
  • 09:54 ariel@deploy1001: Started deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more
  • 09:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1129 (new host on s2) (duration: 00m 57s)
  • 09:29 fsero@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=docker-registry,name=codfw
  • 09:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
  • 09:12 godog: bounce rsyslog on lithium
  • 09:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
  • 08:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
  • 08:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 57s)
  • 08:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 (duration: 00m 55s)
  • 08:23 elukey: upload uwsgi 2.0.14+20161117-3+deb9u2+wmf1 packages to stretch-wikimedia - T212697
  • 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1129 with low weight on s2 - T222682 (duration: 00m 56s)
  • 08:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 (duration: 00m 56s)
  • 08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db1129, db2104, db2107, db2108 T222772 T222682 (duration: 00m 57s)
  • 08:06 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db1129, db2104, db2107, db2108 T222772 T222682 (duration: 00m 59s)
  • 07:54 moritzm: installing jquery security updates for stretch
  • 07:50 elukey: roll restart HDFS masters on an-master100[1,2] to pick up new logging settings
  • 07:23 moritzm: installing twitter-bootstrap3 security updates
  • 06:53 _joe_: restarted nagios-nrpe-server on proton1001
  • 05:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify disk status for db2103, db2112, db2116 (duration: 00m 58s)
  • 05:29 marostegui: Stop replication on db2098:s2
  • 05:25 marostegui: Stop MySQL on db1076
  • 05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 (duration: 00m 57s)
  • 05:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2103, db2112 and db2116 into s1 T222772 (duration: 01m 41s)
  • 05:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2103, db2112 and db2116 into s1 T222772 (duration: 01m 22s)
  • 04:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 59s)
  • 00:57 twentyafterfour: stopped phd, now running `puppet agent --test` manually on phab1001
  • 00:08 twentyafterfour: phabricator upgrade successful
  • 00:04 twentyafterfour: starting phabricator deployment, momentary downtime expected (~1 minute)

2019-05-08

  • 23:06 krinkle@deploy1001: Synchronized php-1.34.0-wmf.3/includes/specials/SpecialWatchlist.php: T218511 / I423874 (duration: 00m 57s)
  • 23:00 krinkle@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/includes/Hooks.php: T219342 / 164a7c1 (duration: 00m 59s)
  • 22:20 ejegg: re-enabled fundraising jobs
  • 22:15 ejegg: updated SmashPig standalone install from 78b92b7fef to 88fd9650ec
  • 22:14 ejegg: disabled fundraising jobs for SmashPig update
  • 22:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wmgUseAdvancedSearch, no longer read; drop rcenhancedfilters from BF whitelist (duration: 00m 57s)
  • 22:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Unconditionally load AdvancedSearch everywhere, the config is always true (duration: 00m 57s)
  • 22:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Beta Feature config cleanup: doc change plus drop advancedsearch and templatewizard-betafeature (duration: 00m 57s)
  • 21:58 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/includes/ApiVisualEditor.php: UBN T209599 ApiVisualEditor: Fix use of getBlockInfo() (duration: 00m 57s)
  • 21:52 niharika29@deploy1001: Synchronized php-1.34.0-wmf.4/tests/phpunit/: Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246 (duration: 01m 09s)
  • 21:50 niharika29@deploy1001: Synchronized php-1.34.0-wmf.4/includes/Block.php: Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246 (duration: 00m 59s)
  • 21:49 niharika29@deploy1001: sync aborted: php-1.34.0-wmf.4/includes/Block.php Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246 (duration: 00m 03s)
  • 21:49 niharika29@deploy1001: Started scap: php-1.34.0-wmf.4/includes/Block.php Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246
  • 20:12 thcipriani: restarting gerrit due to threads stuck behind sendemail
  • 20:10 gehel: upgrade to nodejs 10 for maps completed - T210704
  • 20:08 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps1001 (T215852) (duration: 00m 20s)
  • 20:08 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps1001 (T215852)
  • 20:07 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps1001 (T215852) (duration: 00m 24s)
  • 20:07 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps1001 (T215852)
  • 19:58 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]004 (T215852) (duration: 00m 58s)
  • 19:57 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]004 (T215852)
  • 19:56 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]004 (T215852) (duration: 00m 59s)
  • 19:55 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]004 (T215852)
  • 19:47 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]003 (T215852) (duration: 00m 54s)
  • 19:46 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]003 (T215852)
  • 19:46 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]003 (T215852) (duration: 00m 56s)
  • 19:45 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]003 (T215852)
  • 19:35 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator/kartotherian node 10 build into maps[12]002 (T215852) (duration: 01m 12s)
  • 19:33 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator/kartotherian node 10 build into maps[12]002 (T215852)
  • 19:32 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy tilerator node 10 build into maps[12]002 (T215852) (duration: 00m 57s)
  • 19:31 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy tilerator node 10 build into maps[12]002 (T215852)
  • 19:26 gehel: continue upgrade to nodejs 10 for maps - T210704
  • 19:22 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.4 (duration: 01m 48s)
  • 19:21 cdanis: swift codfw-prod: deploy I59c88aed T221068
  • 19:20 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.4
  • 19:01 cdanis: T221904 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'ms-be2*[4,7].codfw.wmnet' 'for DISK in /sys/block/sd*/queue/scheduler ; do echo cfq > $DISK ; done'
  • 18:09 mutante: restarting gerrit to apply logging changes (gerrit:508391)
  • 17:58 bblack: public authdns: deploying the big DYNA/CNAME change in https://gerrit.wikimedia.org/r/c/operations/dns/+/507399
  • 17:44 jforrester@deploy1001: Synchronized wmf-config/extension-list: Re-sort extension-list (prod no-op) (duration: 00m 56s)
  • 17:42 jforrester@deploy1001: Synchronized wmf-config/env.php: Clean-up: Allow for running outside the cluster for local testing (no-op for prod) (duration: 00m 56s)
  • 17:22 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Retry: Enable WikimediaEditorTasks on Beta commonswiki (duration: 00m 57s)
  • 17:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable WikimediaEditorTasks on Beta commonswiki (duration: 00m 57s)
  • 16:55 otto@deploy1001: scap-helm eventgate-main finished
  • 16:55 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 16:55 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 16:08 gehel: restart tileratorui on maps2001 - T222801
  • 15:59 jynus: restart db2117 after first puppet run
  • 15:56 mforns@deploy1001: Finished deploy [analytics/refinery@698f213]: deploying analytics-refinery up to 698f213 with source=v0.0.89 (duration: 15m 38s)
  • 15:52 gehel: reset authentication on cassandra / maps / codfw - T222801
  • 15:40 mforns@deploy1001: Started deploy [analytics/refinery@698f213]: deploying analytics-refinery up to 698f213 with source=v0.0.89
  • 15:19 moritzm: installing ruby-i18n security updates
  • 15:14 moritzm: installing rails security updates
  • 15:04 XioNoX: fix typo on asw2-ulsfo<->cr2-ulsfo interface (Xlink2 instead of Xlink1)
  • 14:21 otto@deploy1001: scap-helm eventgate-main finished
  • 14:21 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 14:21 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 14:18 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps2001 (T215852) (duration: 00m 27s)
  • 14:17 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps2001 (T215852)
  • 14:14 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps2001 (T215852) (duration: 00m 27s)
  • 14:14 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps2001 (T215852)
  • 14:05 fsero@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 14:03 gehel: starting upgrade to nodejs 10 for maps - T210704
  • 13:50 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 13:18 ema: cp3035: restart varnish-be
  • 12:07 kart_: EU-Midday SWAT done.
  • 12:06 kartik@deploy1001: Synchronized php-1.34.0-wmf.3: SWAT: Log warning and show error on empty username (T222529) (gerrit:508559) (duration: 07m 29s)
  • 11:56 akosiaris@deploy1001: scap-helm cxserver finished
  • 11:56 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 11:56 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 11:56 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml staging stable/cxserver [namespace: cxserver, clusters: codfw]
  • 11:54 akosiaris: bump prometheus-statsd-exporter for cxserver to 0.0.5 T220709
  • 11:54 akosiaris@deploy1001: scap-helm cxserver finished
  • 11:54 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 11:54 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 11:29 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add publish restrictions config for enwiki (T217237) (gerrit:495677) (duration: 00m 58s)
  • 11:06 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148 (duration: 01m 30s)
  • 11:05 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148
  • 10:17 _joe_: restarted pybal on lvs1016 to pick up changes for T222705
  • 10:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1131 in s6 T222682 (duration: 00m 57s)
  • 09:51 _joe_: restarted proton on proton1001
  • 09:50 _joe_: restarted pybal on lvs1006 to pick up changes for T222705
  • 09:49 _joe_: restarted pybal on lvs2003 to pick up changes for T222705
  • 09:45 marostegui: Stop replication on db2097:3311
  • 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1131 in s6 T222682 (duration: 01m 07s)
  • 09:26 _joe_: restarting pybal on lvs2006 to pick up changes for T222705 (3/3)
  • 09:24 elukey: install uwsgi-core_2.0.14+20161117-3+deb9u2+wmf1 on netmon2001 to test a uwsgi bug fix - T212697
  • 09:12 _joe_: restarting pybal on lvs2006 to pick up changes for T222705 (2/3)
  • 08:57 _joe_: restarting pybal on lvs2006 to pick up changes for T222705
  • 08:56 godog: upload prometheus-statsd-exporter 0.9.0+ds1-1 to stretch-wikimedia T220709
  • 08:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1131 into s6 with low weight T222682 (duration: 00m 51s)
  • 08:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1131 into s6 with low weight T222682 (duration: 00m 53s)
  • 08:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1093 (duration: 00m 58s)
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1093 (duration: 00m 58s)
  • 07:49 marostegui: Stop replication s1 on db2102
  • 07:45 elukey: install uwsgi-core_2.0.14+20161117-3+deb9u2+wmf1 on netmon1002 to test a uwsgi bug fix - T212697
  • 07:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some API traffic to db1093 (duration: 00m 57s)
  • 07:41 vgutierrez: upgrading pybal to version 1.15.6 in lvs1001 - T222705
  • 07:40 godog: bounce prometheus on bast3002 to finalize migration
  • 07:37 vgutierrez: upgrading pybal to version 1.15.6 in lvs1004 - T222705
  • 07:33 vgutierrez: upgrading pybal to version 1.15.6 in lvs1002 - T222705
  • 07:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2115 into x1 T222772 (duration: 00m 56s)
  • 07:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2115 into x1 T222772 (duration: 01m 09s)
  • 07:26 vgutierrez: upgrading pybal to version 1.15.6 in lvs1005 - T222705
  • 07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some weight to db1093 (duration: 00m 56s)
  • 07:21 vgutierrez: upgrading pybal to version 1.15.6 in lvs1016 - T222705
  • 07:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1127 and db1137 into x1 T222682 (duration: 00m 56s)
  • 07:14 vgutierrez: upgrading pybal to version 1.15.6 in lvs1006 - T222705
  • 07:13 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1127 and db1137 into x1 T222682 (duration: 01m 03s)
  • 07:04 vgutierrez: upgrading pybal to version 1.15.6 in lvs2001 - T222705
  • 07:02 vgutierrez: upgrading pybal to version 1.15.6 in lvs2004 - T222705
  • 06:58 vgutierrez: upgrading pybal to version 1.15.6 in lvs2002 - T222705
  • 06:51 vgutierrez: upgrading pybal to version 1.15.6 in lvs2005 - T222705
  • 06:42 vgutierrez: upgrading pybal to version 1.15.6 in lvs2003 - T222705
  • 06:36 vgutierrez: upgrading pybal to version 1.15.6 in lvs3001 - T222705
  • 06:32 vgutierrez: upgrading pybal to version 1.15.6 in lvs3003 - T222705
  • 06:29 elukey: restart uwsgi-netbox on netmon1002 after the daily segfault (upon restart)
  • 06:29 vgutierrez: upgrading pybal to version 1.15.6 in lvs3002 - T222705
  • 06:24 vgutierrez: upgrading pybal to version 1.15.6 in lvs3004 - T222705
  • 06:20 marostegui: Stop MySQL on db2096
  • 06:19 vgutierrez: upgrading pybal to version 1.15.6 in lvs4005 - T222705
  • 06:16 vgutierrez: upgrading pybal to version 1.15.6 in lvs4006 - T222705
  • 06:13 vgutierrez: upgrading pybal to version 1.15.6 in lvs4007 - T222705
  • 06:07 vgutierrez: upgrading pybal to version 1.15.6 in lvs5001 - T222705
  • 06:02 vgutierrez: upgrading pybal to version 1.15.6 in lvs5002 - T222705
  • 05:59 vgutierrez: upgrading pybal to version 1.15.6 in lvs5003 - T222705
  • 05:48 vgutierrez: upgrading pybal to version 1.15.6 in lvs2006 - T222705
  • 05:25 marostegui: Stop MySQL on db1093
  • 05:01 marostegui: Optimize tables on pc1007
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 (duration: 00m 59s)

2019-05-07

  • 23:31 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T220625 Configure wgCirrusSearchPrivateClusters (duration: 00m 58s)
  • 22:06 ppchelko@deploy1001: Finished deploy [restbase/deploy@8f5859f]: Do not cache html if stash was requested T215956 (duration: 18m 12s)
  • 21:48 ppchelko@deploy1001: Started deploy [restbase/deploy@8f5859f]: Do not cache html if stash was requested T215956
  • 21:47 ppchelko@deploy1001: deploy aborted: Do not cache html if stash was requested T215956 (duration: 00m 12s)
  • 21:47 ppchelko@deploy1001: Started deploy [restbase/deploy@d91ee4c]: Do not cache html if stash was requested T215956
  • 21:46 mutante: deploy1001 - re-enabled puppet - deployment can go ahead
  • 21:06 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -p80 -b10 'C:profile::mediawiki::php and *.codfw.wmnet' 'run-puppet-agent' 'systemctl reload php7.2-fpm.service'
  • 20:43 mutante: gerrit2001 - restarting apache.. failed
  • 20:38 ejegg: updated payments-wiki from 558427f731 to 6e0172bac3
  • 20:31 mutante: gerrit2001 - temp disabling puppet - testing apache rewrites for T218844 on non-prod host
  • 20:14 mutante: deploy1001 - temp disabled puppet - debugging issue with apache-fast-test script
  • 19:52 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.4
  • 19:42 thcipriani@deploy1001: Finished scap: testwiki to 1.34.0-wmf.4 and rebuild l10n cache (duration: 28m 55s)
  • 19:13 thcipriani@deploy1001: Started scap: testwiki to 1.34.0-wmf.4 and rebuild l10n cache
  • 19:04 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.22 (duration: 02m 15s)
  • 18:50 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.21 (duration: 08m 48s)
  • 18:38 mutante: LDAP - adding awight to 'wmde' group (T222538)
  • 18:08 mutante: restarting icinga via web UI button
  • 17:45 thcipriani: starting branchcut for train (1.34.0-wmf.4)
  • 17:31 arturo: rebooting cloudvirt1024 to test interfaces configuration
  • 16:59 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 16:39 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 16:38 arturo: rebooting cloudvirt1024 to test interfaces configuration
  • 16:05 fsero: created eventgate-main tokens - T218346
  • 16:05 fsero: created eventgate-main tokens
  • 15:47 fsero: creating eventgate-main namespace on k8s clusters
  • 15:38 vgutierrez: uploaded pybal 1.15.6 to apt.wikimedia.org (stretch && jessie)
  • 15:21 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/CirrusSearch/maintenance/forceSearchIndex.php: T222641: Cirrus maint script handle ancient logging rows (duration: 00m 52s)
  • 14:53 cdanis: pool mw1271
  • 14:53 cdanis: pool mw1256
  • 14:44 cdanis: cdanis@mw1256.eqiad.wmnet ~ % sudo php7adm /opcache-free
  • 14:43 cdanis: cdanis@mw1271.eqiad.wmnet ~ % sudo php7adm /opcache-free
  • 14:40 vgutierrez: uploaded pybal 1.15.5 to apt.wikimedia.org (stretch && jessie)
  • 14:26 _joe_: repooling mw1320
  • 14:25 _joe_: resetting opcache on mw1320
  • 14:13 vgutierrez: uploaded pybal 1.15.4 to apt.wikimedia.org (stretch)
  • 14:12 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1256.eqiad.wmnet
  • 14:12 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1271.eqiad.wmnet
  • 14:09 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1320.eqiad.wmnet
  • 14:09 cdanis: depool mw1320
  • 14:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:07 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/eqiad-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: eqiad]
  • 14:02 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:02 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:02 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 14:01 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 14:01 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 13:59 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 13:58 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 13:57 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 13:50 vgutierrez: uploaded prometheus-trafficserver-exporter 0.2.3 to apt.wikimedia.org (stretch) - T221217
  • 13:45 marostegui: Stop MySQL and poweroff db1093 for BBU replacement - T222127
  • 13:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 for BBU replacement T222127 (duration: 00m 51s)
  • 13:37 otto@deploy1001: scap-helm eventgate-analytics finished
  • 13:37 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:37 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
  • 13:37 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
  • 13:17 cdanis: T221904 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -m async -b5 'ms-be1*' 'run-puppet-agent -q' 'systemctl restart swift-object-replicator' 'systemctl restart swift-object-auditor'
  • 13:08 ema: sudo ipmitool -I lanplus -H cp2009.mgmt.codfw.wmnet -U root mc reset cold T222459
  • 13:07 ema: sudo ipmitool -I lanplus -H "cp2009.mgmt.codfw.wmnet" -U root -E chassis power cycle T222459
  • 13:02 cdanis: T221904 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -m async -b5 'ms-be2*' 'run-puppet-agent -q' 'systemctl restart swift-object-replicator' 'systemctl restart swift-object-auditor'
  • 12:45 jynus: remove dbstore1001, dbstore2001, dbstore2002 from tendril and zarcillo T220002
  • 12:09 marostegui: Stop Replication on db1140:3320 to provision db1127 and db1137 T222682
  • 11:16 hashar: Downgraded Zuul back to 2.5.1-wmf7 # T105474 T140297
  • 11:08 hashar: Upgraded Zuul and it is broken. So downgrading back :-(
  • 10:51 hashar: Gracefully stopping Zuul for upgrade
  • 10:46 mlitn@deploy1001: Finished scap: SDC: Enable Depicts in UploadWizard on Commons (duration: 22m 45s)
  • 10:40 ema: libvmod-uuid 1.4-1 uploaded to stretch-wikimedia T221977
  • 10:23 mlitn@deploy1001: Started scap: SDC: Enable Depicts in UploadWizard on Commons
  • 10:16 hashar: contint1001: upgrading python-pbr from 0.8.2-1 to 1.10.0-1, the pin is no longer needed with recent Zuul # T218559
  • 10:16 hashar: contint1001, contint2002: rm /etc/apt/preferences.d/python_pbr.pref /etc/apt/preferences.d/python-pbr.pref # T218559
  • 10:08 jbond42: upload zuul_2.5.1-wmf8 package to jessie-wikimedia
  • 09:51 godog: test statsd-exporter 0.9 upgrade on deployment-imagescaler03 - T220709
  • 09:47 jbond42: restart pdfrender on scb1004 - T174916
  • 08:51 arturo: T222685 remove facter from jessie-wikimedia/openstack-mitaka-jessie
  • 08:39 ema: repool cp1083 T222620
  • 07:59 moritzm: updating base-files from recent stretch point release
  • 07:51 mobrovac@deploy1001: Finished deploy [restbase/deploy@d91ee4c]: Remove section functionality from the REST API - T216636 (duration: 24m 46s)
  • 07:27 godog: upgrade prometheus on bast3002 - T187987
  • 07:26 mobrovac@deploy1001: Started deploy [restbase/deploy@d91ee4c]: Remove section functionality from the REST API - T216636
  • 07:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@d91ee4c] (dev-cluster): Remove section functionality from the REST API (duration: 03m 02s)
  • 07:21 marostegui: Optimize tables on pc1010
  • 07:18 mobrovac@deploy1001: Started deploy [restbase/deploy@d91ee4c] (dev-cluster): Remove section functionality from the REST API
  • 06:59 moritzm: updating firmware-bnx2x (from stretch point release, this is a NOP, the source package firmware-nonfree was updated for various Wifi chipsets we don't use, doublechecked by comparing check sums for old and new bnx2x firmware)
  • 06:44 elukey: restart uwsgi-netbox on netmon1002 after segfault
  • 05:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2045 to codfw x1 master T219493 (duration: 00m 55s)
  • 05:12 marostegui: Change topology on x1 codfw to promote db2045 to master T219493
  • 02:12 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use Preprocessor_Hash unconditionally (duration: 00m 52s)
  • 00:53 mutante: install2002 - disabling puppet, live hacking DHCP config for db2103 to not serve installer via http to debug install issue for T221532 which seems like T190424#4548003
  • 00:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: Hot-deploy fix for visual diffs on mobile in non-section mode T222489 (duration: 00m 53s)
  • 00:32 ejegg: disabled fundraising scheduled jobs for CiviCRM maintenance

2019-05-06

  • 23:25 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/503546/ (duration: 00m 50s)
  • 22:46 crusnov@deploy1001: Finished deploy [netbox/deploy@0061190]: Deploy new version of ganeti-netbox sync. (duration: 03m 53s)
  • 22:43 RoanKattouw: Running refreshMessageBlobs.php on all wikis for T222539
  • 22:42 crusnov@deploy1001: Started deploy [netbox/deploy@0061190]: Deploy new version of ganeti-netbox sync.
  • 21:59 mutante: LDAP - remove 'sukhe' from 'nda' and add to 'wmf' instead (T221990)
  • 21:24 cdanis: experimenting with different disk scheduler on ms-be2014 -- cdanis@ms-be2014.codfw.wmnet ~ % for D in /sys/block/sd*/queue/scheduler ; echo cfq | sudo tee $D
  • 21:15 godog: swift codfw-prod: push up-to-date rings, mistakenly pushed earlier an older version
  • 19:48 gehel: rolling restart of cassandra on maps* for config change
  • 19:47 RoanKattouw: Running recomputeNotifCounts.php --notif-types=login-success on all Echo wikis for T220762
  • 19:31 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -b4 'ms-be1*' 'run-puppet-agent --enable "cdanis rollout I369f9b29"' 'systemctl restart swift-object-replicator'
  • 19:22 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -b4 'ms-be2*' 'run-puppet-agent --enable "cdanis rollout I369f9b29"' 'systemctl restart swift-object-replicator'
  • 19:01 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Begin homepage experiment on cswiki and kowiki (T221266) (duration: 00m 51s)
  • 18:47 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Remove link to pageviews tool when no data available (T222405) (duration: 00m 52s)
  • 18:32 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/skins/MinervaNeue/includes/menu/Definitions.php: Harden Definitions::insertCommunityPortal() method (T222407) (duration: 00m 53s)
  • 18:30 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'ms-be*' 'disable-puppet "cdanis rollout I369f9b29"'
  • 18:24 jynus: restart and upgrade db1116
  • 18:14 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Set $wgOresFrontendBaseUrl (T219396) (duration: 00m 51s)
  • 17:53 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 17:52 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 17:19 elukey: restart netbox on netmon1002 as test
  • 17:11 jynus: restart dbprov* hosts, in sequence, for kernel upgrade
  • 16:42 jynus: restart db1114 mysql for upgrade testing
  • 16:38 andrewbogott: re-imaging cloudvirt1024
  • 16:34 jynus: restart db2102 mysql for upgrade testing
  • 16:11 hashar: CI queue drained. Should be working fine again now
  • 15:57 hashar: CI / Zuul is being slowed down and being investigated
  • 15:48 moritzm: updating firmware-bnx2x (from stretch point release; this is a NOP, the source package firmware-nonfree was updated for various Wi-Fi chipsets we don't use; double-checked by comparing checksums of old and new bnx2x firmware)
  • 15:37 moritzm: updating firmware-bnx2 (from stretch point release; this is a NOP, the source package firmware-nonfree was updated for various Wi-Fi chipsets we don't use; double-checked by comparing checksums of old and new bnx2 firmware)
  • 15:35 papaul: shutting down elastic2038 for DIMM swap
  • 15:30 moritzm: updating base-files from recent stretch point release
  • 15:14 ema: pool cp4026 w/ ATS backend T219967
  • 14:57 godog: capture strace / core for rsyslog on wezen / lithium and restart - T199406
  • 14:42 ema: powercycle cp1083
  • 14:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1083.eqiad.wmnet
  • 14:35 godog: swift eqiad-prod: finish decom ms-be101[45] - T220590
  • 14:25 moritzm: installing vips security updates
  • 14:19 ema: depool cp4026 and reimage as upload_ats T219967
  • 14:11 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:11 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:11 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
  • 14:09 hashar: CI workflow fixed by reverting a change deployed around 10:00 UTC # T222614
  • 14:03 ema: cp3038: restart varnish-be
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics finished
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/staging-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: staging]
  • 13:54 moritzm: installing zziplib security updates
  • 13:52 hashar: CI sometimes does not run, for an unknown reason ... https://phabricator.wikimedia.org/T222614 :(
  • 13:22 moritzm: installing audiofile security updates
  • 13:20 moritzm: installing unzip security updates
  • 12:43 moritzm: installing rsync security updates
  • 12:24 moritzm: installing golang security updates on jessie
  • 11:44 Amir1: EU SWAT is done
  • 11:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Suggestion Constraint Status on Wikidata (gerrit:508303) (duration: 00m 52s)
  • 11:32 arturo: reverting puppet change to the sudo module
  • 11:17 arturo: merging puppet change to the sudo module https://gerrit.wikimedia.org/r/c/operations/puppet/+/507376
  • 10:59 ema: manual puppet-merge $sha on failed puppetmasters https://phabricator.wikimedia.org/P8477
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:508302) (duration: 00m 51s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:508302) (duration: 00m 52s)
  • 10:05 arturo: upgrade udev in cloudservices2002-dev
  • 09:59 arturo: T222148 upgrade udev & libudev1 on cloudvirt[1001-1003,1005].eqiad.wmnet
  • 09:35 elukey: restart netbox on netmon1002 (trying to reproduce the segfault) - T212697
  • 09:03 godog: upgrade labmon1001 to prometheus 2 - T187987
  • 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some API traffic to db1093 (duration: 00m 52s)
  • 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some weight to db1093 (duration: 00m 58s)
  • 04:08 ariel@deploy1001: Finished deploy [dumps/dumps@b4b7733]: reduce sleep time more between wikis for incrs (duration: 00m 05s)
  • 04:08 ariel@deploy1001: Started deploy [dumps/dumps@b4b7733]: reduce sleep time more between wikis for incrs

2019-05-05

  • 14:42 elukey: restart pdfrender on scb1004
  • 03:10 chaomodus: fyi scb* flapping on some endpoints seems to be just noise, there is high load from mobileapi but things appear to be operating normally otherwise, several boxes are in the process of checking md which may account for service lags
  • 02:40 andrewbogott: restarting mariadb on cloudservices1003

2019-05-04

  • 22:20 reedy@deploy1001: Synchronized docroot/mediawiki/xml/index.html: Add extra xml namespace links (duration: 01m 06s)
  • 10:38 ariel@deploy1001: Finished deploy [dumps/dumps@26b52ef]: misc small fixes, reduce sleep time for incr wikis (duration: 00m 09s)
  • 10:38 ariel@deploy1001: Started deploy [dumps/dumps@26b52ef]: misc small fixes, reduce sleep time for incr wikis

2019-05-03

  • 23:50 thcipriani: gerrit back
  • 23:49 thcipriani: gerrit restart due to threads piling up
  • 22:09 XioNoX: clear v4 BGP to AS17451 on cr1-eqsin/cr4-ulsfo
  • 17:16 arturo: T222148 aborrero@labstore1005:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
  • 17:15 arturo: T222148 aborrero@labstore1004:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
  • 17:11 arturo: T222148 aborrero@labpuppetmaster1002:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
  • 17:10 arturo: T222148 aborrero@labpuppetmaster1001:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
  • 17:09 arturo: T222148 aborrero@labtestpuppetmaster2001:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
  • 17:08 arturo: T222148 drop libudev1 from openstack-mitaka-jessie/jessie-wikimedia (related to T216497)
  • 17:07 arturo: T222148 drop udev from openstack-mitaka-jessie/jessie-wikimedia (related to T216497)
  • 15:02 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=parsoid,dc=codfw
  • 15:02 _joe_: repooling the wtp* servers depooled in codfw for load testing
  • 14:56 _joe_: repool mw1275
  • 13:49 jijiki: Restart nrpe on proton1001
  • 12:26 gehel: replaying 30 minutes of eqiad search traffic on codfw - T221121
  • 12:21 ema: cp3038: varnish-backend-restart
  • 11:10 _joe_: purging opcache on mw1275
  • 10:47 ema: pool cp4025 w/ ATS backend T219967
  • 10:43 jbond42: T220380 remove zuul_2.5.0-8-gcbc7f62-wmf4jessie1 from jessie-wikimedia/thirdparty
  • 10:42 jbond42: T220380 upload zuul_2.5.1-wmf7 to jessie-wikimedia
  • 10:25 jijiki: Depool mw1275
  • 10:02 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/WikibaseLexemeCirrusSearch/: Fix reference to classes that moved (T222347) (gerrit:507847) (duration: 00m 55s)
  • 09:49 ema: depool cp4025 and reimage as upload_ats T219967
  • 09:49 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp201[3-4].*
  • 09:21 gehel: ban elastic2038 from elastic clusters pending memory issue investigation - T217398
  • 08:47 ema: pool cp4024 w/ ATS backend T219967
  • 08:27 jynus: starting table recompression on new backup source hosts on eqiad and codfw (stop replication) T220572
  • 07:45 ema: depool cp4024 and reimage as upload_ats T219967
  • 07:16 ema: cp1089: varnish-backend-restart
  • 05:32 _joe_: restarting varnish backend on cp1077
  • 05:05 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp201[5-6].*
  • 04:57 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp20(1[7-9]|20).*
  • 04:55 _joe_: progressively depooling parsoid servers in codfw to assess load tolerance
  • 00:32 mutante: powercycling elastic2038
  • 00:10 XioNoX: remove static route to 208.80.155.128/25 on cr1/2-eqiad - T193496
  • 00:06 mutante: restarting gerrit to pick up config changes for 2 mail threads and lower timeout (gerrit:507852, gerrit: 507853)
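
The conftool entries for this day depool the codfw parsoid hosts in waves by matching host names against a pattern (04:57, 05:05, 09:49). As a rough illustration of which hosts each pattern would select, here is a minimal sketch. It assumes the `name=` selector is an ordinary regular expression applied to the whole host name, and it uses a hypothetical inventory of `wtp2001` to `wtp2020` under `codfw.wmnet`; neither assumption is confirmed by the log itself.

```python
import re

# Selector patterns copied from the 2019-05-03 conftool log entries.
SELECTORS = {
    "04:57": r"wtp20(1[7-9]|20).*",
    "05:05": r"wtp201[5-6].*",
    "09:49": r"wtp201[3-4].*",
}

# Hypothetical host inventory, for illustration only.
HOSTS = [f"wtp{n}.codfw.wmnet" for n in range(2001, 2021)]


def matching_hosts(pattern, hosts):
    """Return the hosts whose full name matches the pattern.

    Whole-name matching is an assumption about selector semantics.
    """
    compiled = re.compile(pattern)
    return [host for host in hosts if compiled.fullmatch(host)]


def depool_waves():
    """Map each logged timestamp to the hosts its selector would cover."""
    return {when: matching_hosts(pattern, HOSTS)
            for when, pattern in SELECTORS.items()}


if __name__ == "__main__":
    for when, hosts in depool_waves().items():
        print(when, ", ".join(hosts))
```

Under these assumptions the three selectors cover disjoint sets of hosts, which is consistent with the "progressively depooling" note at 04:55.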

2019-05-02

  • 22:10 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/MobileFrontend/resources/dist/mobile.editor.overlay.js: Hot-deploy T222229 to fix VE switching on MobileFrontend (duration: 00m 52s)
  • 21:21 thcipriani: gerrit back
  • 21:20 ejegg: updated payments-wiki from aa8dad50e7 to 558427f731
  • 21:19 thcipriani: gerrit restart to pick up config changes: https://gerrit.wikimedia.org/r/504973/ and https://gerrit.wikimedia.org/r/507858/
  • 21:00 crusnov@deploy1001: Finished deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351 (duration: 01m 48s)
  • 20:58 crusnov@deploy1001: Started deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351
  • 20:58 crusnov@deploy1001: Finished deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351 (duration: 00m 33s)
  • 20:57 crusnov@deploy1001: Started deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351
  • 19:41 ejegg: updated CiviCRM from 01c4d15c9a to 5024c968ed
  • 19:40 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/resources/src/mediawiki.widgets/mw.widgets.SearchInputWidget.js: Hot-deploy T222329 fix part 2 (duration: 00m 50s)
  • 19:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/includes/widget/SearchInputWidget.php: Hot-deploy T222329 fix part 1 (duration: 00m 53s)
  • 19:31 James_F: Shuffled 1.34.0-wmf.3 security patch cee0e569f4 for T222324 into the tip of the upstream branch now it's merged; no-op
  • 19:27 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.3
  • 19:03 mutante: phab2001 - apt-get autoremove ..removes a single python package not needed anymore
  • 19:00 mutante: phab1001 - upgrading PHP packages on prod phab server
  • 18:59 jynus: restart dbstore1001 for upgrade
  • 18:33 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Don't fatal on deleted pages in 'recent questions' (T222206) (duration: 01m 01s)
  • 18:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable cirrussearch-request logging to eventgate-analytics on all wikis (T214080) (duration: 00m 58s)
  • 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SpecialHomepage on cswiki and kowiki (T221266) (duration: 00m 58s)
  • 18:09 mutante: phab1001 - install package upgrades for bash and cron
  • 17:46 sbassett: Deployed patch for T222324 (1.34.0-wmf.3)
  • 17:45 arlolra@deploy1001: Finished deploy [parsoid/deploy@414387b]: Updating Parsoid to 9786781 (duration: 05m 45s)
  • 17:39 arlolra@deploy1001: Started deploy [parsoid/deploy@414387b]: Updating Parsoid to 9786781
  • 16:42 gehel: replaying 30 minutes of eqiad search traffic on codfw - T221121
  • 16:10 jynus: restarted dbproxy1005 haproxy, weird connection issue
  • 15:42 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Re-enable account creation on wikitech (duration: 00m 57s)
  • 15:40 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Invalidate user sessions upon blocking on wikitech (duration: 00m 59s)
  • 15:15 chasemp: add dsharpe to content admin on wikitech for user blocking
  • 12:42 jynus: stopping several instances at dbstore1001 to clone them to db1139/40 T220572
  • 12:06 ema: swift-proxy rolling restart T222071
  • 12:01 ema: restart swift-proxy on ms-fe1005 T222071
  • 10:37 ariel@deploy1001: Finished deploy [dumps/dumps@53c9f22]: speed up adds-changes dumps by generating index.html less often. temp sleep 120 (duration: 00m 15s)
  • 10:36 ariel@deploy1001: Started deploy [dumps/dumps@53c9f22]: speed up adds-changes dumps by generating index.html less often. temp sleep 120
  • 10:04 ema: pool cp4023 w/ ATS backend T219967
  • 09:41 jynus: testing backups on db2102 (increased network and disk usage) T220572
  • 09:07 jynus: reboot db2102 T220572
  • 09:02 ema: depool cp4023 and reimage as upload_ats T219967
  • 09:02 godog: rollout rsyslog upgrade 8.1901.0-1~bpo9+wmf1 to eqiad
  • 08:55 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 5% of anonymous users to PHP7.2 - T219150 (duration: 01m 03s)
  • 08:49 jijiki: Sending more traffic to PHP7.2 - T219150
  • 04:28 andrewbogott: upgraded mediawiki on wikitech-static to 1.32.1
  • 04:25 kart_: Updated cxserver to 2019-05-02-040910-production (T222305)
  • 04:23 andrewbogott: apt-get upgrade on wikitech-static
  • 04:18 kartik@deploy1001: scap-helm cxserver finished
  • 04:18 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 04:18 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 04:16 kartik@deploy1001: scap-helm cxserver finished
  • 04:16 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 04:16 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 04:15 kartik@deploy1001: scap-helm cxserver finished
  • 04:15 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 04:15 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 00:35 eileen: civicrm revision changed from 3414657d36 to 01c4d15c9a, config revision is 2119df9495

2019-05-01

  • 23:35 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Drop RENDER_NOW for impact module images (T222223) (duration: 01m 04s)
  • 23:19 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625 Start writing to cloudelastic for group0 (duration: 01m 05s)
  • 22:07 mutante: LDAP - adding jaufrecht to wmf (T222214)
  • 21:57 ebernhardson: start importing group2 to cloudelastic in parallel with group1
  • 21:18 ebernhardson: start importing group1 into cloudelastic from mwmaint1002
  • 20:15 halfak@deploy1001: Finished deploy [ores/deploy@52e9759]: T222121 (duration: 14m 03s)
  • 20:01 halfak@deploy1001: Started deploy [ores/deploy@52e9759]: T222121
  • 19:17 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.3 (duration: 01m 53s)
  • 19:15 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.3
  • 17:59 elukey: force remount of /mnt/hdfs on notebook1003 (fuse hdfs got stuck)
  • 17:43 joal@deploy1001: Finished deploy [analytics/refinery@682ab7c]: Regular analytics weekly train - Second try after space freed (duration: 03m 15s)
  • 17:40 joal@deploy1001: Started deploy [analytics/refinery@682ab7c]: Regular analytics weekly train - Second try after space freed
  • 17:27 joal@deploy1001: Finished deploy [analytics/refinery@682ab7c]: Regular analytics weekly train (duration: 25m 18s)
  • 17:02 joal@deploy1001: Started deploy [analytics/refinery@682ab7c]: Regular analytics weekly train
  • 16:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625 Start writing to cloudelastic from testwiki (duration: 01m 01s)
  • 16:52 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.QuestionPosterDialog.js: SWAT: Ensure text exists before logging enter-question-text action (gerrit:507598) (duration: 01m 00s)
  • 16:48 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: SWAT: Re-use timestamp for section header and question storage (gerrit:507593) (duration: 01m 01s)
  • 16:41 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: SWAT: Re-use timestamp for section header and question storage (gerrit:507593) (duration: 01m 01s)
  • 16:23 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Mentorship.js: SWAT: Mentorship module: Add data-link-id to mentor's talkpage link (gerrit:507580) (duration: 01m 01s)
  • 16:17 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable cirrussearch-request logging to eventgate-analytics for group1 wikis (gerrit:507550) (duration: 01m 00s)
  • 15:58 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Re-enable password reset on wikitech (duration: 00m 58s)
  • 14:54 reedy@deploy1001: Synchronized wmf-config/wikitech.php: propagate blocks to gerrit (duration: 00m 57s)
  • 14:52 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add new logging channel for wikitech (duration: 00m 58s)
  • 13:57 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209572 Disable Reporting API endpoint (duration: 00m 59s)
  • 13:31 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209572 Enable Feature Policy Reporting origin trial (duration: 01m 01s)
  • 13:28 jbond42: update puppet and facter on esams
  • 12:53 gehel: start recording 30 minutes of traffic from elasticsearch eqiad - T221121
  • 11:27 gilles: T216499 T216594 T216598 mwscript purgeList.php ruwiki --all --verbose
  • 11:22 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 T216598 T216594 Renew origin trial tokens for ruwiki (duration: 01m 14s)
  • 01:01 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@5d619e4]: Update spec x-amples (duration: 03m 58s)
  • 00:57 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@5d619e4]: Update spec x-amples
  • 00:30 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 04s)
  • 00:30 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481

2019-04-30

  • 23:56 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 05s)
  • 23:56 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
  • 23:49 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 04s)
  • 23:49 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481
  • 23:35 ariel@deploy1001: Finished deploy [dumps/dumps@d715ea0]: determine page ranges of content output files by cumul revision length as well as rev count (duration: 00m 03s)
  • 23:35 ariel@deploy1001: Started deploy [dumps/dumps@d715ea0]: determine page ranges of content output files by cumul revision length as well as rev count
  • 23:18 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 05s)
  • 23:18 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
  • 23:07 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 05s)
  • 23:07 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481
  • 22:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - T215956 (duration: 23m 56s)
  • 21:57 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - T215956
  • 21:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b140f] (dev-cluster): Parsoid: use the new stashing tables for old revisions too (duration: 03m 22s)
  • 21:52 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b140f] (dev-cluster): Parsoid: use the new stashing tables for old revisions too
  • 21:44 sbassett: Deployed patch for T222038 (1.34.0-wmf.1 and 1.34.0-wmf.3)
  • 21:44 sbassett: Deployed patch for T222036 (1.34.0-wmf.1 and 1.34.0-wmf.3)
  • 21:13 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.3
  • 21:10 mutante: netmon1002 - apt-get remove --purge php 7.0* ; apt-get install php-common php-pear (pending upgrades) | netmon2001: apt autoremove
  • 21:06 mutante: netmon2001 - apt-get install php-common php-pear (pending upgrades)
  • 21:03 mutante: netmon2001 - apt-get remove --purge php7.0*
  • 21:03 mutante: librenms - switch from PHP 7.0 to PHP 7.2 successful now. reverted manual changes for debugging on netmon1002
  • 20:29 thcipriani@deploy1001: Finished scap: testwiki to 1.34.0-wmf.3 and rebuild l10n cache (duration: 31m 17s)
  • 20:21 mutante: netmon1002 - loading PHP 7.2 module to debug issue for librenms. librenms very short downtime
  • 19:58 thcipriani@deploy1001: Started scap: testwiki to 1.34.0-wmf.3 and rebuild l10n cache
  • 19:56 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.20 (duration: 02m 07s)
  • 19:47 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 (duration: 02m 24s)
  • 19:44 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4360316]: Redeploy GUI for fixes T222133, T222129, T222181, T222182 (duration: 09m 17s)
  • 19:44 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 (duration: 02m 25s)
  • 19:43 mutante: switched netmon1002/netmon2001 from PHP 7.0 to 7.2 but reverted because LibreNMS still had an issue with it
  • 19:40 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 10m 11s)
  • 19:35 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4360316]: Redeploy GUI for fixes T222133, T222129, T222181, T222182
  • 19:27 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:27 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 19:27 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 19:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:26 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 19:26 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 19:25 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:25 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:24 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:24 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:40 cdanis: running puppet on ms-be201[3,5] to bump replication concurrency T221068
  • 18:24 cdanis: running puppet on ms-be2014 to bump replication concurrency T221068
  • 18:09 thcipriani: start branchcut for 1.34.0-wmf.3
  • 17:16 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@1f09e44]: Update mobileapps to 142ba30 (T217837) (duration: 04m 16s)
  • 17:11 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@1f09e44]: Update mobileapps to 142ba30 (T217837)
  • 16:57 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 09s)
  • 16:57 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
  • 16:52 arturo: merging change to `profile::base` and `::raid` https://gerrit.wikimedia.org/r/c/operations/puppet/+/507357 related to T221225
  • 16:36 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207706 (duration: 00m 11s)
  • 16:36 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207706
  • 16:27 XioNoX: upgrade librenms to 1.51
  • 16:26 jbond42: upgrade puppet and facter in eqsin
  • 16:04 ema: pool cp4022 w/ ATS backend T219967
  • 15:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:58 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:58 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:45 elukey: restart hadoop hdfs namenodes on an-master100[1,2] to pick up new logging settings - T220702
  • 15:18 jynus: stop s8 instance on dbstore2001 for cloning to db2100 T220572
  • 15:09 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 1% of anonymous users to PHP7.2 - T219150 (duration: 00m 54s)
  • 14:58 jbond42: enable-puppet "T220987: global kafka log shipping - staged rollout (jbond)"
  • 14:56 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'bast3002*' 'run-puppet-agent --enable "filippo prometheus"'
  • 14:49 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'labmon1001*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:44 jijiki: Sending 1% of anonymous users to PHP7.2 - T219150
  • 14:43 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'bast5001*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:26 jbond42: disable-puppet "T220987: global kafka log shipping - staged rollout (jbond)"
  • 14:24 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'prometheus2004*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:17 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'prometheus2003*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:15 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo enable-puppet 'cdanis testing original query.max-samples T222105'
  • 13:29 cdanis: cdanis@prometheus1004.eqiad.wmnet ~ % sudo systemctl restart prometheus@ops.service
  • 13:28 ema: depool cp4022 and reimage as upload_ats T219967
  • 13:20 arturo: reverting sudo puppet module changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/507317
  • 13:16 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo systemctl restart prometheus@ops.service
  • 13:15 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo disable-puppet 'cdanis testing original query.max-samples T222105'
  • 13:08 cdanis: OOMed the eqiad ops prometheus @ prometheus1003
  • 13:02 cdanis: OOMed the eqiad ops prometheus @ prometheus1004
  • 12:47 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo run-puppet-agent --enable "staged rollout T222105 by cdanis"
  • 12:41 arturo: merging a sudo puppet module change
  • 12:39 cdanis: cdanis@prometheus1004.eqiad.wmnet ~ % sudo run-puppet-agent --enable "staged rollout T222105 by cdanis"
  • 12:34 elukey: moved /home to /srv/home (more space in a dedicated partition) on stat1005
  • 12:32 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'R:prometheus::server' 'disable-puppet "staged rollout T222105 by cdanis"'
  • 11:27 Lucas_WMDE: EU SWAT done
  • 11:22 mlitn@deploy1001: Synchronized wmf-config/CommonSettings.php: Allow cross-site requests from mobile domains (duration: 00m 52s)
  • 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Serialize empty lists as objects on Commons (T138104) (gerrit:507032) (duration: 00m 54s)
  • 11:12 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Serialize empty lists as objects on Wikidata (T138104) (gerrit:507031) (duration: 00m 55s)
  • 11:08 gilles@deploy1001: Finished deploy [performance/navtiming@d6756c0]: T221848 Proper fix for partitions_for_topic in python-kafka > 1.4.4 (duration: 00m 05s)
  • 11:08 gilles@deploy1001: Started deploy [performance/navtiming@d6756c0]: T221848 Proper fix for partitions_for_topic in python-kafka > 1.4.4
  • 11:02 ema: cp3038 mbox lag, restarting varnish-be
  • 10:55 kart_: Updated cxserver to 2019-04-30-055331-production (T219412)
  • 10:49 santhosh@deploy1001: scap-helm cxserver finished
  • 10:49 santhosh@deploy1001: scap-helm cxserver cluster codfw completed
  • 10:49 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 10:48 santhosh@deploy1001: scap-helm cxserver finished
  • 10:48 santhosh@deploy1001: scap-helm cxserver cluster eqiad completed
  • 10:48 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 10:45 santhosh@deploy1001: scap-helm cxserver finished
  • 10:45 santhosh@deploy1001: scap-helm cxserver cluster staging completed
  • 10:45 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 10:32 godog: rollout rsyslog upgrade to 8.1901.0-1~bpo9+wmf1 in codfw
  • 10:32 arturo: T222060 reimaged labtestservices2003 as stretch spare system
  • 10:32 arturo: T222057 reimaged labtestvirt2003 as spare system
  • 10:12 godog: rollout rsyslog upgrade to 8.1901.0-1~bpo9+wmf1 in eqsin / ulsfo / esams
  • 10:08 jynus: stop s7 and x1 instances on dbstore2* for cloning T220572
  • 09:31 fsero@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=docker-registry,service=docker-registry
  • 09:26 fsero: creating lvs endpoints for docker registry - T221101
  • 09:02 elukey: roll restart hdfs namenodes on an-master100[1,2] to pick up new settings - T220702
  • 08:22 godog: bounce prometheus on bast4002 after backfill has finished - T187987
  • 08:11 gilles@deploy1001: Finished deploy [performance/navtiming@8f135ac]: T221848 Default to partition 0 when no partition is found (duration: 00m 05s)
  • 08:11 gilles@deploy1001: Started deploy [performance/navtiming@8f135ac]: T221848 Default to partition 0 when no partition is found
  • 08:11 gilles@deploy1001: deploy aborted: T221848 Defalt to partition 0 when no partition is found (duration: 00m 00s)
  • 08:11 gilles@deploy1001: Started deploy [performance/navtiming@8f135ac]: T221848 Defalt to partition 0 when no partition is found
  • 07:53 gilles@deploy1001: Finished deploy [performance/navtiming@e900152]: T221848 add more logging around startup (duration: 00m 05s)
  • 07:53 gilles@deploy1001: Started deploy [performance/navtiming@e900152]: T221848 add more logging around startup
  • 07:29 moritzm: installing systemd updates for jessie
  • 07:24 marostegui: Remove labservices1001 and labservices1002 from tendril T221857
  • 05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1093's status (duration: 00m 51s)
  • 05:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db1093's status (duration: 00m 55s)
  • 04:26 mutante: LDAP - remove user pirroh from group nda (T222085 and cross-validate-accounts demands consistency)
  • 02:23 mutante: analytics1050 - systemctl start mcelog ... it was in failed state, as recently on analytics1052 (T212219 ?)
  • 02:09 tgr@deploy1001: Synchronized wmf-config/db-eqiad.php: SWAT: gerrit:507237 depool db1093 (duration: 00m 54s)
  • 01:30 mutante: contint2001, then contint1001 - deleting /etc/zuul/wikimedia and letting puppet re-clone it (gerrit:507070) (T218844)

2019-04-29

  • 23:59 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (5/5) (duration: 00m 52s)
  • 23:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (4/5) (duration: 00m 52s)
  • 23:56 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (3/5) (duration: 00m 50s)
  • 23:55 ebernhardson@deploy1001: Synchronized wmf-config/LabsServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (2/5) (duration: 00m 52s)
  • 23:54 ebernhardson@deploy1001: Synchronized tests/: T220625 Add cloudelastic servers to wgCirrusSearchClusters (1/5) (duration: 00m 53s)
  • 23:34 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix (duration: 31m 04s)
  • 23:33 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221154: Add static.inaturalist.org to $wgCopyUploadDomains for Commons (duration: 00m 54s)
  • 23:03 smalyshev@deploy1001: Started deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix
  • 21:13 mutante: restarting gerrit
  • 21:10 mutante: cobalt (gerrit) upgrading openjdk 8 minor version
  • 20:40 arlolra: Updated Parsoid to c9dab9d (T106578, T113194, T205338, T219072, T219938, T221384, T219943)
  • 20:37 XioNoX: add BGP session to AS4922 in eqiad
  • 20:37 RoanKattouw: Deployed patch for T222014
  • 20:26 arlolra@deploy1001: Finished deploy [parsoid/deploy@7859b58]: Updating Parsoid to c9dab9d (duration: 06m 36s)
  • 20:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw127[5-9].eqiad.wmnet
  • 20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@7859b58]: Updating Parsoid to c9dab9d
  • 20:18 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw127[5-9].eqiad.wmnet
  • 20:18 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw127[0-4].eqiad.wmnet
  • 20:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw127[0-4].eqiad.wmnet
  • 20:08 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw126[5-9].eqiad.wmnet
  • 19:59 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw126[5-9].eqiad.wmnet
  • 19:52 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw126[1-4].eqiad.wmnet
  • 19:44 thcipriani: gerrit back
  • 19:44 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw126[1-4].eqiad.wmnet
  • 19:44 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw125[4-8].eqiad.wmnet
  • 19:43 thcipriani: gerrit restart for https://gerrit.wikimedia.org/r/327763 T221026
  • 19:39 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw125[4-8].eqiad.wmnet
  • 19:39 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw125[0-3].eqiad.wmnet
  • 19:36 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw125[0-3].eqiad.wmnet
  • 19:35 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[5-9].eqiad.wmnet
  • 19:32 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw124[5-9].eqiad.wmnet
  • 19:31 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet
  • 19:26 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw124[0-4].eqiad.wmnet
  • 19:26 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet
  • 19:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw123[8-9].eqiad.wmnet
  • 19:21 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw123[8-9].eqiad.wmnet
  • 19:20 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw123[0-5].eqiad.wmnet
  • 19:17 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw123[0-5].eqiad.wmnet
  • 19:07 otto@deploy1001: sync-file aborted: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080 (duration: 00m 02s)
  • 19:05 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080 (duration: 00m 53s)
  • 19:01 ottomata: deploying config change to enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080
  • 18:59 RoanKattouw: Deployed patch for T221739
  • 18:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:45 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 18:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 18:44 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:44 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 18:44 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:42 catrope@deploy1001: Synchronized static/images/project-logos/: Change wikimaniawiki logo to Wikimania 2019 version (T221829) (duration: 00m 54s)
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:41 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw122[8-9].eqiad.wmnet
  • 18:37 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw122[8-9].eqiad.wmnet
  • 18:37 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Commons (T138104) (duration: 00m 54s)
  • 18:34 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw122[1-6].eqiad.wmnet
  • 18:33 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:33 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:33 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:30 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Wikidata (T138104) (duration: 00m 53s)
  • 18:29 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw122[1-6].eqiad.wmnet
  • 18:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:22 Jeff_Green: authdns-update for T221475
  • 18:21 catrope@deploy1001: Synchronized docroot/noc: Publish throttle-analyze at noc (T187894) (duration: 00m 53s)
  • 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www4.bibl.ulaval.ca to wgCopyUploadsDomains (T220704) (duration: 00m 53s)
  • 17:35 Jeff_Green: authdns-update to deploy T214525
  • 17:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates (duration: 06m 58s)
  • 17:08 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates
  • 16:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Drop wmgMediaInfoEnableUploadWizardDepicts from IS (duration: 00m 53s)
  • 16:34 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 53s)
  • 16:33 jforrester@deploy1001: sync-file aborted: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 01s)
  • 16:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Add wmgMediaInfoEnableUploadWizardDepicts to IS (duration: 00m 53s)
  • 16:28 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable feature flag for depicts in UW on Test Commons (duration: 00m 53s)
  • 15:40 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update WikimediaEditorTasks counter config (T221951) (duration: 00m 58s)
  • 14:49 herron: added uid=sukhe,ou=people,dc=wikimedia,dc=org to nda ldap group T221990
  • 13:56 jbond42: rolling security updates for imagemagick
  • 13:45 fsero: DNS: creating docker-registry.svc.(eqiad|codfw).wmnet RRs
  • 13:17 jbond42: rolling security updates for libpng
  • 12:46 godog: resume rollout rsyslog 8.1901.0-1 to jessie hosts - T219764
  • 12:07 jynus: stop dbstore2002:s3 and dbstore2001:s5 for cloning to db2098/99 T220572
  • 11:56 kart_: EU-Midday SWAT done. Thanks.
  • 11:56 kartik@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/ContentTranslation: SWAT: 506971|Change the way we calculate total unmodified MT (T221930) (duration: 00m 56s)
  • 11:30 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 505765|Add namespace "Aldono" at eo.wiktionary (T221525) (duration: 00m 54s)
  • 11:21 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 506939| (T222018) (duration: 00m 53s)
  • 11:14 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 506860|Allow admins to add or remove patroller group at enwikivoyage (T222008) (duration: 00m 55s)
  • 09:27 joal@deploy1001: Finished deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) - bis (duration: 28m 19s)
  • 09:13 jynus: stop dbstore2002:s4 for cloning to db2099 T220572
  • 08:59 joal@deploy1001: Started deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) - bis
  • 08:39 godog: begin migration of bast4002 to prometheus v2 - T187987
  • 08:38 joal@deploy1001: Finished deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) (duration: 15m 38s)
  • 08:33 elukey: restart keyholder on deploy1001 + rearm keys
  • 08:28 elukey: restart keyholder-proxy on deploy1001 (attempt to see if new analytics scap settings got applied)
  • 08:25 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable unicode overrides table for php 7.2 T219279 (duration: 00m 53s)
  • 08:25 jynus: stop dbstore2001:s2 for cloning to db2098 T220572
  • 08:23 oblivian@deploy1001: Synchronized wmf-config/Php72ToUpper.php: Adding unicode overrides table for php 7.2 T219279 (duration: 00m 54s)
  • 08:23 joal@deploy1001: Started deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy)
  • 07:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2045 from s8 to x1 T219493 (duration: 00m 55s)
  • 07:47 marostegui: Stop mysql on db2034 (lag will happen on x1 codfw) - T219493
  • 07:44 marostegui: Stop replication on db2034 (x1 master) for maintenance - T219493
  • 07:13 moritzm: updated stretch netboot image for 9.9 point release

2019-04-28

  • 17:46 jiji@cumin1001: conftool action : set/pooled=no; selector: name=cp3037.esams.wmnet
  • 17:46 jijiki: Depooling cp3037 - server and mgmt are unreachable
  • 14:55 James_F: Updated trwiki's MediaWiki:Common.css to not over-ride the logo.
  • 14:53 James_F: Manually purged the trwiki logos from Varnish as part of updating them for 2 year anniversary.
  • 14:47 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki.png: trwiki: Update logo for 2 year anniversary, part III (duration: 00m 53s)
  • 14:45 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki-1.5x.png: trwiki: Update logo for 2 year anniversary, part II (duration: 00m 53s)
  • 14:44 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki-2x.png: trwiki: Update logo for 2 year anniversary, part I (duration: 00m 55s)

2019-04-27

  • 17:44 elukey: restart pdfrender on scb1002 (alert flapping)
  • 12:37 jynus: correcting last log, stopping dbstore2002:s1 to clone it to db2097 T220572
  • 12:37 jynus: stopping dbstore2002:s6 to clone it to db2097 T220572
  • 00:11 foks: reset passwords for FritzSolms@global and Seanhood@global

2019-04-26

  • 20:15 foks: changing email and password for "Lemon martini@global"
  • 19:38 foks: changing password for JDiPierro@global
  • 19:21 bblack: varnish-backend-restart on cp4026, evidence of artificial 503s from mbox lag behavior, probably related to the semi-abuse client doing odd 404 traffic to ulsfo that's triggering bugs in swift's rewrite.py ....
  • 19:04 foks: changing password for Subinsebastien
  • 17:50 mutante: analytics1052 - reported broken systemd state in Icinga - service mcelog was in state failed - systemctl start mcelog - (T212219 ?)
  • 16:18 jynus: stop s6 mariadb instance on dbstore2001 T220572
  • 15:34 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on canary hosts: thumbor1001 ms-fe1005 ms-be1013 scb1001 restbase1007
  • 15:05 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on canary hosts: ores1001.yaml wtp1025.yaml rdb1006.yaml
  • 14:18 marostegui: Set pc1004-1006 and pc2004-2006 as unracked on netbox - T209858 T210969
  • 13:17 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on canary hosts: mw1311.yaml, mx2001 & dubnium
  • 12:52 ema: cp4025: restart varnish-be due to mbox lag
  • 12:50 jijiki: Restarting hhvm on mw1288
  • 12:48 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on mc1019, maps1001 and logstash1007
  • 12:45 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_upload,name=cp4021.ulsfo.wmnet,dc=ulsfo
  • 12:44 ema: pool cp4021 w/ ATS backend T219967
  • 12:20 ema: repool cp3030 after directors.frontend.vcl testing T219967
  • 12:09 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on canary hosts: elastic1017, ganeti2001, analytics1042
  • 11:26 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on lvs4007, dns2001 and multatuli
  • 11:16 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on bast4002, aqs1004 and conf2001
  • 10:28 moritzm: restarting Parsoid on wtp1025 for glibc update
  • 10:19 ema: depool cp3030 for testing T219967
  • 09:48 marostegui: Remove labtestservices2001 from tendril - T218022
  • 09:11 moritzm: restarting AQS on aqs1004 for glibc update
  • 08:42 elukey: restart pdfrender on scb1003 (alert flapping)
  • 08:21 moritzm: uploaded php-xdebug 2.7.0+wmf1 for component/php72 (T221923)
  • 07:20 moritzm: installing glibc updates on a number of analytics hosts
  • 04:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 T221782 (duration: 00m 56s)
  • 00:31 eileen: civicrm revision changed from 88736c7c11 to 34027da7df, config revision is 2119df9495

2019-04-25

2019-04-24

  • 22:46 mutante: icinga-downtime -h ms-be2034 -r swift-rebalancing -d 86400
  • 22:19 mutante: deploying varnish/trafficserver change to cover www.wikiba.se (not prod yet)
  • 22:19 mutante: icinga-downtime -h ms-be2039 -r swift-rebalancing -d 86400
  • 21:31 mutante: icinga-downtime -h ms-be2038 -r swift-rebalancing -d 86400
  • 20:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@8a6b6fc]: Parsoid storage simplification step 1: switch Parsoid stashing to simple key/value - T215956 (duration: 20m 39s)
  • 20:21 mobrovac@deploy1001: Started deploy [restbase/deploy@8a6b6fc]: Parsoid storage simplification step 1: switch Parsoid stashing to simple key/value - T215956
  • 20:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@8a6b6fc] (dev-cluster): Switch Parsoid stashing to simple key/value (duration: 04m 18s)
  • 19:57 mobrovac@deploy1001: Started deploy [restbase/deploy@8a6b6fc] (dev-cluster): Switch Parsoid stashing to simple key/value
  • 18:47 mutante: pooled mw1297 as a new API server (T192457)
  • 18:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet,cluster=api_appserver
  • 18:45 mutante: mw1297 - scap pull
  • 18:17 mutante: sudo icinga-downtime -h ms-be2031 -r swift-rebalancing -d 86400
  • 17:52 mutante: contint1001 - for logfile in $(find /var/log/zuul/ ! -name "*.gz"); do gzip $logfile; done to get more disk space (T207707)
  • 17:33 mutante: contint1001 - apt-get clean for 1% more disk space
  • 17:23 mutante: proton1001 - restarting proton service - low RAM caused facter/puppet fails (https://tickets.puppetlabs.com/browse/PUP-8048) freed memory and fixed puppet run (cc: T219456 T214975)
  • 16:33 catrope@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/: Fix exceptions in Homepage logging (duration: 00m 56s)
  • 15:52 herron: performing rolling restart of pybal on low-traffic eqiad/codfw lvs hosts
  • 15:32 jijiki: Restarting php7.2-fpm on mw2* in codfw for 505383 and T211488
  • 15:00 herron: switching kibana lvs to source hash scheduler
  • 14:41 jijiki: restart pdfrender on scb1002
  • 14:28 godog: begin rollout of rsyslog 8.1901.0-1 to jessie hosts - T219764
  • 13:38 marostegui: Poweroff db2080 for onsite maintenance - T216240
  • 13:01 jijiki: Restarting php7.2-fpm on mw13* for 505383 and T211488
  • 12:36 jijiki: restarting pdfrender on scb1004
  • 12:23 moritzm: rolling restart of Cassandra on restbase/eqiad to pick up Java security update
  • 11:59 jijiki: Restarting php7.2-fpm on mw12* for 505383 and T211488
  • 11:45 gehel: restarting relforge for jvm upgrade
  • 11:33 jbond42: security update ghostscript on scb jessie servers
  • 11:25 jijiki: Restarting php7.2-fpm on mw-canary for 505383 and T211488
  • 11:23 ladsgroup@deploy1001: Finished deploy [ores/deploy@060fc37]: (no justification provided) (duration: 16m 18s)
  • 11:07 ladsgroup@deploy1001: Started deploy [ores/deploy@060fc37]: (no justification provided)
  • 10:28 akosiaris@deploy1001: scap-helm cxserver finished
  • 10:28 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 10:28 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 10:23 jijiki: Restarting php-fpm on mw1238 for 505383 and T211488
  • 09:58 moritzm: installing rsync security updates on jessie
  • 08:44 moritzm: rolling restart of Cassandra on restbase/codfw to pick up Java security update
  • 08:29 godog: swift eqiad-prod: start decom for ms-be101[45] - T220590
  • 08:17 godog: bounce prometheus on bast5001 after migration and backfill
  • 08:04 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 08:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 08:02 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 08:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 06:41 marostegui: Optimize tables on pc1010
  • 06:38 elukey: restart pdfrender on scb1003
  • 06:37 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2082 (duration: 00m 52s)
  • 06:22 marostegui: Upgrade db2082
  • 06:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2079, depool db2082 (duration: 00m 55s)
  • 06:18 marostegui: Upgrade db2081
  • 06:10 marostegui: Upgrade db2079
  • 06:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2086, depool db2079 (duration: 00m 53s)
  • 05:55 marostegui: Upgrade db2086
  • 05:55 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2083 and depool db2086 (duration: 00m 52s)
  • 05:38 marostegui: Upgrade db2080 and db2083
  • 05:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2080 and db2083 (duration: 00m 54s)
  • 03:45 SMalyshev: repooled wdqs1003, it's good now
  • 01:26 eileen: jobs restarted process-control config revision is ef6d4761e5
  • 01:06 eileen: civicrm revision changed from 31982324b8 to 468f85e524, config revision is 13b9eefe7b
  • 01:02 eileen: process-control config revision is 13b9eefe7b
  • 00:29 mutante: mw1297 - rebooting for nutcracker issue
  • 00:28 mutante: mw1297 - scap pull
  • 00:08 mutante: DNS - add initiatives.wikimedia.org (and initiatives.m) for campaign wiki requested at T167375

2019-04-23

  • 23:51 mutante: mw1297 - initial puppet run - will show up in Icinga in a little while but not pooled yet.. all the things are being installed right now
  • 23:48 ejegg: updated payments-wiki (inactive cluster) from 7a312e371a to aa8dad50e7
  • 23:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Logger.js: SWAT GrowthExperiments: Fix validation errors due to state= (duration: 00m 53s)
  • 23:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/includes/EventLogging/SpecialHomepageLogger.php: SWAT GrowthExperiments: Fix EventLogging errors (duration: 00m 53s)
  • 23:25 mutante: generating mcrouter certs for appservers, added mw1297.eqiad.wmnet (T192457)
  • 23:23 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/languages/Language.php: SWAT T219728 Add support for new Japanese era name 'Reiwa' (duration: 00m 52s)
  • 23:20 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: SWAT T221668 VisualEditor: Restore external paste sanitization of DOM elements (duration: 00m 55s)
  • 23:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T221521 Add autoreviewer to wgRestrictionLevels on ptwikinews (duration: 00m 54s)
  • 22:35 XioNoX: push firewall rule to pfw3-eqiad - T221475
  • 22:33 XioNoX: push firewall rule to pfw3-codfw - T221475
  • 21:54 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/ORES/includes/Specials/SpecialORESModels.php: T221696 (duration: 00m 55s)
  • 21:43 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:43 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:33 thcipriani: restarting gerrit to pickup config changes
  • 20:55 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@51b4728]: Deploy new Updater fix for constraints (T221407) (duration: 13m 03s)
  • 20:43 andrewbogott: updating designate pools on cloudservices1003 and 1004 using eqiad1_pool_config.yml template from the puppet repo
  • 20:42 smalyshev@deploy1001: Started deploy [wdqs/wdqs@51b4728]: Deploy new Updater fix for constraints (T221407)
  • 20:26 urandom: dropping disused restbase keyspaces -- T221530
  • 19:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:57 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:32 mutante: webperf* - running puppet to git pull docroot
  • 19:11 thcipriani: gerrit restart
  • 18:59 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/MassMessage: c640195 (duration: 00m 56s)
  • 18:09 SMalyshev: depool wdqs1003 to let it catch up
  • 18:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:03 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:02 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:43 jijiki: Restarting memcached on mc1029 - T208844
  • 17:26 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@78985fb]: Update mobileapps to 6d3a422 (T201382 T217837) (duration: 04m 06s)
  • 17:22 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@78985fb]: Update mobileapps to 6d3a422 (T201382 T217837)
  • 16:55 jijiki: Depool thumbor2004 for 505759 and pool back - T187765
  • 16:54 gehel: restart wdqs for jvm upgrade
  • 16:49 jijiki: Depool thumbor1004 for 505759 and pool back - T187765
  • 16:43 jijiki: Depool thumbor2003 for 505759 and pool back - T187765
  • 16:40 jijiki: Depool thumbor1003 for 505759 and pool back - T187765
  • 16:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable api-request logging to eventgate-analytics for all wikis - T214080 (duration: 00m 53s)
  • 16:33 ottomata: proceeding to enable api-request eventgate-analytics logging for all wikis
  • 16:31 herron: added jfishback to wmf ldap group T221660
  • 16:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 16:12 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:07 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: set wglocaltimezone for sqwikiquote T221627 (duration: 00m 54s)
  • 15:28 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Depicts functionality on Commons (duration: 00m 54s)
  • 14:27 jijiki: Depool thumbor2002 for 505759 and pool back - T187765
  • 14:21 jijiki: Depool thumbor1002 for 505759 and pool back - T187765
  • 14:16 jijiki: Depool thumbor2001 for 505759 and pool back - T187765
  • 14:14 jijiki: Depool thumbor1001 for 505759 and pool back - T187765
  • 14:07 jijiki: Disable puppet on thumbor* to merge 505759
  • 13:54 ema: depool cp4021 and reimage as upload_ats T219967
  • 13:17 jijiki: Restart nagios-nrpe-server on prometheus1003
  • 12:15 godog: swift eqiad-prod: fully decom ms-be1013 - T220590
  • 11:59 moritzm: installing clamav security updates on fermium
  • 11:56 kart_: EU-Midday SWAT is done.
  • 11:54 kart_: 'SWAT: gerrit:505059 deployment-prep: Use new poolcounter instance, gerrit:505060 deployment-prep: Use new ms-fe host.'
  • 11:53 kartik@deploy1001: Synchronized wmf-config/LabsServices.php: SWAT: gerrit:505643 (duration: 00m 53s)
  • 11:45 jijiki: Stop xenon-log, excimer-log and apache on mwlog*
  • 11:43 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:505643 Turn off logging for CitationUsage and CitationUsagePageLoad (T213969) (duration: 00m 53s)
  • 11:29 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix undefined variable from last SWAT (duration: 00m 54s)
  • 11:27 moritzm: installing clamav security updates on mendelevium (OTRS host)
  • 11:18 kartik@deploy1001: Synchronized wmf-config: SWAT: gerrit:505220 Use higher unmodified MT threshold for Indonesian Wikipedia (T221353) (duration: 00m 57s)
  • 10:44 moritzm: uploaded ferm 2.4-1+wmf2+deb10u1 to buster-wikimedia (T153468)
  • 09:23 godog: upgrade prometheus to v2 on bast5001, previous metrics will not be available until migration and backfill are complete - T187987
  • 09:19 elukey: dumping Kafka consumer offsets' history on logstash1012 for T221202
  • 09:00 fdans@deploy1001: Finished deploy [analytics/refinery@0d63671]: deploying changes to pageview definition brought in refinery source 0.0.87 (duration: 14m 09s)
  • 08:54 fsero: synchronizing old docker_registry content into new one - T221101
  • 08:46 fdans@deploy1001: Started deploy [analytics/refinery@0d63671]: deploying changes to pageview definition brought in refinery source 0.0.87
  • 08:14 moritzm: removing debmonitor entries for labvirt* hosts
  • 08:06 moritzm: installing wget security updates on jessie
  • 07:27 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Set wgPriorityHintsRatio (duration: 00m 52s)
  • 06:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T136427 (duration: 00m 57s)
  • 05:52 elukey: powercycle wtp2019 - no ssh, mgmt console stuck
  • 05:16 marostegui: Deploy schema change on x1 master - lag will appear on x1 slaves - T136427
  • 05:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T136427 (duration: 00m 54s)

2019-04-22

  • 18:46 gilles@deploy1001: Synchronized php-1.34.0-wmf.1/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 53s)
  • 18:22 XioNoX: Add k8s BGP neighbors on cr1/2-eqiad - T220822
  • 18:15 XioNoX: Add k8s BGP neighbors on cr1/2-codfw - T220822
  • 08:47 marostegui: finished maintenance window on dbstore1003 and dbstore1005
  • 08:37 marostegui: Upgrade dbstore1005
  • 07:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 (duration: 00m 54s)
  • 07:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
  • 06:40 marostegui: Upgrade dbstore1003
  • 06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
  • 05:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
  • 05:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1099 (duration: 00m 54s)
  • 05:26 marostegui: Stop MySQL and reboot db1099 to see if memory errors clear up T221502
  • 05:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 T221502 (duration: 01m 15s)

2019-04-21

  • 05:19 marostegui: Clean up some space on webperf2001 - T221508

2019-04-20

  • 08:12 _joe_: depooling mw1261,mw1312 wikidata (at least) not working
  • 07:58 jijiki: Pool thumbor1001
  • 07:52 jijiki: depool thumbor1001, switch back to nginx - T187765
  • 07:50 _joe_: restarting php-fpm on mw1312, mw1261 to test the new settings over the weekend

2019-04-19

  • 23:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2245.codfw.wmnet,cluster=api_appserver
  • 23:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2244.codfw.wmnet,cluster=api_appserver
  • 23:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2150.codfw.wmnet,service=nginx,cluster=jobrunner
  • 22:55 mutante: mw2244,mw2245,mw2150 - scap pull
  • 22:53 mutante: mw2244,mw2245,mw2150 - rebooting for known nutcracker issue after first install
  • 22:47 mutante: furud - remounted /mnt/hdfs for T221483
  • 21:42 mutante: mw2150,mw2244,mw2245: initial puppet run, added to mw roles
  • 19:38 otto@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: No-op - enabling cirrussearch-request logging in beta (duration: 00m 52s)
  • 19:37 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: No-op - enabling cirrussearch-request logging in beta (duration: 00m 53s)
  • 19:36 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: No-op - prep for enabling cirrussearch-request logging in beta (duration: 00m 53s)
  • 16:20 bblack: wikipedia.org CNAME TTLs increase to 4H - https://gerrit.wikimedia.org/r/c/operations/dns/+/505249 - T208263
  • 16:18 ejegg: rolled back payments-wiki from eb3d0f35de to aa8dad50e7
  • 15:55 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/includes/logging/LogFormatter.php: T220767 (duration: 00m 53s)
  • 15:54 bblack: restart pybal on lvs1016 (eqiad primary) for eventschemas service add
  • 15:54 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/includes/Linker.php: T220767 (duration: 00m 55s)
  • 15:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=schema.*
  • 15:42 bblack: restart pybal on lvs2003 (codfw primary) for eventschemas service add
  • 15:39 bblack: restart pybal on lvs2006 (codfw backup) for eventschemas service add
  • 15:32 bblack: restarting pybal on lvs1006 (eqiad backup) for eventschema service add
  • 14:59 volans: uploaded spicerack_0.0.23-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 12:59 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 T216598 Enable Priority Hints and Element Timing on eswiki (duration: 00m 56s)
  • 08:45 akosiaris: restart gerrit to pick up https://gerrit.wikimedia.org/r/504981
  • 06:39 elukey: roll restart of druid daemons on druid100[1-3] to pick up new jvm settings

2019-04-18

  • 23:16 mobrovac: evening SWAT completed
  • 23:10 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes: (no justification provided) (duration: 00m 54s)
  • 23:10 ejegg: updated payments-wiki from aa8dad50e7 to eb3d0f35de
  • 23:07 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wikimania years namespaces to wgNamespacesWithSubpages - T220950 (duration: 00m 53s)
  • 23:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 23:00 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 22:40 ejegg: updated payments-wiki from aa8dad50e7 to 2f7cd8f195
  • 22:14 mutante: LDAP - adding 'ldoan' and 'schang' to 'wmf' (T221118)
  • 22:01 XioNoX: remove asw2-a-eqiad license keys for troubleshooting
  • 21:58 ejegg: rolled back payments-wiki to aa8dad50e7
  • 21:55 mutante: LDAP - adding rosalie-wmde to group 'wmde' (T220691)
  • 21:52 ejegg: updated payments-wiki from aa8dad50e7 to 2f7cd8f195
  • 21:28 mutante: puppetmaster1001 - mcrouter_generate_certs --generate
  • 21:18 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (cobalt) (duration: 00m 10s)
  • 21:18 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (cobalt)
  • 21:17 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (gerrit2001) (duration: 00m 11s)
  • 21:17 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (gerrit2001)
  • 21:14 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 21:14 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:56 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.1 refs T220726
  • 20:52 cdanis: root@icinga1001.wikimedia.org /var/lib/icinga # for DOWNTIME in $(fgrep -B12 'comment=mobrovac: temp stop JQ for T221368 - cdanis@cumin1001' retention.dat | grep -A13 servicedowntime | grep downtime_id | cut -d= -f2); do printf "[%lu] DEL_SVC_DOWNTIME;%u\n" $(date +%s) $DOWNTIME ; done > rw/icinga.cmd
  • 20:40 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Translate/utils/MessageUpdateJob.php: Translate jobs: Remove problematic Job::$params assignments, dir 2/2 - T221368 (duration: 01m 00s)
  • 20:39 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Translate/tag: Translate jobs: Remove problematic Job::$params assignments, dir 1/2 - T221368 (duration: 01m 01s)
  • 20:32 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'scb*' 'enable-puppet "mobrovac: temp stop JQ for T221368"'
  • 20:31 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@71941b1]: Ignore Kafka disconnect errors (duration: 00m 51s)
  • 20:30 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@71941b1]: Ignore Kafka disconnect errors
  • 19:36 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cookbook sre.hosts.downtime -r "mobrovac: temp stop JQ for T221368" 'scb*'
  • 19:36 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:36 cdanis@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:29 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'scb*' 'disable-puppet "mobrovac: temp stop JQ for T221368" && systemctl stop cpjobqueue'
  • 19:17 mobrovac@deploy1001: Started restart [cpjobqueue/deploy@922cbc0]: Bounce CP4JQ, lots of transport broken failures - T221368
  • 19:11 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/EventFactory.php: Remove the use of page titles in JobExecutor, file 2/2 - T221368 (duration: 00m 59s)
  • 19:10 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/JobExecutor.php: Remove the use of page titles in JobExecutor, file 1/2 - T221368 (duration: 01m 01s)
  • 18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:47 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:47 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:41 mutante: mw2150 - reimaging, not in confctl
  • 18:02 dzahn@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2151.codfw.wmnet,cluster=jobrunner,service=nginx
  • 17:49 mutante: mw2151 - scap pull
  • 17:46 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/JobExecutor.php: Default to a dummy title for invalid titles - T221368 (duration: 01m 01s)
  • 17:20 twentyafterfour@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/AbuseFilter/includes/: sync https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/504863 (duration: 01m 00s)
  • 16:20 bblack: Experimental DNS-level changes deploying for wikipedia.org domain - if wikipedia.org DNS problems appear, revert https://gerrit.wikimedia.org/r/c/operations/dns/+/504588 - T208263
  • 16:17 XioNoX: remove peering to 63199 in eqsin (down for 1 month, no reply to emails)
  • 16:13 XioNoX: rollback dhcp option 82 test from asw2-b-eqiad
  • 14:55 fsero: synchronizing docker_registry_codfw swift container from docker_registry
  • 14:40 XioNoX: push firewall change to pfw3-eqiad - T221278
  • 13:30 jbond42: rolling updates of ruby2.1 on jessie
  • 13:08 elukey: roll restart of cassandra on aqs* to pick up new openjdk upgrades
  • 13:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:58 reedy@deploy1001: rebuilt and synchronized wikiversions files: group1 back to .25
  • 12:36 anomie: Ran `php7adm /opcache-free` on mw1274 to test a theory related to T221347. The log entries related to that task stopped immediately.
  • 12:30 gehel: restarting blazegraph + updater on wdqs* for jvm upgrade
  • 12:22 moritzm: installing Java security updates on restbase-dev hosts (along with Cassandra restarts)
  • 12:21 gehel: restarting blazegraph + updater on wdqs1009 / wdqs1010 for jvm upgrade
  • 12:19 moritzm: installing Java security updates on WDQS autodeploy/test hosts
  • 10:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:35 moritzm: installing rails security updates on jessie hosts
  • 10:21 moritzm: installing jasper updates on jessie hosts
  • 09:44 akosiaris: update grafana service/ dashboard to have user, system, throttled CPU metrics under the CPU saturation row
  • 09:41 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216597 Run CPU benchmark for all samples on eswiki/ruwiki (duration: 01m 06s)
  • 09:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:53 elukey: reboot kafka10[12-23] (old Analytics cluster) for kernel + openjdk upgrades
  • 08:23 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 08:14 moritzm: installing libssh2 security updates on jessie
  • 08:01 moritzm: restarting mw1261-mw1265 to pick up new libssh2
  • 07:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:53 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2004.codfw.wmnet
  • 07:28 moritzm: installing libssh2 security updates
  • 07:19 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 06:58 moritzm: restarting icinga on icinga1001 (T196336)
  • 06:37 moritzm: rolling reboots of Swift backends in eqiad for combined kernel/glibc/OpenSSL update

2019-04-17

  • 22:46 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/includes/: I3a50508178159 (duration: 01m 21s)
  • 22:40 XioNoX: push firewall change to pfw3-codfw - T221278
  • 22:28 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Score/: Id58156cfca805 / T219342 (duration: 01m 03s)
  • 21:30 XioNoX: enable option-82 on asw2-b:cloud-hosts1-b-eqiad vlan
  • 21:10 thcipriani: gerrit back
  • 21:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4dcb851]: Gerrit update (cobalt -- restart incoming) (duration: 00m 10s)
  • 21:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4dcb851]: Gerrit update (cobalt -- restart incoming)
  • 21:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4dcb851]: Gerrit update (gerrit2001 only) (duration: 00m 11s)
  • 21:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4dcb851]: Gerrit update (gerrit2001 only)
  • 19:14 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.1 refs T220726 (duration: 01m 49s)
  • 19:13 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.1 refs T220726
  • 18:04 thcipriani: gerrit back
  • 18:01 thcipriani: gerrit restart for https://gerrit.wikimedia.org/r/504611/
  • 17:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Wikidata federation on Commons again T214075 (duration: 01m 00s)
  • 17:20 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventGate api-request logging on group1 wikis (duration: 01m 00s)
  • 17:18 mutante: LDAP - added 'brennen' to group 'gerritadmin' (T218858)
  • 17:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/OATHAuth/: UBN T221257 train un-blocker (duration: 01m 02s)
  • 17:09 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Echo/includes/formatters/: Notifications: Revert 7121b9c4 per I8f9a6a19ba (duration: 01m 01s)
  • 16:49 tzatziki: deleting three files for legal compliance
  • 16:47 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/WikibaseMediaInfo/: SDC: Various fixes T218922 T221071 T221110 T221123 (duration: 01m 02s)
  • 16:41 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/autoload.php: Update to point to new maintenance scripts (duration: 01m 00s)
  • 16:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/maintenance/language/generateUpperCharTable.php: Maintenance script for _joe_ (duration: 00m 59s)
  • 16:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/maintenance/language/generateUcfirstOverrides.php: Maintenance script for _joe_ (duration: 01m 00s)
  • 16:21 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/languages/Language.php: T219279 Ability to set wgOverrideUcfirstCharacters part 1 try two (duration: 01m 00s)
  • 16:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/includes/DefaultSettings.php: T219279 Ability to set wgOverrideUcfirstCharacters part 1b (duration: 01m 03s)
  • 16:13 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 16:11 XioNoX: set fasw-c-eqiad:ge-[0-1]/0/17 in admin vlan - T221232
  • 16:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T220434 Deploy Partial blocks to Chinese Wikipedia (duration: 01m 02s)
  • 14:37 ariel@deploy1001: Finished deploy [dumps/dumps@dcf04a0]: fix up paths for 1.34_wmf.1 for AbstractFilter (duration: 00m 04s)
  • 14:36 ariel@deploy1001: Started deploy [dumps/dumps@dcf04a0]: fix up paths for 1.34_wmf.1 for AbstractFilter
  • 14:35 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:35 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:35 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 14:34 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:34 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:34 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 14:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics finished
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 13:52 elukey: upgrading hadoop cdh distribution to 5.16.1 on all the Hadoop-related nodes - T218343
  • 13:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 godog: reimage prometheus2004 - T187987
  • 12:57 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1004.eqiad.wmnet
  • 12:44 godog: bounce prometheus instances on prometheus[12]003 after https://gerrit.wikimedia.org/r/c/operations/puppet/+/499742
  • 12:33 moritzm: running some ferm tests on graphite2002
  • 12:10 godog: briefly stop all prometheus on prometheus1003 to finish metrics rsync - T187987
  • 11:39 Lucas_WMDE: EU SWAT done
  • 11:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:504380 Enable suggestion constraint status on testwikidata (T221108, T204439) (duration: 01m 01s)
  • 10:58 volans@deploy1001: Finished deploy [debmonitor/deploy@f049b3b]: Deploy Debmonitor v0.1.9 (duration: 01m 00s)
  • 10:57 volans@deploy1001: Started deploy [debmonitor/deploy@f049b3b]: Deploy Debmonitor v0.1.9
  • 10:40 moritzm: installing Java security updates on kafka/analytics cluster
  • 09:17 godog: swift eqiad-prod continue ms-be1013 decom - T220590
  • 09:09 elukey: restart eventlogging on eventlog1002 due to errors in processors and consumer lag accumulated after the last Kafka Jumbo roll restart
  • 08:47 godog: reimage prometheus1004 - T187987
  • 08:38 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 fully (duration: 01m 00s)
  • 08:29 moritzm: installing ghostscript security updates
  • 07:51 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/NavigationTiming: T216597 Event timing support (duration: 01m 01s)
  • 07:45 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216597 Enable Event Timing origin trial on ruwiki and eswiki (duration: 01m 04s)
  • 07:21 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 with low load (duration: 01m 18s)
  • 07:07 moritzm: rolling reboots of Swift backends in codfw for combined kernel/glibc/OpenSSL update

2019-04-16

  • 23:42 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Return CirrusSearch to standard execution against eqiad cluster (duration: 01m 00s)
  • 23:37 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/CirrusSearch/includes/: Fix fatals on malformed search queries against overridden clusters (duration: 01m 06s)
  • 22:42 thcipriani: gerrit back
  • 22:39 thcipriani: restarting gerrit for configuration update https://gerrit.wikimedia.org/r/504448
  • 22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T165795 Give bureaucrats the usermerge right (duration: 00m 59s)
  • 22:20 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/NewUserMessage/includes/NewUserMessage.php: Disable onLocalUserCreated for known bot accounts (duration: 01m 01s)
  • 22:17 mobrovac@deploy1001: Finished deploy [restbase/deploy@f1c767d]: mobile-sections simplification: use the key/value bucket only - T215960 (duration: 20m 02s)
  • 22:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T165795 Enable the UserMerge extension for clean-up on wikitech (duration: 01m 00s)
  • 21:57 mobrovac@deploy1001: Started deploy [restbase/deploy@f1c767d]: mobile-sections simplification: use the key/value bucket only - T215960
  • 21:56 eileen: civicrm revision changed from 1bc1570967 to 31982324b8, config revision is e5a7908330
  • 21:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@f1c767d] (dev-cluster): mobile-sections simplification: use the key/value bucket only (duration: 05m 24s)
  • 21:50 mobrovac@deploy1001: Started deploy [restbase/deploy@f1c767d] (dev-cluster): mobile-sections simplification: use the key/value bucket only
  • 21:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.1 refs T220726
  • 21:24 andrewbogott: deleting 'eqiad' endpoint in keystone
  • 21:21 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.1 refs T220726 (duration: 36m 47s)
  • 21:09 XioNoX: add wpao to wmf/ops in LDAP - T221142
  • 21:02 cdanis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
  • 20:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:55 andrewbogott: removing keystone endpoints for the 'eqiad' region
  • 20:45 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.1 refs T220726
  • 20:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@dfca9e6]: Use the simplified key/value bucket - T215960 (duration: 19m 52s)
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:42 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:42 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:23 mobrovac@deploy1001: Started deploy [restbase/deploy@dfca9e6]: Use the simplified key/value bucket - T215960
  • 20:19 ariel@deploy1001: Finished deploy [dumps/dumps@796ccb5]: use safe_load yaml and getReplicaServer.php, cleanup symlinks once per job only (duration: 00m 04s)
  • 20:19 ariel@deploy1001: Started deploy [dumps/dumps@796ccb5]: use safe_load yaml and getReplicaServer.php, cleanup symlinks once per job only
  • 20:11 mobrovac@deploy1001: Finished deploy [restbase/deploy@dfca9e6] (dev-cluster): Use the simplified key/value bucket (duration: 05m 24s)
  • 20:05 mobrovac@deploy1001: Started deploy [restbase/deploy@dfca9e6] (dev-cluster): Use the simplified key/value bucket
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:59 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:59 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:56 gehel: restarting cassandra on maps* for config change - T221055
  • 19:49 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:49 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:49 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:48 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:48 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:48 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set main_app.debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:11 twentyafterfour: twentyafterfour@deploy1001:/srv/mediawiki-staging$ scap prep 1.34.0-wmf.1
  • 19:07 bblack: restarting varnish backend on cp1083
  • 19:04 bblack: restarting varnish backend on cp1085
  • 18:55 cdanis: cdanis@cp1085.eqiad.wmnet ~ % sudo -i depool
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set main_app.profiling_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:46 twentyafterfour: branching 1.34.0-wmf.1 refs T220726
  • 18:25 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:25 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set wmfdebug_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:14 cmjohnson1: powering off mw1280 to replace DIMM
  • 18:08 mutante: restbase2007, restbase2008 - re-enabled puppet which was disabled with reason 'decom'ed' but actually needed to run to decom after they had moved to role::spare::system (T208087)
  • 17:56 reedy@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/WikimediaIncubator/: T220623 (duration: 00m 53s)
  • 17:47 herron: beginning rolling ELK upgrade to 5.6.15
  • 17:46 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: gerrit:504386 no-op preparatory change (T221107) (duration: 00m 52s)
  • 17:36 arturo: toolforge k8s reallocation (from nova-network to neutron) is causing troubles with IRC bots, expect missing entries in the SAL
  • 17:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:28 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:27 andrewbogott: restarting rabbitmq on cloudcontrol1003
  • 17:26 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1280.eqiad.wmnet,cluster=api_appserver
  • 17:25 arturo: rebooted cloudnet1003
  • 17:24 gehel: force initialization of unassigned shards on elasticsearch eqiad
  • 17:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:504374 no-op preparatory change (T221108) (duration: 00m 52s)
  • 16:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikibaseQualityConstraints/maintenance/ImportConstraintEntities.php --wiki=testwikidatawiki --config-format=wgConf | tee T221108.php
  • 16:53 mutante: bast2001 - shutdown -h now - decom'ed (T219492)
  • 16:48 mutante: puppet node clean bast2001.wikimedia.org ; puppet node deactivate bast2001.wikimedia.org ; it showed up in Icinga again despite running decom cookbook (T219492)
  • 16:47 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:47 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:47 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set wmfdebug_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:44 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:44 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:44 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:43 jynus: upgrading and shutting down db1078 T219115
  • 16:41 jynus: disabling notifications on db1078 T219115
  • 16:37 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 (duration: 00m 52s)
  • 15:36 arturo: reimaging cloudnet2002-dev because of role name change
  • 15:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:20 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.28 -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:19 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:19 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:19 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:18 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:18 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:18 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:16 elukey: roll restart kafka on kafka-jumbo100[1-6] to pick up openjdk upgrades
  • 14:58 gehel: manual data transfer from wdqs1008 to wdqs1009 - T220830
  • 14:56 ema: swift-fe-eqiad: nginx reload for new TLS certificate T204245
  • 14:53 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 14:52 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:51 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1005.eqiad.wmnet
  • 14:45 ema: test https://gerrit.wikimedia.org/r/504340 on ms-fe1005 T204245
  • 14:30 ema: swift-fe-codfw: nginx reload for new TLS certificate T204245
  • 14:22 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 14:21 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:20 elukey: roll restart of all the druid daemons on druid100[1-6] to pick up new openjdk updates
  • 14:17 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2005.codfw.wmnet
  • 14:07 jijiki: Pooling thumbor1001
  • 14:04 ema: test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/504331/ on ms-fe2005 T204245
  • 14:01 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe2005.codfw.wmnet
  • 14:01 jijiki: Depooling thumbor1001
  • 13:58 jijiki: Disable puppet on thumbor1001 for ~24h to serve traffic via haproxy - T187765
  • 13:54 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 13:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:52 jijiki: Enable puppet on thumbor*
  • 13:42 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 13:41 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:39 gehel: resetting cookbooks repo on cumin1001 (local changes)
  • 13:34 jijiki: Disabling puppet on thumbor* to merge 504284
  • 13:13 ema: cp-ats: upgrade fifo-log-demux to 0.2 and restart services
  • 13:10 ema: fifo-log-demux 0.2 uploaded to stretch-wikimedia
  • 13:03 arturo: T220095 renaming/reimaging labtestcontrol2003 as cloudcontrol2003-dev
  • 12:58 moritzm: installing ghostscript update on thumbor1001
  • 12:54 gehel: cleanup redundant prometheus-elasticsearch units on elasticsearch servers
  • 12:52 godog: swift eqiad-prod continue ms-be1013 decom - T220590
  • 12:17 moritzm: installing OpenSSL 1.0.2 updates on cp* Varnish hosts
  • 12:07 arturo: rebooting cloudvirt200[123]-dev because of deep changes in config
  • 11:18 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgWikibaseMusicalNotationLineWidthInches to config (T218191) (duration: 00m 52s)
  • 11:10 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "WikibaseClient: Conditionally enable mapframe support" (T218051) (duration: 00m 51s)
  • 11:08 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable signatures in 2019: NS (ID 128) for wikimaniawiki (T221062) (duration: 00m 52s)
  • 10:49 gilles: T221065 eswiki purge finished
  • 10:45 moritzm: installing libjs-bootstrap updates from Stretch point release
  • 10:21 gilles: T221065 mwscript purgeList.php eswiki --all --verbose on mwmaint1002
  • 10:21 moritzm: installing xapian-core update from stretch point release
  • 10:18 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221065 Set up origin trials on Spanish Wikipedia mobile site (duration: 00m 52s)
  • 09:59 jijiki: Enabling puppet again on dbproxy* and thumbor*
  • 09:51 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Reduce db1078 load (duration: 00m 53s)
  • 09:37 jijiki: Disabling puppet on dbproxy* and thumbor* to merge 502972
  • 09:26 fsero: [late logging] swift container-to-container synchronization enabled between docker_registry_eqiad and docker_registry_codfw swift containers at 08:15:00 UTC
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 09:05 ema: cp1076: repool varnish-fe pointing to Varnish T213263
  • 08:57 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 08:57 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 08:57 ema: cp1076: depool varnish-fe in preparation of traffic switchback to Varnish T213263
  • 08:40 hoo: Updated the Wikidata property suggester with data from the 2019-04-08 JSON dump and applied the T132839 workarounds
  • 08:33 moritzm: rebooting ms-be1020 for combined kernel/glibc/OpenSSL update
  • 08:01 moritzm: rebooting Swift frontends in codfw for combined kernel/glibc/OpenSSL security updates
  • 07:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 07:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 07:50 ema: cp2002: repool varnish-fe pointing to Varnish T213263
  • 07:47 moritzm: rebooting Swift frontends in eqiad for combined kernel/glibc/OpenSSL security updates
  • 07:45 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 07:45 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 07:45 ema: cp2002: depool varnish-fe in preparation of traffic switchback to Varnish T213263
  • 07:36 marostegui: Upgrade db2093
  • 07:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 07:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
  • 07:32 ema: cp2005: repool varnish-fe pointing to Varnish T213263
  • 07:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 07:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
  • 07:25 ema: cp2005: depool varnish-fe in preparation of traffic switchback to Varnish T213263
  • 07:11 moritzm: upgrading Java on Hadoop/Kafka/Jumbo/Druid clusters
  • 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 31s)
  • 01:46 aaron@deploy1001: Synchronized php-1.33.0-wmf.25/includes/parser/Parser.php: 73529ae6c5ffb6 (duration: 00m 53s)
  • 00:34 onimisionipe: pooled maps2003 - postgres init complete!
  • 00:33 krinkle@deploy1001: Synchronized wmf-config/profiler.php: I7589aa153 (duration: 00m 52s)
  • 00:33 urandom: creating new restbase schema -- T221031

2019-04-15

  • 23:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 23:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 23:20 cdanis: cdanis@icinga1001.wikimedia.org ~ % sudo systemctl restart tcpircbot-logmsgbot.service
  • 23:17 bd808: scap: SWAT: wikitech: Use cn:caseExactMatch: as account search filter (gerrit:497423, T165795)
  • 20:59 thcipriani: gerrit back
  • 20:57 gehel: shutting down blazegraph and updater on wdqs1010, waiting for data reimport
  • 20:55 thcipriani: gerrit restart to pick up gc log changes incoming
  • 20:37 arlolra: Updated Parsoid to 83c17fc9
  • 20:23 Amir1: the ores deployment is over
  • 19:49 XioNoX: export BGP communities (prepend x3 outside asia) to AS3491 in eqsin
  • 19:46 mutante: bromine/vega: rm /etc/rsyncd.conf ; systemctl stop rsync (clean up old rsync config gerrit:503961)
  • 19:45 XioNoX: update (and add) AS3491 BGP communities in eqsin
  • 18:58 XioNoX: update mr1-* security policies - T219384
  • 18:41 onimisionipe: depooling maps2003 for postgres init
  • 18:40 onimisionipe: pooling maps2002 - postgres init complete
  • 18:39 Amir1: Morning SWAT is done
  • 18:35 shdubsh: logstash1009: disabling puppet and testing logstash config
  • 18:09 mutante: LDAP - adding legoktm and qchris to gerritadmin group (T219086)
  • 17:45 anomie: Backporting fix for T220991
  • 17:41 akosiaris: force puppet agent run on maps* after moving config-vars.yaml file for kartotherian, tilerator, tileratorui T220982
  • 17:33 mutante: LDAP - re-adding 'pbj' to 'nda' group, extended access until May 6th, transparency report contractor
  • 17:23 mutante: wikibugs - qdel'ed jobs and restarted another time, make it rejoin
  • 17:17 onimisionipe: wdqs deployment is complete! For some reason I don't know, scap did not log here
  • 17:17 herron: restarted logstash on logstash1007
  • 17:15 mutante: restarted wikibugs because it stopped talking
  • 16:08 onimisionipe: pooling maps2001 - postgres reinit is complete
  • 15:55 Reedy: changed /srv/mediawiki/docroot/wikimedia.org to a symlink to standard-docroot
  • 15:53 XioNoX: add cloud-in4 firewall filter to codfw - T211921
  • 15:31 onimisionipe: restarting prometheus-wmf-elasticsearch-exporter-9* on all elastic nodes
  • 15:30 onimisionipe: restarting prometheus-wmf-elasticsearch-exporter-9200 on all elastic nodes
  • 15:28 _joe_: systemctl reset-failed on ms-be1027, debmonitor session
  • 15:24 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T219871)
  • 14:55 gehel: deploying tilerator to maps1001 to validate deployment is working - T220982
  • 14:55 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T219871)
  • 14:43 _joe_: running apply-config-tilerator on maps1001
  • 14:40 _joe_: running apply-config-karthoterian on maps1001
  • 14:22 cdanis: T220982 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'
  • 14:21 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' "disable-puppet 'bad permissions - T220982 - cdanis'"
  • 14:18 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'
  • 14:18 gehel: resetting permissions on maps servers for /srv/deployment/kartotherian and /srv/deployment/tilerator
  • 14:04 moritzm: rebooting ms-fe1005 for combined kernel/glibc/OpenSSL update
  • 13:57 jbond42: upgrading puppet 4 -> 5 and facter 2 -> 3 on mediawiki::canary_appserver, mediawiki::appserver::canary_api and cache::cache roles
  • 13:56 gehel: restart tilerator / kartotherian on all maps servers for openssl update
  • 13:55 godog: start ms-be1013 decom - T220590
  • 13:42 godog: reboot ms-be1013
  • 13:09 moritzm: installing wget security updates on trusty hosts
  • 12:59 moritzm: restarting archiva on archiva1001 for OpenJDK security update
  • 12:50 moritzm: restarting Apache on matomo1001 to pick up OpenSSL update
  • 12:14 moritzm: rolling restart of HHVM/Apache on deployment servers to pick up OpenSSL update
  • 11:59 fsero: pointing boron docker builds to the new registry temporarily (docker builds on boron might fail)
  • 11:35 Amir1: EU swat is done
  • 11:26 moritzm: rolling restart of HHVM/Apache on labweb* to pick up OpenSSL update
  • 09:58 moritzm: installing openssl1.0 security updates
  • 09:18 gehel: unbanning elastic1029 from cluster
  • 08:58 moritzm: updating mediawiki servers in eqiad to version 1.8.1 of the PHP extension for wikidiff
  • 08:29 onimisionipe: increase wal_keep_segments on codfw maps master
  • 08:19 moritzm: updating mediawiki servers in codfw to version 1.8.1 of the PHP extension for wikidiff
  • 07:50 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/initSiteStats.php --wiki=hywwiki --active (T220936)
  • 05:31 marostegui: Upgrade db1100
  • 05:07 marostegui: powercycle mw1280 (crashed)

2019-04-14

  • 06:10 ebernhardson: unban elastic1027 from eqiad-psi
  • 05:36 ebernhardson: unbanning elastic1027 after about half the shards left and load dropped
  • 05:31 ebernhardson: ban elastic1027 from elasticsearch-psi in eqiad
  • 04:59 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1027 due to 100% cpu for last 30+ minutes

2019-04-13

  • 18:46 godog: 3h downtime for cloudvirt1015
  • 15:58 ebernhardson: restart elasticsearch on elastic1027
  • 15:34 shdubsh: restart recommendation_api on scb1001
  • 15:33 shdubsh: restart recommendation_api on scb2001
  • 10:46 onimisionipe: depooling maps2001 for postgres init
  • 08:05 gehel: repooling wdqs1008 - data transfer completed - T220830
  • 00:32 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/includes/: Idc19cc29764a / T220854 - hot fix (duration: 05m 37s)

2019-04-12

  • 21:16 Krinkle: scap was unable to sync to 1 apache (connect to host cloudweb2001-dev.wikimedia.org port 22: Connection timed out)
  • 21:10 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/ImageMap/includes/ImageMap.php: I0ee84f059da / T217087 (duration: 05m 12s)
  • 19:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 19:27 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 19:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:17 onimisionipe: depooling maps2002 for postgres init
  • 17:16 onimisionipe: repooling maps2001 - postgres init is complete
  • 16:14 elukey: install ifstat on all the mc1* hosts for network bandwidth investigation
  • 15:56 gehel: starting data transfer from wdqs1008 to wdqs1009 - T220830
  • 15:32 thcipriani: gerrit back
  • 15:29 thcipriani: gerrit restart incoming
  • 14:29 onimisionipe: depool maps2001 for postgres initialization
  • 13:24 akosiaris: re-enable puppet across the fleet. Patch merged, recovery storm coming
  • 13:18 akosiaris: disable puppet across the fleet to avoid incoming puppet alert storm
  • 12:57 marostegui: Purge old rows and optimize tables on spare host pc1010 T210725
  • 12:53 urandom: decommissioning cassandra-c, restbase2008 -- T208087
  • 12:49 gehel: rolling restart of cassandra on maps* for jvm upgrade
  • 12:22 arturo: T220095 disable icinga checks for labtestcontrol2003
  • 12:16 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220807 Reduce cawiki survey sampling rate (duration: 05m 11s)
  • 11:56 moritzm: upgrading app server canaries to version 1.8.1 of the PHP wikidiff extension (HHVM already deployed) T203069
  • 11:46 moritzm: upgrading acmechief hosts to latest buster state
  • 11:44 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220807 Oversample navtiming on cawiki and commonswiki (duration: 05m 14s)
  • 11:37 Trey314159: reindexing Greek, Turkish, and Irish wikis on elastic@eqiad and elastic@codfw complete (T217806)
  • 11:19 moritzm: installed Java security updates on relforge* hosts
  • 11:10 moritzm: installing Java security updates on remaining maps hosts
  • 10:32 arturo: T219626 reimaging cloudcontrol2001-dev
  • 10:13 elukey: matomo updated to 3.9.1 on matomo1001 + deb upload to wikimedia-stretch - T218037
  • 09:53 moritzm: updated mwdebug1001 to php-wikidiff 1.8.1
  • 09:37 moritzm: updated mwdebug1002 to php-wikidiff 1.8.1
  • 09:30 volans: reset mgmt card on labtestcontrol2003 - T220783
  • 09:07 moritzm: added the wikimedia repository key to the stretch build chroot on boron, fixes builds using the PHP72/SPICERACK hooks
  • 09:05 arturo: T218021 disable icinga checks for labtestcontrol2001
  • 08:35 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/NavigationTiming/modules/ext.navigationTiming.js: T220788 Fix veaction === null case (duration: 00m 54s)
  • 08:02 moritzm: updated ssacli in thirdparty/hwraid component for stretch to 3.30-13.0 T220787
  • 07:12 marostegui: Manually install ssacli on db2[097|098|099|100|101|102] T220787 T220572
  • 07:04 moritzm: synced ssacli to thirdparty/hwraid components for jessie/stretch T220787
  • 01:00 mutante: puppet cert clean, puppet node clean, puppet node deactivate on cloudnet2001-dev.codfw.wmnet (T218025)
  • 00:25 tstarling@deploy1001: Synchronized wmf-config/profiler.php: increase excimer max depth (duration: 00m 53s)
  • 00:02 ejegg: updated fundraising CiviCRM from 24b968b1f9 to 1bc1570967

2019-04-11

  • 23:57 urandom: decommissioning cassandra-b, restbase2008 -- T208087
  • 22:15 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/WikibaseMediaInfo/resources/: Hot-deploy fix for WBMI variable cache miss T220665 (duration: 00m 55s)
  • 20:46 mutante: deleting job of wikibugs-phab-listener in an attempt to restart it
  • 19:47 cdanis: cdanis@mwdebug1001.eqiad.wmnet ~ % sudo systemctl stop hhvm && sudo rm /var/cache/hhvm/fcgi.hhbc.sq3 && sudo systemctl start hhvm
  • 19:39 twentyafterfour: mediawiki error rate seems to be back to normal after deploying 1.33.0-wmf.25, the new branch looks stable refs T206679
  • 18:55 mutante: disabling puppet on hosts using class 'confd' to safely deploy gerrit:456317
  • 18:55 Trey314159: reindexing Greek, Turkish, and Irish wikis on elastic@eqiad and elastic@codfw (T217806)
  • 18:01 onimisionipe: increase replication factor on maps codfw cluster
  • 17:45 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@5394b59] (stretch): Insert maps2001 into stretch environment (duration: 00m 22s)
  • 17:45 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@5394b59] (stretch): Insert maps2001 into stretch environment
  • 17:22 mbsantos@deploy1001: Finished deploy [proton/deploy@5cb8bbe]: Update chromium-renderer to 8988283 (T213362, T216191, T212322) (duration: 01m 33s)
  • 17:21 mbsantos@deploy1001: Started deploy [proton/deploy@5cb8bbe]: Update chromium-renderer to 8988283 (T213362, T216191, T212322)
  • 16:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:48 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:47 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:47 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:42 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:42 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:42 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:36 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@13d9ebb] (stretch): Update stretch instance with latest code (duration: 00m 22s)
  • 15:35 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@13d9ebb] (stretch): Update stretch instance with latest code
  • 15:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op comment update (gerrit:503008) (duration: 01m 00s)
  • 15:06 cdanis@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 14:53 paravoid: rebooting labnet1002
  • 14:49 vgutierrez: uploaded acme-chief 0.16 to apt.wikimedia.org (buster) - T207461
  • 14:47 urandom: decommissioning cassandra-a, restbase2008 -- T208087
  • 14:46 akosiaris: cxserver: add garbage collection graphs under saturation. T205911
  • 14:18 Amir1: Deployment of Url shortener is done now
  • 14:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy UrlShortener to metawiki, let's get the party started (T108557, T44085) (duration: 01m 00s)
  • 12:49 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=maps,name=maps2001.codfw.wmnet
  • 12:20 kartik@deploy1001: scap-helm cxserver finished
  • 12:19 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 12:19 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 12:16 kartik@deploy1001: scap-helm cxserver finished
  • 12:16 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 12:15 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 12:12 kartik@deploy1001: scap-helm cxserver finished
  • 12:12 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 12:12 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 11:40 zeljkof: EU SWAT finished
  • 11:39 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Increase musical notation datatype string length limit (T218767, gerrit:500692) (duration: 01m 02s)
  • 11:37 akosiaris@deploy1001: scap-helm cxserver finished
  • 11:36 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 11:36 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 11:30 onimisionipe: removing maps2002 from cassandra cluster due to dead node error
  • 10:46 moritzm: upgrading remaining app servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 10:39 hashar: Upgrading CI Jenkins
  • 10:21 volans: forcing puppet run on A:cp-upload_codfw
  • 10:15 gehel: remove maps2001 from new cassandra cluster - T198622
  • 10:10 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 09:57 elukey: roll restart druid-coordinator/overlord on druid100[4-6] to pick up new jvm settings
  • 09:01 moritzm: upgrading deployment servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:20 moritzm: upgrading remaining job runners to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:19 elukey: roll restart of druid-broker/historical on druid100[4-6] to pick up new settings
  • 06:33 moritzm: uploaded jenkins 2.164.2 to apt.wikimedia.org (stretch-wikimedia / thirdparty/ci)
  • 06:32 moritzm: uploaded jenkins 2.164.2 to apt.wikimedia.org (jessie-wikimedia / thirdparty)
  • 06:24 moritzm: upgrading remaining API Servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s3 read-only T219115 (duration: 00m 36s)
  • 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s3 master eqiad from db1078 to db1075 T219115 (duration: 00m 36s)
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s3 on read-only T219115 (duration: 00m 37s)
  • 05:00 marostegui: Starting s3 failover from db1078 to db1075 - T219115
  • 04:32 marostegui: Disable puppet on db1078 and db1075 T219115
  • 04:18 marostegui: Start topology changes to move s3 slaves under db1075 T219115
  • 04:14 marostegui: Disable GTID on s3 hosts - https://phabricator.wikimedia.org/T219115
  • 00:45 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/PageTriage/: UBN Fix for pageTriage and ORES T220649 (duration: 01m 04s)
  • 00:12 twentyafterfour: deploying phabricator upgrade

2019-04-10

  • 20:43 urandom: decommissioning cassandra-c, restbase2007 -- T208087
  • 20:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert - Enabling api-request logging via eventgate-analytics for group1 wikis - T214080 (duration: 01m 00s)
  • 19:48 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging via eventgate-analytics for group1 wikis - T214080 (duration: 00m 59s)
  • 19:42 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.25 refs T206679 (duration: 01m 48s)
  • 19:40 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.25 refs T206679
  • 19:28 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.25 refs T206679
  • 19:26 XioNoX: enable sampling on cr2-eqiad external links, outbound
  • 19:17 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.20 [keeping static files] (duration: 02m 18s)
  • 19:14 ejegg: updated fundraising CiviCRM from d0e44a9e51 to 24b968b1f9
  • 19:08 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 [keeping static files] (duration: 02m 22s)
  • 17:44 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 [keeping static files] (duration: 02m 22s)
  • 16:58 chaomodus: restarted nagios-nrpe-server on proton1001 (it died due to OOM)
  • 16:51 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1004.eqiad.wmnet
  • 16:01 elukey: restart brokers on druid100[3-6] - locking after segments get deleted
  • 15:46 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/includes/parser/DateFormatter.php: Ib2b3fb / T220563 (duration: 01m 00s)
  • 15:28 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 59s)
  • 15:26 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 everywhere (duration: 00m 21s)
  • 15:26 oblivian@deploy1001: Started deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 everywhere
  • 15:24 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/Score/: UBN Revert Score changes that broke VE T220465 (duration: 01m 01s)
  • 15:19 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 (duration: 00m 13s)
  • 15:19 oblivian@deploy1001: Started deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0
  • 15:01 fsero: pooled back mwdebug200[1,2] T219989
  • 15:00 fsero: repooling mwdebug2002
  • 15:00 jijiki: Enable puppet on thumbor1001, switch back to nginx, pool thumbor1004 - T187765
  • 14:57 fsero: repooling mwdebug2001
  • 14:20 hashar: CI processing was a bit slower than usual over the past couple hours or so. It should be slightly faster now T220606
  • 14:13 joal@deploy1001: Finished deploy [analytics/aqs/deploy@fc1d232]: Deploying per-page limits for druid-endpoints (duration: 14m 41s)
  • 13:58 joal@deploy1001: Started deploy [analytics/aqs/deploy@fc1d232]: Deploying per-page limits for druid-endpoints
  • 13:47 fsero: resizing disk on mwdebug2002 T219989
  • 13:42 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on group0 (T188327) (duration: 01m 00s)
  • 13:19 marostegui: Deploy schema change on aawiki aawikibooks aawiktionary abwiki abwiktionary acewiki advisorswiki advisorywiki adywiki afwiki on x1 - T136427
  • 12:41 urandom: decommissioning cassandra-b, restbase2007 -- T208087
  • 12:40 hashar: contint2001: stopped puppet and zuul-merger for debugging
  • 12:17 jbond42: rolling security update of systemd on stretch systems
  • 12:07 Amir1: EU swat is done
  • 12:07 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Prep work for deploying UrlShortener extension (T108557), part II (duration: 01m 00s)
  • 12:05 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Prep work for deploying UrlShortener extension (T108557), part I (duration: 01m 00s)
  • 11:46 dcausse: elasticsearch search cluster: reindexing zh-min-nan wikis (T219533)
  • 10:55 moritzm: upgrading nodejs on analytics-tool1002 to latest node 10 version from component/node10
  • 10:46 gilles: T220265 setZoneAccess on all wikis finished
  • 10:40 akosiaris: upgrade kubernetes-node on kubestage1002 (staging cluster) to 1.12.7-1 T220405
  • 10:33 moritzm: upgrading nodejs on aqs* to latest node 10 version from component/node10
  • 10:25 fsero: resizing disk on mwdebug2001 T219989
  • 10:17 akosiaris: upload kubernetes_1.12.7-1 to apt.wikimedia.org/stretch-wikimedia component main T220405
  • 10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 T217453 (duration: 00m 59s)
  • 10:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 T217453 (duration: 01m 03s)
  • 09:59 moritzm: upgrading labweb hosts (wikitech) to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 09:51 akosiaris: upgrade kubernetes-node on kubestage1001 (staging cluster) to 1.12.7-1 T220405
  • 09:50 moritzm: upgrading snapshot hosts to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 09:40 akosiaris: upgrade kubernetes-master on neon (staging cluster) to 1.12.7-1 T220405
  • 09:40 akosiaris: upgrade kubernetes-master on neon (staging cluster) to 1.12.7-1
  • 09:05 moritzm: upgrading job runners mw1299-mw1311 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:56 elukey: restart druid-broker on druid100[4-6] - stuck after attempted datasource delete action
  • 08:46 godog: roll-restart swift frontends - T214289
  • 08:36 elukey: update thirdparty/cloudera packages to cdh 5.16.1 for jessie/stretch-wikimedia - T218343
  • 08:26 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@f7518bb] (stretch): Insert maps2003 into stretch environment (duration: 00m 22s)
  • 08:26 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@f7518bb] (stretch): Insert maps2003 into stretch environment
  • 08:12 gilles: T220265 foreachwiki extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --backend local-multiwrite
  • 07:22 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@efd5bd5]: Revert "Bifurcate imageinfo queries to improve performance" (T220574) (duration: 04m 05s)
  • 07:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@efd5bd5]: Revert "Bifurcate imageinfo queries to improve performance" (T220574)
  • 07:12 onimisionipe: depooling maps200[34] to increase cassandra replication factor - T198622
  • 07:09 jijiki: Rolling restart thumbor service
  • 07:08 jijiki: Upgrading python-thumbor-wikimedia on Thumbor servers to 2.4-1+deb9u1
  • 06:59 marostegui: Deploy schema change on x1 master, with replication, lag will happen on x1 T217453
  • 06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool x1 slaves T217453 (duration: 01m 13s)
  • 05:52 _joe_: setting both mwdebug200{1,2} to pooled = inactive to remove them from scap dsh list and allow deployments, T219989
  • 05:12 _joe_: same on mwdebug2001
  • 05:08 _joe_: removing hhvm cache on mwdebug2002
  • 00:37 Krinkle: last scap sync-file failed to mwdebug2002.codfw and mwdebug2001.codfw due to insufficient disk space
  • 00:20 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/resources/src/startup/: I3b9f1a13379a / Ie9db60e417cca (duration: 01m 01s)

2019-04-09

  • 23:14 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 [keeping static files] (duration: 06m 03s)
  • 22:31 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.33.0-wmf.25 refs T206679 (duration: 39m 59s)
  • 22:19 chaomodus: uploaded python-pynetbox to apt.wikimedia.org/stretch-wikimedia (T217072)
  • 22:13 mobrovac@deploy1001: Finished deploy [restbase/deploy@c0a2977]: Bring RB on restbase20(19|20) up to date - T208087 (duration: 02m 32s)
  • 22:11 mobrovac@deploy1001: Started deploy [restbase/deploy@c0a2977]: Bring RB on restbase20(19|20) up to date - T208087
  • 21:57 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.33.0-wmf.25 refs T206679
  • 21:48 urandom: decommissioning cassandra-a, restbase2007 -- T208087
  • 19:46 herron: added myself to ldap group cn=archiva-deployers,ou=groups,dc=wikimedia,dc=org
  • 19:10 twentyafterfour: branching 1.33.0-wmf.25
  • 18:53 crusnov@deploy1001: Finished deploy [netbox/deploy@018d83e]: Minor fix to Netbox-Ganeti sync script (duration: 00m 52s)
  • 18:52 crusnov@deploy1001: Started deploy [netbox/deploy@018d83e]: Minor fix to Netbox-Ganeti sync script
  • 18:50 thcipriani: gerrit back
  • 18:48 thcipriani: gerrit restart
  • 18:48 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@43d2d2e]: Gerrit update (cobalt) -- restart incoming (duration: 00m 10s)
  • 18:47 thcipriani@deploy1001: Started deploy [gerrit/gerrit@43d2d2e]: Gerrit update (cobalt) -- restart incoming
  • 18:46 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@43d2d2e]: Gerrit update (gerrit2001 only) (duration: 00m 10s)
  • 18:46 thcipriani@deploy1001: Started deploy [gerrit/gerrit@43d2d2e]: Gerrit update (gerrit2001 only)
  • 18:42 volans: restart icinga on icinga1001 - T196336
  • 18:38 cdanis: T196336 cdanis@icinga1001$ sudo systemctl restart nsca
  • 18:27 crusnov@deploy1001: Finished deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229 (duration: 00m 57s)
  • 18:26 crusnov@deploy1001: Started deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229
  • 18:11 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 03s)
  • 18:11 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 18:07 urandom: bootstrapping cassandra-c, restbase2020 -- T208087
  • 17:58 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 02s)
  • 17:58 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 17:56 elukey: restart keyholder-agent on deploy1001 to pick up new settings for analytics (+ arm all the keys)
  • 17:42 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 04s)
  • 17:42 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 17:42 elukey: restart keyholder-proxy.service on deploy1001 as attempt to reload perms for the analytics_deploy key
  • 17:37 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 10s)
  • 17:37 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
  • 17:19 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@b04c397]: Update mobileapps to 3edfcad (T220045 T219411 T219667) (duration: 03m 50s)
  • 17:15 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@b04c397]: Update mobileapps to 3edfcad (T220045 T219411 T219667)
  • 17:14 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.24/includes/export/WikiExporter.php: deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502537/1 (duration: 00m 51s)
  • 17:09 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.24/includes/export/XmlDumpWriter.php: deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502538/1 (duration: 00m 52s)
  • 17:04 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/includes/specials/SpecialUploadStash.php: T220265 Add support for X-Swift-Secret to upload stash (duration: 00m 53s)
  • 17:03 twentyafterfour: deploying https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502538/1 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502537/1
  • 17:01 arturo: T220426 reimaging+renaming labtestnet2002 to cloudweb2001-dev
  • 16:49 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:49 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 16:49 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 16:46 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:46 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 16:46 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 16:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:45 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:45 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:41 herron: performing rolling restart of kafka main brokers and eventbus instances in eqiad to pick up security updates
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:28 jijiki: Restarting thumbor service on thumbor1001
  • 16:26 jijiki: Upgrading thumbor1001 to python-thumbor-wikimedia_2.4-1+deb9u1
  • 16:18 jijiki: Uploading python-thumbor-wikimedia_2.4-1+deb9u1 to component/thumbor in stretch-wikimedia
  • 15:05 moritzm: uploaded jenkins 2.164.1 for stretch-wikimedia/thirdparty/ci
  • 15:04 moritzm: uploaded jenkins 2.164.1 for jessie-wikimedia/thirdparty
  • 14:42 ejegg: updated payments-wiki from 15bcb3d1a6 to aa8dad50e7
  • 14:10 ema: reboot lvs2010 with systemd 232 T209707
  • 14:09 godog: bootstrapping cassandra-b, restbase2020 -- T208087
  • 13:19 godog: bounce rsyslog on wezen
  • 13:11 fsero: building envoy docker image
  • 13:07 jbond42: rolling security updates of systemd on canary systems
  • 12:35 godog: bounce rsyslog on lithium
  • 12:13 elukey: powercycle logstash1012 - no ssh, no mgmt console available, seems completely stuck
  • 12:10 jbond42: remove facter2.4 from wikimedia-buster
  • 11:27 moritzm: upgrading API servers mw1276-mw1290 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 11:07 akosiaris: pool both DCs for newly created swift.recovery.wmnet RR
  • 11:07 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=.*,dnsdisc=swift
  • 11:00 ema: rebooting lvs2010 with systemd 241-1~bpo9+1 T209707
  • 10:57 moritzm: updated buster installer to daily build from 9th of April
  • 10:09 godog: bootstrapping cassandra-a, restbase2020 -- T208087
  • 10:07 moritzm: rebooting stat1005 for some tests again
  • 09:49 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/NavigationTiming: T220476 Add originCountry to paintTiming context (duration: 00m 54s)
  • 09:46 moritzm: rebooting stat1005 for some tests
  • 08:47 akosiaris: switch swift to be accessed from varnish+ats active/active rw
  • 08:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove old comment from db1089 (duration: 00m 51s)
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2069 (duration: 00m 50s)
  • 08:10 marostegui: Upgrade db2069
  • 08:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2069 (duration: 00m 51s)
  • 07:52 moritzm: upgrading app servers mw1319-mw1333 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Deploy parsercache key change everywhere T210725 (duration: 00m 53s)
  • 07:37 moritzm: installing samba security updates
  • 07:21 marostegui: Change parsercache keys on mw[1230-1235,1238-1239] - T210725
  • 07:10 jijiki: Depool thumbor1004 for testing - T187765
  • 07:09 marostegui: Change parsercache keys on mw[1221-1229] - T210725
  • 07:03 marostegui: Change parsercache keys on mw[1280-1289] - T210725
  • 06:51 dcausse: elasticsearch search cluster: reindex all spaceless languages in eqiad and codfw (T219533)
  • 06:47 moritzm: installing libav security updates
  • 06:39 marostegui: Change parsercache keys on mw[1260-1269] - T210725
  • 06:30 marostegui: Change parsercache keys on mw[1270-1279] - T210725
  • 06:01 marostegui: Deploy parsercache key change on canaries only - T210725
  • 03:23 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/ExternalGuidance/extension.json: Id04a3a / T219841 (duration: 00m 52s)
  • 03:16 onimisionipe: depooled maps2003 - T219849
  • 02:47 onimisionipe: restarting tilerator on maps2003 - T219849
  • 02:40 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/ExternalGuidance/extension.json: I8614f6 / T219841 (duration: 00m 53s)
  • 01:27 eileen: civicrm revision changed from dfe89516b3 to d0e44a9e51, config revision is 2bcbf44521
  • 00:45 urandom: bootstrapping cassandra-c, restbase2019 -- T208087
  • 00:07 ebernhardson@deploy1001: Synchronized wmf-config/: T218716: Migrate configs to WikibaseCirrusSearch (duration: 00m 51s)

2019-04-08

  • 23:57 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218954: Enable WBCS search on commons too (duration: 00m 50s)
  • 23:45 ebernhardson@deploy1001: Synchronized wmf-config: T218954: Disable wbcs dispatching query builder on commons (3/3) (duration: 00m 52s)
  • 23:41 ebernhardson@deploy1001: Synchronized wmf-config: T218954: Disable wbcs dispatching query builder on commons (3/3) (duration: 00m 51s)
  • 23:33 ebernhardson@deploy1001: Synchronized wmf-config/Wikibase.php: T218954: Disable wbcs dispatching query builder on commons (2/3) (duration: 00m 52s)
  • 23:10 ebernhardson@deploy1001: Synchronized wmf-config/: T218954: Disable wbcs dispatching query builder on commons (1/3) (duration: 00m 52s)
  • 22:45 XioNoX: rollback enable sampling on cr2-eqiad external links
  • 22:29 XioNoX: enable sampling on cr2-eqiad external links
  • 22:18 XioNoX: enable sampling on eqiad Telia transit link
  • 22:04 jforrester@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: WBMI T220277 (duration: 00m 57s)
  • 22:01 XioNoX: pfw firewall rules update - T217355
  • 20:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667) (duration: 07m 55s)
  • 20:41 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667)
  • 20:24 urandom: bootstrapping cassandra-b, restbase2019 -- T208087
  • 20:08 bearND: mobileapps deploy failed on canary (Check 'endpoints' failed). Rolled back canary.
  • 20:08 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667) (duration: 02m 10s)
  • 20:05 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667)
  • 19:59 marxarelli: promotion of 1.33.0-wmf.24 to all wikis completed. error rates nominal aside from usual timeouts. cc: T206678, T220037
  • 19:51 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.24
  • 19:48 marxarelli: promoting 1.33.0-wmf.24 to all wikis. cc: T220037, T206678
  • 19:41 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 46s)
  • 19:41 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:35 marxarelli: starting promotion of 1.33.0-wmf.24 to group1
  • 18:45 Lucas_WMDE: Morning SWAT done
  • 18:31 bblack: deploying wiktionary CNAME experiment - https://phabricator.wikimedia.org/T208263#5094712
  • 18:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@9cf5364]: Lower AQS rate limits and fix recommendation-api spec - T219910 T220221 (duration: 21m 14s)
  • 18:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable eventgate-analytics api-request logging for group0 wikis - T214080 (duration: 00m 56s)
  • 18:24 mobrovac: restart pdfrender on scb2001 - T174916
  • 18:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:13 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:10 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:09 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:09 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:09 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:06 mobrovac@deploy1001: Started deploy [restbase/deploy@9cf5364]: Lower AQS rate limits and fix recommendation-api spec - T219910 T220221
  • 17:50 arturo: T220129 renaming labtestmetal2001.codfw.wmnet to clouddb2001-dev.codfw.wmnet
  • 17:42 XioNoX: add swift term to cr1/2-eqiad - T220081
  • 17:14 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@c30a540]: GUI updates, Updater with redirect fix and Blazegraph with XSS fix (duration: 11m 17s)
  • 17:03 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@c30a540]: GUI updates, Updater with redirect fix and Blazegraph with XSS fix
  • 16:59 mobrovac@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: Force-deploy to scb1001 to test the config perms (duration: 00m 16s)
  • 16:59 mobrovac@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: Force-deploy to scb1001 to test the config perms
  • 16:55 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Replace needed WikimediaEditorTasks Beta Cluster config (T220153) (duration: 00m 58s)
  • 16:31 urandom: bootstrapping cassandra-a, restbase2019 -- T208087
  • 15:35 herron: aborting ores to logstash kafka logging pipeline switchover for now. puppet applied only to ores2009, reverting now
  • 15:19 herron: switching ores to logstash kafka logging pipeline (via temporary puppet disable and rolling puppet agent runs)
  • 15:09 jijiki: Pool mw2206 - T215415
  • 14:55 papaul: powering down mw2206 for DIMM replacement
  • 14:49 otto@deploy1001: Finished deploy [analytics/refinery@7fa6fb7]: deploying oozie article recommender for baho (duration: 18m 35s)
  • 14:45 papaul: powering down elastic2048 for disk replacement
  • 14:30 otto@deploy1001: Started deploy [analytics/refinery@7fa6fb7]: deploying oozie article recommender for baho
  • 14:17 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on test wikis and mediawikiwiki (T188327) (duration: 00m 59s)
  • 14:06 jijiki: Temporarily serve thumbor traffic on thumbor1001 via haproxy - T187765
  • 13:41 moritzm: upgrading job runners in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 12:31 hashar: contint2001: upgraded python-pbr 0.8.2-1 -> 1.10.0-1 # T218559
  • 12:25 moritzm: upgrading API servers in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 12:06 arturo: reboot cloudvirt1009 to clean some ACPI errors in dmesg
  • 12:03 arturo: T219776 puppet node deactivate labtestnet2003.codfw.wmnet
  • 12:00 hashar: contint1001 upgraded zuul to 2.5.1-wmf6 # T208426
  • 11:53 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: WikibaseClient: Conditionally enable mapframe support (T218051) (duration: 00m 58s)
  • 11:48 hashar: contint2001: stopping zuul-server , it is not meant to be running there
  • 11:41 hoo@deploy1001: Synchronized wmf-config/abusefilter.php: Enable blocking feature of AbuseFilter in zh.wikipedia (T210364) (duration: 00m 58s)
  • 11:25 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create uploader user group for thwiki (T216615) (duration: 00m 58s)
  • 11:12 jijiki: Restarted thumbor services after librsvg upgrade
  • 11:11 fsero: upgrading envoy to 1.9.1 T215810
  • 10:42 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:502190) (duration: 00m 58s)
  • 10:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:502190) (duration: 00m 59s)
  • 10:34 moritzm: upgrading app servers in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 10:23 jijiki: Running debdeploy to upgrade librsvg
  • 09:43 gehel: force allocation of 3 unassigned shards on elasticsearch / cirrus / eqiad
  • 09:30 arturo: T219776 puppet node clean labtestnet2003.codfw.wmnet
  • 09:20 volans: restarting icinga on icinga1001 - T196336
  • 08:45 moritzm: upgrading API servers mw1221-mw1235 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 08:34 akosiaris@deploy1001: scap-helm zotero finished
  • 08:34 akosiaris@deploy1001: scap-helm zotero cluster staging completed
  • 08:34 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml --reset-values staging stable/zotero [namespace: zotero, clusters: staging]
  • 08:32 akosiaris@deploy1001: scap-helm zotero finished
  • 08:32 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 08:32 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
  • 08:32 akosiaris: lower CPU, memory limits for zotero pods. Set 1 cpu, 700Mi. This should help the pods to recover faster in some cases. The old memory leak issues we used to have seem to be no longer present
  • 08:31 akosiaris@deploy1001: scap-helm zotero finished
  • 08:31 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 08:31 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
  • 08:17 godog: delete fundraising folder from public grafana - T219825
  • 08:01 godog: bounce grafana after https://gerrit.wikimedia.org/r/c/operations/puppet/+/501519
  • 07:59 moritzm: upgrading mw1266-mw1275 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T217453 (duration: 00m 58s)
  • 07:19 marostegui: Deploy schema change on the first 10 wikis - T217453
  • 07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T217453 (duration: 00m 59s)
  • 07:02 moritzm: installing wget security updates
  • 07:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T143763 (duration: 00m 58s)
  • 06:34 _joe_: restarted netbox, SIGSEGV on HUP-induced reload
  • 05:20 marostegui: Deploy schema change on x1 master with replication, there will be lag on x1 slaves T143763
  • 05:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T219777 T143763 (duration: 01m 30s)

2019-04-07

  • off: restarted icinga on icinga2001
  • 06:34 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=codfw
  • 06:23 _joe_: deleting zotero pods with high memory watermark in codfw
  • 06:03 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=zotero,name=codfw

2019-04-06

  • 10:09 gilles: Purging ruwiki namespaces > 0

2019-04-05

  • 23:10 thcipriani: revert some recent problematic gerrit acl changes
  • 22:46 chaomodus: restarted pdfrender on scb1002 T174916
  • 21:45 hashar: thcipriani restarted Gerrit. CI works again # T220243
  • 21:37 thcipriani: restarting gerrit
  • 21:30 hashar: CI / Zuul is no longer processing events / T220243
  • 17:29 thcipriani: gerrit back on 2.15.11
  • 17:27 thcipriani: restart gerrit
  • 17:26 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 on cobalt (restart incoming) (duration: 00m 11s)
  • 17:26 thcipriani@deploy1001: Started deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 on cobalt (restart incoming)
  • 17:25 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 (on gerrit2001 only) (duration: 00m 10s)
  • 17:25 thcipriani@deploy1001: Started deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 (on gerrit2001 only)
  • 17:19 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/diff/TextSlotDiffRenderer.php: Ia326c6 / T220217 (duration: 01m 02s)
  • 17:12 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/includes/diff/TextSlotDiffRenderer.php: Ia326c6 / T220217 (duration: 01m 00s)
  • 16:02 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/includes/jobqueue/jobs/RefreshLinksJob.php: Ib1ac31365f9c / T220037 (duration: 00m 59s)
  • 15:58 ejegg: re-enabled recurring donations queue consumer
  • 15:57 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/NavigationTiming/: I6b23be / T220156 (duration: 01m 00s)
  • 15:51 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/GlobalBlocking/includes/specials/: I5843cd181ca7d (duration: 01m 02s)
  • 15:08 ejegg: upgraded fundraising CiviCRM from 3c55850631 to 83478013a8
  • 15:01 ejegg: disabled recurring donation queue consumer
  • 14:55 papaul: powering down restbase2019 and 2020 for relocation
  • 13:53 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 13:45 akosiaris: repool eqiad for all kubernetes services T217426
  • 13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=citoid
  • 13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=cxserver
  • 13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
  • 13:44 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=blubberoid
  • 13:44 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
  • 13:41 arturo: T220203 reimage labtestnet2002 as spare in stretch
  • 13:36 arturo: T220101 disable active icinga checks for cloudcontrol2002-dev
  • 13:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:50 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=99)
  • 12:49 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:48 jijiki: Restarting pybal on lvs1016 and lvs2003 for 496382
  • 12:43 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:43 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:43 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:43 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:33 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 12:32 akosiaris: depool eqiad for all kubernetes services T217426
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
  • 12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=zotero
  • 12:31 akosiaris: repool codfw for all kubernetes services T217426
  • 12:30 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:30 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:29 akosiaris: repool codfw for all kubernetes services
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=citoid
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=cxserver
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mathoid
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=blubberoid
  • 12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=zotero
  • 12:18 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:18 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 12:15 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:15 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:12 bblack: repool esams
  • 12:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 11:53 bblack: esams depooled in DNS
  • 11:37 jijiki: Restarting pybal on lvs1006 and lvs2006 for 496382
  • 11:27 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 10:57 arturo: updating puppet catalog compiler facts
  • 10:42 elukey: restart druid broker on druid100[5,6] - exceptions in the logs after old datasource removal
  • 10:41 elukey: restart druid broker on druid1004 - exceptions in the logs after old datasource removal
  • 10:10 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 10:10 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 09:27 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 09:27 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 09:26 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 09:26 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 08:57 akosiaris: depool codfw kubernetes apps from discovery in preparation for upgrade
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=citoid
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=cxserver
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mathoid
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=blubberoid
  • 08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=zotero
  • 08:55 arturo: T220101 reimaging+renaming labtestservices2002 to cloudservices2002-dev
  • 08:43 akosiaris: upgrade kubernetes staging cluster to 1.11.9
  • 08:32 elukey: roll restart of aqs on aqs100* to pick up new druid settings
  • 08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1075 (duration: 00m 59s)
  • 08:06 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 07:51 elukey: restart gerrit on cobalt (timeouts and general slowdown)
  • 07:34 jijiki: Repooling thumbor1004 until we replace its memory - T215411
  • 07:18 moritzm: upgrading mw1262-mw1265 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
  • 06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 (duration: 00m 57s)
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 (duration: 01m 00s)
  • 05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 with low weight (duration: 00m 58s)
  • 05:15 marostegui: Fully upgrade and reboot db1075
  • 05:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 59s)
  • 04:49 gilles: T216594 Start purge of namespace 0 on ruwiki
  • 02:27 eileen: update civicrm revision changed from 7560af93df to 3c55850631, config revision is 9ad5ef3e15
  • 00:09 bd808@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: wikitech: Lock LDAP accounts when users are blocked (gerrit:497866), Disable Phabricator accounts when blocked on wikitech (gerrit:501123) (T168692) 2/2 (duration: 00m 57s)
  • 00:07 bd808@deploy1001: Synchronized wmf-config/wikitech.php: SWAT: wikitech: Lock LDAP accounts when users are blocked (gerrit:497866), Disable Phabricator accounts when blocked on wikitech (gerrit:501123) (T168692) (duration: 00m 59s)

2019-04-04

  • 23:52 bd808@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/LdapAuthentication: SWAT: Also set an LDAP password policy on Block (gerrit:501412) (T168692) (duration: 01m 01s)
  • 23:38 bd808@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add smn and sms to wmgExtraLanguageNames (gerrit:501393) (T220118) (duration: 01m 02s)
  • 21:22 XioNoX: renumber AS58587 to AS10075 in eqsin
  • 21:17 bblack: DNS deploying https://gerrit.wikimedia.org/r/c/operations/dns/+/500731 which can affect resolution of our CNAME records. If dns-related issues, can revert at will!
  • 21:09 herron: restarting eqiad ELK stack for security updates
  • 20:45 marxarelli: promotion of 1.33.0-wmf.24 rolled back to group0 and holding. cc: T206678, T220037
  • 20:41 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2/group1 wikis to 1.33.0-wmf.24"
  • 20:36 marxarelli: rolling back again following still high rates of DBTransactionError (avg ~ 800/min)
  • 20:16 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.24
  • 20:11 marxarelli: promoting 1.33.0-wmf.24 to all wikis
  • 20:11 marxarelli: error rates look good after proper syncs and re-deploy. cc: T220037
  • 20:06 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/Citoid/modules/ve.ui.Citoid.init.js: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Citoid/+/501114 (duration: 00m 58s)
  • 20:04 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationPlugin.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 57s)
  • 20:03 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationHooks.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 58s)
  • 20:02 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthentication.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 58s)
  • 19:58 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus/includes/JobExecutor.php: syncing JobExecutor changes (duration: 00m 58s)
  • 19:55 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 47s)
  • 19:53 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:51 marxarelli: re-deploying to group1 after proper syncs
  • 19:47 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/Citoid/modules/ve.ui.Citoid.init.js: (no justification provided) (duration: 00m 59s)
  • 19:46 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus/includes/JobExecutor.php: (no justification provided) (duration: 00m 58s)
  • 19:45 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationPlugin.php: (no justification provided) (duration: 00m 58s)
  • 19:44 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationHooks.php: (no justification provided) (duration: 00m 59s)
  • 19:43 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthentication.php: (no justification provided) (duration: 00m 59s)
  • 19:19 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.33.0-wmf.24"
  • 19:13 marxarelli: large spike in DBTransactionError errors. rolling back. cc: T220037
  • 19:12 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 46s)
  • 19:10 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:06 marxarelli: fetch/rebase looks good, incorporates fixes for T220037, T219510. deploying
  • 19:03 marxarelli: preparing to promote 1.33.0-wmf.24 to group1
  • 18:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable partial blocks on frwiki, plwiki (T219327, T219218) (duration: 00m 58s)
  • 18:23 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES RCFilters on eswikiquote (T219160) (duration: 01m 02s)
  • 18:13 moritzm: restarted apache on people.wikimedia.org to pick up OpenSSL update
  • 17:59 bstorm_: stopped postgresql on labsdb1006.eqiad.wmnet and moved the database master functionality (and all rsyncs) to clouddb1003.clouddb-services.eqiad.wmflabs
  • 17:59 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@922cbc0]: Switch to new logging infrastructure T211125 (duration: 04m 03s)
  • 17:55 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@922cbc0]: Switch to new logging infrastructure T211125
  • 17:47 ppchelko@deploy1001: Finished deploy [changeprop/deploy@f69dc9c]: Switch to new logging infrastructure T211125 (duration: 01m 44s)
  • 17:45 ppchelko@deploy1001: Started deploy [changeprop/deploy@f69dc9c]: Switch to new logging infrastructure T211125
  • 17:33 jynus: stopping replication on dbstore2001:s8 for backup testing T206203
  • 17:29 jynus: killing ongoing backup at dbprov2002, stuck
  • 17:28 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 17:10 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 16:31 herron: beginning rolling kafka restarts on kafka200[123] for security updates
  • 16:01 herron: repooling kafka2003 eventbus
  • 15:59 mutante: wikivoyage-old.org domain has been retired and deactivated (T219867, T81727)
  • 15:56 herron: depooling kafka2003 for eventbus security updates
  • 15:55 herron: repooling kafka2002 eventbus
  • 15:52 herron: depooling kafka2002 for eventbus security updates
  • 15:52 herron: pooling kafka2001 eventbus
  • 15:42 herron: depooling kafka2001 for eventbus security updates
  • 15:38 moritzm: rolling restart of proton to pick up openssl security update
  • 15:03 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 14:59 moritzm: installing libdatetime-timezone-perl updates
  • 14:24 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=cxserver,cluster=scb,name=scb.*
  • 14:24 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=cxserver,cluster=scb,name=scb.*
  • 14:23 jijiki: Depooling scb* from service cxserver traffic
  • 13:46 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 13:46 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 37s)
  • 13:29 jbond42: restart of gerrit apache service will occur at 13:40
  • 13:28 volans: upgraded spicerack to 0.0.22 on cumin[12]001
  • 13:27 volans: uploaded spicerack_0.0.22-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 13:23 moritzm: upgrading mw1261 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 / wikidiff 1.8.1
  • 13:20 jijiki: Stopped all citoid services from scb* - 494215
  • 13:15 jbond42: restart of phabricator apache service will occur at 14:25
  • 12:46 moritzm: uploaded HHVM 3.18.5+dfsg-1+wmf8+deb9u2 to apt.wikimedia.org/stretch-wikimedia
  • 12:10 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 11:43 moritzm: upgrading HHVM on mwdebug servers in eqiad along with update to hhvm-wikidiff 1.8.1
  • 11:35 moritzm: uploaded nodejs 10.15.2~dfsg-1+wmf1 to the component/node10 component of apt.wikimedia.org/stretch-wikimedia (updated to latest 10.x release and a change to ensure zlib binary compat with NodeSource) (T215562)
  • 11:34 Amir1: EU SWAT is done
  • 11:32 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Add mediawiki.org to the URL shortener whitelist (gerrit:500976) (duration: 00m 58s)
  • 11:28 jbond42: rolling security updates for apache on jessie
  • 11:25 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ReferencePreviews beta feature on de- and ar-wiki (T218766) (gerrit:498371) (duration: 01m 00s)
  • 11:21 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 11:08 arturo: drop python-psutil from jessie-wikimedia/openstack-mitaka-jessie, related to T219626
  • 10:56 moritzm: uploaded hhvm-wikidiff 1.8.1 to apt.wikimedia.org/stretch-wikimedia (source package is named php-wikidiff2 for legacy reasons) (T203069)
  • 10:21 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 10:01 moritzm: installing openssl1.0 security updates on stretch-based DB hosts
  • 08:36 moritzm: rolling restart of parsoid to pick up OpenSSL security update
  • 08:06 moritzm: uploaded Apache 2.4.10-10+deb8u14+wmf1 to apt.wikimedia.org/jessie-wikimedia (latest jessie security update rebased with our local patches)
  • 05:39 marostegui: Stop MySQL on db2033 for decommission - T219493
  • 05:32 marostegui: Remove db2033 from tendril and zarcillo - T219493
  • 05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2033 for decommission T219493 (duration: 00m 59s)
  • 05:18 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2033 for decommission T219493 (duration: 00m 59s)
  • 04:58 marostegui: Deploy schema change on labswiki for the job table - T219887
  • 00:40 chaomodus: restart pdfrender on scb1003 - T174916

2019-04-03

  • 23:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on zhwikisource (T219588) (duration: 00m 58s)
  • 23:50 catrope@deploy1001: Synchronized dblists/flow.dblist: Enable Flow on zhwikisource (T219588) (duration: 00m 57s)
  • 23:38 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments homepage EventLogging on testwiki (duration: 00m 59s)
  • 23:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage tutorial pages on cswiki, kowiki, viwiki (dark deploy) (duration: 00m 59s)
  • 23:18 catrope@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage on testwiki (duration: 01m 01s)
  • 21:32 elukey: start hadoop-hdfs-namenode on an-master1002 after outage due to big job hitting HDFS
  • 20:40 gehel: excluding elastic2048 from cluster and depooling - T220038
  • 20:29 arlolra: Updated Parsoid to 0b3bb10 (T219337)
  • 20:20 arlolra@deploy1001: Finished deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10 (duration: 05m 44s)
  • 20:14 arlolra@deploy1001: Started deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10
  • 20:09 marxarelli: 1.33.0-wmf.24 is holding at group0 following rollback. filed T220037. cc: T206678
  • 19:56 marxarelli: log correction group1 reverted to 1.33.0-wmf.23
  • 19:56 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 to 1.33.0-wmf.24
  • 19:55 marxarelli: 111,185 and counting DBTransactionError for jobrunner.discovery.wmnet
  • 19:53 marxarelli: rolling back group1
  • 19:53 marxarelli: massive spike in DBTransactionError ([{exception_id}] {exception_url} Wikimedia\Rdbms\DBTransactionError from line 246 of /srv/mediawiki/php-1.33.0-wmf.24/includes/libs/rdbms/lbfactory/LBFactory.php: RefreshLinksJob::runForTitle: transaction round 'RefreshLinksJob::run' already started.)
  • 19:51 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 49s)
  • 19:50 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
  • 19:34 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update strategy (duration: 10m 54s)
  • 19:23 smalyshev@deploy1001: Started deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update strategy
  • 18:14 thcipriani: gerrit back on 2.15.12
  • 18:12 thcipriani: restarting gerrit for 2.15.12 update
  • 18:11 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow) (duration: 00m 11s)
  • 18:11 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow)
  • 18:09 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only (duration: 00m 11s)
  • 18:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only
  • 17:57 elukey: restart hadoop-hdfs-namenode on an-master1001 as precautionary measure after the outage (currently standby)
  • 17:44 herron: shortly postponing restarts of eventbus and kafka services for security updates due to unrelated firefighting - repooling kafka1001
  • 17:19 elukey: restart hadoop-hdfs-namenode on an-master1002 after forced shutdown due to errors
  • 17:14 herron: depooling kafka1001 to restart eventbus and kafka services for security updates
  • 17:04 Lucas_WMDE: EU SWAT done
  • 17:04 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=srwiki --fix # T214428 – 0 pages to fix, 0 links to fix, Looks good!
  • 17:03 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule (T220001) (gerrit:500987) (duration: 00m 58s)
  • 17:00 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus: SWAT: Incorrect order of calls in createPageDeleteEvent. (gerrit:500959) (duration: 00m 59s)
  • 16:51 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 16:44 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 16:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=idwiktionary --fix # T218796 – 41 links to fix, 41 were resolvable, Looks good!
  • 16:36 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add namespace "Lampiran" at id.wiktionary (T218796) (gerrit:499530) (duration: 00m 59s)
  • 16:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Draft namespace on srwiki (T214428) (gerrit:500761) (duration: 01m 00s)
  • 16:22 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add three domains at wgCopyUploadDomains (T216886, T219075) (gerrit:500154) (duration: 01m 00s)
  • 16:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: Remove namespace 104 from FlaggedRevs configuration for arwiki (T217507) (gerrit:500153) (duration: 01m 00s)
  • 15:18 volans: shutdown ms-be2026 for firmware upgrade - T219854
  • 15:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:16 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on wikitech for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 8 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 7 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 6 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 5 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 4 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on remaining section 3 wikis for T215525
  • 15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 2 wikis for T215525
  • 14:59 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 1 wikis for T215525
  • 14:56 anomie@deploy1001: Synchronized php-1.33.0-wmf.24/maintenance/includes/MigrateActors.php: Backporting fix from gerrit:500754 (duration: 01m 01s)
  • 14:55 anomie@deploy1001: Synchronized php-1.33.0-wmf.23/maintenance/includes/MigrateActors.php: Backporting fix from gerrit:500754 (duration: 01m 01s)
  • 14:18 marostegui: Stop replication on pc2007 for testing - T210725
  • 14:03 andrewbogott: restarting rabbitmq on cloudcontrol1003
  • 13:59 andrewbogott: restarting neutron-l3-agent on cloudnet1003 and cloudnet1004
  • 13:46 andrewbogott: restarting neutron-metadata-agent on cloudnet1003
  • 13:44 gilles@deploy1001: Synchronized php-1.33.0-wmf.23/includes/media/MediaTransformOutput.php: T216499 Identify images that should have had high importance (duration: 00m 59s)
  • 13:34 moritzm: reverting dbmonitor2001 to deb8u12+wmf1 build
  • 13:02 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 13:01 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:49 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 12:45 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:42 arturo: T219626 reimaging cloudcontrol2001-dev
  • 12:31 mutante: restarting gerrit service to apply change 498431
  • 11:25 Amir1: EU SWAT is done
  • 11:16 jbond42: rolling security updates for apache
  • 10:29 mutante: planet1001/2001 - apt autoremove un-required packages
  • 10:27 mutante: planet1001/2001 - upgrade apache2, openssh, locales, rsyslog ..
  • 10:25 arturo: updating puppet compiler facts
  • 10:19 volans: upgraded spicerack to 0.0.21 on cumin[12]001
  • 10:17 volans: uploaded spicerack_0.0.21-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 09:56 marostegui: Alter empty job table on s6 primary master - T219887
  • 09:55 moritzm: upgrading beta to hhvm wikidiff 1.8.1 (T203069)
  • 09:54 mutante: running mysql select queries on m3-slave to get data from phabricator conpherence as requested by andre
  • 09:45 moritzm: removed labtestnet2003.codfw.wmnet from debmonitor (T219776)
  • 09:29 ema: cp-ats-codfw: test ATS rolling restart T213263
  • 09:27 marostegui: Drop wikishared.wikimedia_editor_tasks_entity_description_exists table from x1 T219963
  • 09:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool s8 sanitarium master (duration: 00m 56s)
  • 09:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool s8 sanitarium master (duration: 01m 00s)
  • 08:35 jynus: merging change on network constants (firewall operation)
  • 08:23 marostegui: Restart mysql on sanitarium hosts db1124 db1125 db2094 db2095 - T218302
  • 08:18 marostegui: Stop replication on db2082 and db1087 (s8 sanitarium masters) T218302
  • 08:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool s8 sanitarium master (duration: 00m 57s)
  • 08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool s8 sanitarium master (duration: 00m 58s)
  • 08:09 moritzm: installing new apache packages on mw1261
  • 07:53 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 58s)
  • 07:51 moritzm: installing new apache packages on mwdebug
  • 07:42 marostegui: Reboot db1115 - tendril and dbtree will be down
  • 07:40 marostegui: Disable event scheduler on db1115 before restarting - tendril is stuck
  • 07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 T219493 (duration: 00m 57s)
  • 07:25 marostegui: Deploy schema change on db1073, labtestwiki - T219887
  • 07:09 marostegui: Stop replication in sync on db1120 and db2034 (x1 codfw master) - T219493
  • 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 T219493 (duration: 01m 13s)
  • 06:04 _joe_: restart varnish backend on cp1085, causing unavailability
  • 05:57 marostegui: Fix data drifts on bnwikisource on x1 - T219493
  • 05:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 59s)
  • 05:23 marostegui: Upgrade pc1007
  • 05:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 for upgrade (duration: 01m 00s)

2019-04-02

  • 23:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT enwiki: Restrict move-categorypages to +extendedmover/+sysop/+bot T219261 (duration: 00m 58s)
  • 23:30 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Add new WMCS IP range to wgRateLimitsExcludedIps T167432 (duration: 00m 57s)
  • 23:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable SandboxLink for rowiki T219855 (duration: 00m 56s)
  • 23:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Add 'depicts' statements to search index on testcommons (duration: 00m 59s)
  • 21:27 andrewbogott: rebooting labservices1001
  • 21:16 andrewbogott: rebooting labservices1002
  • 20:54 andrewbogott: restarting pdns and pdns-recursor on labservices1001 and 1002 in hopes of getting those machines to act a bit less sluggish
  • 20:23 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/skins/Vector/includes/: I6e04b512d / T219864 (duration: 00m 59s)
  • 20:20 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/skins/Vector/includes/: I6e04b512d / T219864 (duration: 01m 00s)
  • 20:16 marxarelli: 1.33.0-wmf.24 successfully deployed to group0. errors rates look normal (T206678)
  • 20:07 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.33.0-wmf.24
  • 19:57 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.24 and rebuild l10n cache (duration: 44m 20s)
  • 19:12 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.24 and rebuild l10n cache
  • 18:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125 (duration: 20m 49s)
  • 18:22 marxarelli: cutting mediawiki branch 1.33.0-wmf.24 (T206678)
  • 18:22 marxarelli: cutting mediawiki branch 1.33.0-wmf.24
  • 18:20 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125
  • 18:20 ppchelko@deploy1001: deploy aborted: Kafka logging pipeline, full deploy T211125 (duration: 00m 03s)
  • 18:20 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125
  • 18:09 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, canary on restbase2010 T211125 (duration: 02m 33s)
  • 18:06 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, canary on restbase2010 T211125
  • 17:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7] (dev-cluster): Kafka logging pipeline, dev cluster only T211125 (duration: 03m 25s)
  • 17:56 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7] (dev-cluster): Kafka logging pipeline, dev cluster only T211125
  • 17:51 ppchelko@deploy1001: Finished deploy [restbase/deploy@3dcf328]: Upgrade swagger to v3, attempt 2, T218218 (duration: 20m 47s)
  • 17:37 ejegg: updated payments-wiki-staging from 793bce1a5f to 15bcb3d1a6
  • 17:30 ppchelko@deploy1001: Started deploy [restbase/deploy@3dcf328]: Upgrade swagger to v3, attempt 2, T218218
  • 17:30 ppchelko@deploy1001: Finished deploy [restbase/deploy@3dcf328] (dev-cluster): Upgrade swagger to v3, attempt 2, T218218 (duration: 03m 02s)
  • 17:27 ppchelko@deploy1001: Started deploy [restbase/deploy@3dcf328] (dev-cluster): Upgrade swagger to v3, attempt 2, T218218
  • 16:47 XioNoX: - replacing accepted-prefix-limit with prefix-limit in eqsin - T211730
  • 16:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@6026ad1]: Switch to swagger 3 T218218 (duration: 04m 52s)
  • 16:39 ppchelko@deploy1001: Started deploy [restbase/deploy@6026ad1]: Switch to swagger 3 T218218
  • 16:36 XioNoX: - replacing accepted-prefix-limit with prefix-limit on esams - T211730
  • 16:12 XioNoX: - replacing accepted-prefix-limit with prefix-limit on cr2-eqiad - T211730
  • 16:02 mutante: T194174 - bump. started alerting again 2 days ago
  • 16:00 mutante: icinga - schedule (30d) downtime for kubernetes operational latencies alerts (T219696) on kubernetes1004
  • 15:57 arturo: T219626 reimaging cloudcontrol2001-dev again
  • 15:55 mutante: scandium - systemctl start parsoid-vd was failed (T201366)
  • 15:55 herron: beginning rolling upgrade of codfw ELK cluster to 5.6.15 T219571
  • 15:52 mutante: icinga - re-enabling notifications for scandium. setup task is resolved yet systemd is alerting, should not have been turned off anymore (T201366)
  • 15:39 XioNoX: repool eqsin - T219847
  • 15:32 jbond42: add cpp-hocon 0.1.6 to jessie-wikimedia/backports
  • 15:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: VE: Enable mobile section editing A/B test on all remaining wikis T219564 (duration: 00m 51s)
  • 15:07 moritzm: stopped/disabled ipmievd on cumin2001
  • 14:54 jbond42: add leatherman 1.4 to jessie-wikimedia/backports
  • 13:44 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on test wikis and mediawikiwiki for T215525
  • 13:24 volans: reboot ms-be2026 to see if that fixes the controller - T219854
  • 13:23 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:20 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:20 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:20 jynus: updating puppet compiler facts
  • 12:11 arturo: icinga downtime toolschecker for 1 month T219243
  • 12:07 hashar: contint1001: compressing some MediaWiki debugging logs under /srv/jenkins/builds # T219850
  • 11:42 moritzm: restarting parsoid on wtp1025 to pick up openssl update
  • 11:33 hashar: contint1001: cleaning Docker containers #T219850
  • 11:23 Amir1: EU SWAT is done
  • 11:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add the urlshortener-manage-url right and enable it for stewards (T133109) (gerrit:499777), Part I (duration: 00m 51s)
  • 11:21 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Add the urlshortener-manage-url right and enable it for stewards (T133109) (gerrit:499777), Part I (duration: 00m 53s)
  • 11:14 akosiaris: T217715 Update mathoid, citoid, cxserver, eventgate grafana dashboards to use the new recording rules for the quantiles
  • 11:14 jbond42: add cmake 3.6.2 to jessie-wikimedia/backports
  • 11:02 jbond42: add rapidjson 1.1.0 to jessie-wikimedia/backports
  • 10:47 jbond42: add catch 1.10 to jessie-wikimedia/backports
  • 10:42 jbond42: add strip-nondeterminism 0.034 to jessie-wikimedia/backports
  • 10:39 jbond42: add dh-autoreconf 12 to jessie-wikimedia/backports
  • 10:30 jbond42: add debhelper 10.2.5 and dh-systemd 10.2.5 to jessie-wikimedia/backports
  • 10:08 elukey: manually purge varnishkafka graphite alert's URL as attempt to avoid a flapping alert - T219842
  • 09:14 arturo: T219776 finally reimaging cloudnet2003-dev.codfw.wmnet (was labtestnet2003)
  • 09:03 _joe_: uploaded patched version of bootstrap-vz to account for jessie-updates vanishing (T219683)
  • 08:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T219777 T143763 (duration: 00m 53s)
  • 08:50 marostegui: Execute schema change on db1069 x1 master with replication enabled on the following small wikis: aawiki aawikibooks aawiktionary abwiki abwiktionary acewiki advisorswiki advisorywiki adywiki afwiki T143763
  • 08:20 marostegui: Compress wikishared.urlshortcodes table on x1, directly on the master with replication (table has 1 row) - T219777
  • 08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T219777 T143763 (duration: 00m 53s)
  • 08:13 moritzm: installing debdeploy updates on remaining hosts in eqiad/codfw
  • 08:05 moritzm: installing openssl1.0 security updates
  • 07:52 moritzm: removed labvirt1008 from debmonitor (T216661)
  • 06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 (duration: 00m 50s)
  • 06:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 (duration: 00m 52s)
  • 06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 (duration: 00m 52s)
  • 06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 (duration: 00m 54s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 (duration: 00m 53s)
  • 05:58 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@2a090ef]: New version for T219778 (duration: 00m 19s)
  • 05:58 oblivian@deploy1001: Started deploy [docker-pkg/deploy@2a090ef]: New version for T219778
  • 05:55 marostegui: Upgrade pc1008
  • 05:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1008 (duration: 00m 56s)
  • 04:14 onimisionipe: restarted tilerator on maps200[1-3] - connection refused
  • 01:18 XioNoX: replacing accepted-prefix-limit with prefix-limit on cr1-eqiad - T211730
  • 01:14 XioNoX: replacing accepted-prefix-limit with prefix-limit in eqord - T211730
  • 00:52 XioNoX: depool eqsin due to Telia eqsin-codfw link outage
  • 00:40 XioNoX: replacing accepted-prefix-limit with prefix-limit in [co|eq]dfw - T211730
  • 00:25 XioNoX: replacing accepted-prefix-limit with prefix-limit on all ulsfo peers - T211730
  • 00:19 XioNoX: replacing accepted-prefix-limit with prefix-limit on one ulsfo peer - T211730
  • 00:06 XioNoX: jnt push to msw switches

2019-04-01

  • 23:54 shdubsh: restarting kafka on kafka-jumbo1004
  • 23:47 shdubsh: restarting kafka on kafka-jumbo1003
  • 23:36 shdubsh: restart kafka on kafka-jumbo1002
  • 23:28 shdubsh: restart kafka on kafka-jumbo1001
  • 23:16 XioNoX: jnt push to csw2-esams
  • 22:52 XioNoX: restart pdfrender on scb1003 - T174916
  • 21:44 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Remove kowiki spam mitigations T212679 (duration: 00m 54s)
  • 21:28 XioNoX: Push AS specific policy-statements to cr1/2-eqsin v4 peers - T211930
  • 21:11 dcausse: elasticsearch search cluster: reindex spaceless languages (T219533)
  • 19:48 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Renew Priority Hints origin trial token (duration: 00m 54s)
  • 19:48 bblack: authdns2001 (ns1) upgrade gdnsd -> 3.1.0
  • 18:58 XioNoX: re-set ulsfo-codfw ospf cost to previous default - T219591
  • 18:52 shdubsh: restart mjolnir-kafka-msearch on relforge1002 to adopt new logging config
  • 18:44 dcausse: Morning SWAT done
  • 18:42 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T219268: [cirrus] Use bm25 similarity for all wikis (duration: 00m 51s)
  • 18:33 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T210381: [cirrus] Cleanup transitional states (duration: 00m 53s)
  • 18:22 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: ExternalGuidance: Allow google translate hosts as known services (T218948) (gerrit:498913) (duration: 00m 53s)
  • 18:18 bblack: multatuli (ns2) upgrade gdnsd -> 3.1.0
  • 18:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add tmpSerializeEmptyListsAsObjects Wikibase repo config (T138104) (gerrit:499999) (duration: 00m 54s)
  • 17:55 XioNoX: remove asw2-c-eqiad:et-3/1/2 from disabled interfaces - T218059
  • 17:31 bblack: authdns1001 (ns0) upgrade gdnsd -> 3.1.0
  • 17:22 bblack: upgrade gdnsd -> 3.1.0 (wmf2) on cp1099 (authdns test)
  • 17:21 bblack: uploading gdnsd-3.1.0-1~wmf2 to stretch-wikimedia
  • 17:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@115a6bf]: Added more endpoint, GUI updates and new bot pattern (duration: 12m 10s)
  • 17:07 arturo: restart dhcp server in install2002 to release old lease for labtestnet2003
  • 17:03 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@115a6bf]: Added more endpoint, GUI updates and new bot pattern
  • 16:32 vgutierrez: slowly reenabling puppet in cache text cluster - T213705
  • 16:28 bblack: upgrade gdnsd -> 3.1.0 on cp1099 (authdns test)
  • 16:25 bblack: uploading gdnsd-3.1.0-1~wmf1 to stretch-wikimedia
  • 16:15 arturo: T219776 reimaging + renaming labtestnet2003 into cloudnet2003-dev
  • 16:13 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet
  • 16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet
  • 16:05 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2023.codfw.wmnet
  • 15:57 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2023.codfw.wmnet
  • 15:56 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3042.esams.wmnet
  • 15:49 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3042.esams.wmnet
  • 15:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet
  • 15:43 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4032.ulsfo.wmnet
  • 15:42 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5007.eqsin.wmnet
  • 15:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5007.eqsin.wmnet
  • 15:24 vgutierrez: disable puppet in the cache text cluster - T213705
  • 15:09 Amir1: mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki=hywwiki --baseName hywwiki --cluster (eqiad|codfw)
  • 14:59 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Cleanup: Remove obsolete WikimediaEditorTasks beta cluster prefs (duration: 00m 50s)
  • 14:44 moritzm: rolling out debdeploy 0.0.99.10 for jessie, buster, stretch systems
  • 14:42 moritzm: restarting superset on analytics-tool1004 to pick up latest Python
  • 14:41 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=hywwiki --force --sysop Ladsgroup
  • 14:37 ladsgroup@deploy1001: Synchronized langlist: (no justification provided) (duration: 00m 50s)
  • 14:35 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 50s)
  • 14:33 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T212597 (duration: 00m 51s)
  • 14:32 Amir1: wikiadmin@10.64.32.136(hywwiki)> update text set old_text = 'DB://cluster25/1';
  • 14:18 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 14:11 moritzm: uploaded debdeploy 0.0.99.10 to apt.wikimedia.org (jessie, stretch, buster)
  • 14:07 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 52s)
  • 14:04 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5007.eqsin.wmnet
  • 13:57 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5007.eqsin.wmnet
  • 13:56 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5001.eqsin.wmnet
  • 13:50 hashar: Reverted CI Jenkins jobs to Quibble 0.0.28 # T219647
  • 13:47 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5001.eqsin.wmnet
  • 13:26 mvolz@deploy1001: scap-helm citoid finished
  • 13:26 mvolz@deploy1001: scap-helm citoid cluster codfw completed
  • 13:26 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-codfw-values.yaml stable/citoid [namespace: citoid, clusters: codfw]
  • 13:23 mvolz@deploy1001: scap-helm citoid finished
  • 13:23 mvolz@deploy1001: scap-helm citoid cluster eqiad completed
  • 13:23 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-eqiad-values.yaml stable/citoid [namespace: citoid, clusters: eqiad]
  • 13:12 mvolz@deploy1001: scap-helm citoid finished
  • 13:12 mvolz@deploy1001: scap-helm citoid cluster staging completed
  • 13:12 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 13:11 hashar: Upgraded CI Jenkins jobs to Quibble 0.0.30 # T219647
  • 13:09 jbond42: rolling security update of tshark
  • 12:24 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@46ba982]: Rollback - third time is the charm (duration: 00m 43s)
  • 12:23 oblivian@deploy1001: Started deploy [docker-pkg/deploy@46ba982]: Rollback - third time is the charm
  • 12:08 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@0c32dc1]: Rollback to 1.0.0, T219778 (duration: 00m 18s)
  • 12:08 oblivian@deploy1001: Started deploy [docker-pkg/deploy@0c32dc1]: Rollback to 1.0.0, T219778
  • 12:02 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@UNKNOWN]: Rollback to 1.0.0, T219778 (duration: 00m 34s)
  • 12:02 oblivian@deploy1001: Started deploy [docker-pkg/deploy@UNKNOWN]: Rollback to 1.0.0, T219778
  • 11:58 Lucas_WMDE: EU SWAT done
  • 11:57 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikibaseLexeme: SWAT: Fix GrammaticalFeatureListWidget (T219134, T219734) (gerrit:500237) (duration: 01m 00s)
  • 11:53 moritzm: uploaded logstash/kibana/elasticsearch 5.6.15 to component thirdparty/elastic56
  • 11:52 moritzm: uploaded logstash/kibana/elasticsearch to component thirdparty/elastic56
  • 11:51 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add unwatchedpages permission to rollbacker and patroller at zhwiki (T219285) (gerrit:500393) (duration: 00m 52s)
  • 11:41 zfilipin@deploy1001: Synchronized static/images/project-logos/: SWAT: Correct logos for the Gujarati Wikipedia (T219373) (gerrit:499210) (duration: 00m 52s)
  • 11:34 zfilipin@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: Enable logging of private filters on commonswiki (T218527) (gerrit:497236) (duration: 00m 50s)
  • 11:25 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Revert "Remove $wgAbuseFilterRuntimeProfile"" (T191039) (gerrit:498818) (duration: 00m 51s)
  • 11:17 zfilipin@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: Revert "Revert "Remove $wgAbuseFilterProfile"" (T191039) (gerrit:498817) (duration: 00m 52s)
  • 11:16 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@0c32dc1]: Upgrade to 1.1.2 (duration: 01m 08s)
  • 11:15 oblivian@deploy1001: Started deploy [docker-pkg/deploy@0c32dc1]: Upgrade to 1.1.2
  • 11:00 jbond42: halt rolling updates of tshark until after SWAT
  • 10:48 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:500410) (duration: 00m 50s)
  • 10:47 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:500410) (duration: 00m 52s)
  • 10:42 jbond42: rolling security update of tshark
  • 10:32 _joe_: pruning old images on boron
  • 10:31 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7ef5ca3]: Upgrade to 1.1.2 (duration: 00m 26s)
  • 10:31 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7ef5ca3]: Upgrade to 1.1.2
  • 10:27 arturo: T219626 reimaging cloudcontrol2001-dev
  • 09:09 moritzm: installing Chromium security updates on proton* (tested the new release in deployment-prep)
  • 08:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2033 (duration: 00m 51s)
  • 08:09 marostegui: Deploy testing schema change on enwiki.echo_event on db2033 and upgrade mysql - T143961
  • 07:54 ariel@deploy1001: Finished deploy [dumps/dumps@7abb6c8]: get db user/passwd via mw maint script (duration: 00m 03s)
  • 07:54 ariel@deploy1001: Started deploy [dumps/dumps@7abb6c8]: get db user/passwd via mw maint script
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2033 (duration: 00m 51s)
  • 06:28 _joe_: pushing wikimedia-jessie:{20190401,latest} to docker-registry.w.o T219580
  • 06:27 _joe_: installing new bootstrap-vz on boron T219580
  • 05:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 (duration: 00m 50s)
  • 05:08 marostegui: Deploy schema change on db1077, this will generate lag on s3 on labs
  • 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 (duration: 00m 53s)

2019-03-31

  • 06:57 marostegui: Remove old files from dbstore1001 to clean up the disk space warning

2019-03-30

  • 03:39 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/ImageMap/includes/ImageMap.php: I1387825f25e / T217087 (duration: 00m 52s)
  • 03:16 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/skins/Vector/includes/templates/index.mustache: I0d6e036b65da0 / T219359 / i18n regression (duration: 00m 54s)

2019-03-29

  • 22:06 bstorm_: stopped database services on labsdb1004 and labsdb1005
  • 21:01 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3) (duration: 05m 14s)
  • 20:55 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3)
  • 20:49 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3) (duration: 03m 13s)
  • 20:46 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3)
  • 20:35 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 2) (duration: 03m 30s)
  • 20:31 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 2)
  • 20:30 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (duration: 00m 30s)
  • 20:29 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers
  • 18:41 ejegg: updated payments-wiki from 4b49bb7333 to 793bce1a5f
  • 15:51 XioNoX: repool ulsfo - T219591
  • 15:48 XioNoX: bump ulsfo-codfw ospf link cost to 1000 - T219591
  • 15:14 _joe_: pruning old images and containers on boron
  • 15:00 mutante: ldap-eqiad-replica02 - running out of disk - apt-get clean - gzipping /var/log/debug
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 13:05 ema: cp2002/cp2005: repool varnish-fe for user traffic T213263
  • 12:55 thcipriani: gerrit running on 2.15.11
  • 12:53 thcipriani: restarting gerrit to finish rollback to 2.15.11
  • 12:52 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 on cobalt -- restart of gerrit incoming (duration: 00m 11s)
  • 12:52 thcipriani@deploy1001: Started deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 on cobalt -- restart of gerrit incoming
  • 12:51 moritzm: removing php 7.0 packages from snapshot1008, dumps are only using 7.2 (T218193)
  • 12:50 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 (gerrit2001 only) (duration: 00m 10s)
  • 12:50 thcipriani@deploy1001: Started deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 (gerrit2001 only)
  • 12:47 moritzm: upgrading snapshot1008 to component/php72 (T218193)
  • 12:46 moritzm: upgrading snapshot1005-1007/1009 to component/php72 (T218193)
  • 12:23 ema: rolling ATS restarts to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/500011/ T213263
  • 11:45 mutante: cobalt - systemctl restart gerrit
  • 10:36 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 10:36 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 10:35 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 10:35 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 09:37 mutante: restarting zuul on contint1001
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 08:36 godog: depool ulsfo as precaution -- link repair in progress
  • 08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1110 (duration: 00m 50s)
  • 07:58 gilles@deploy1001: Synchronized php-1.33.0-wmf.23/includes/media/MediaTransformOutput.php: T216499 Only apply high priority half the time (duration: 00m 50s)
  • 07:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1110 (duration: 00m 51s)
  • 07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1110 (duration: 00m 50s)
  • 07:19 vgutierrez: reenabling puppet in acme-chief clients after verifying NOOP in netmon2001
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1110 (duration: 01m 06s)
  • 07:11 vgutierrez: disabling puppet in acme-chief clients to merge I437b91 safely
  • 07:06 marostegui: Upgrade db1110
  • 07:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1110 (duration: 00m 49s)
  • 07:01 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216598 T216594 Element Timing for Images and Layout Stability on ruwiki (duration: 00m 51s)
  • 06:56 marostegui: Remove tools section from tendril by doing: update shards set display='0' where name='tools'; T216749
  • 06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 (duration: 00m 49s)
  • 06:41 marostegui: Upgrade pc1009
  • 06:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 (duration: 00m 50s)
  • 06:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 (duration: 00m 50s)
  • 05:49 marostegui: Disable notifications on labsdb1004 and labsdb1005 - T216749
  • 05:47 marostegui: Remove labsdb1004 and labsdb1005 from tendril - T216749
  • 05:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 52s)
  • 00:18 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/api/ApiStashEdit.php: I35213d83a0 (duration: 00m 49s)
  • 00:16 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I8887ce013a8 (duration: 00m 51s)
  • 00:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I24a5469dbfd0 / T216206 for testwikidatawiki (duration: 00m 50s)

2019-03-28

  • 23:54 krinkle@deploy1001: Synchronized wmf-config/Wikibase.php: Ib9d617 (duration: 00m 50s)
  • 23:53 krinkle@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: Ib9d617 (duration: 00m 51s)
  • 23:14 bstorm_: completed setting up clouddb1003 as the replica of labsdb1006 (osm)
  • 22:13 bd808@deploy1001: Finished deploy [striker/deploy@2f62c43]: Fixes for error pages and repo creation (T176325) (duration: 00m 59s)
  • 22:12 bd808@deploy1001: Started deploy [striker/deploy@2f62c43]: Fixes for error pages and repo creation (T176325)
  • 22:11 XioNoX: add AS specific policy-statements to cr1-eqsin v6 transits - T211930
  • 21:51 thcipriani: restarting gerrit
  • 21:18 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [Wikimania] Enable VisualEditor in the 2019 namespace T218645 (duration: 00m 50s)
  • 21:16 XioNoX: add AS specific policy-statements to cr2-eqsin v6 transits - T211930
  • 21:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [Wikitech] Enable VisualEditor in extra namespaces (duration: 00m 50s)
  • 20:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: VisualEditor: Enable mobile section editing A/B test on 10 Wikipedias T218851 T218939 (duration: 00m 50s)
  • 20:29 moritzm: restarting Gerrit on cobalt to effect new Java security update
  • 19:47 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikimediaEditorTasks on wikidatawiki (duration: 00m 52s)
  • 19:39 mdholloway: created table wikimedia_editor_tasks_entity_description_exists on wikidatawiki
  • 19:19 marxarelli: 1.33.0-wmf.23 deployed for all wikis (T206677)
  • 19:09 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.23
  • 18:45 bstorm_: switching replica for osmdb to clouddb1003 VM from labsdb1007
  • 18:42 addshore@deploy1001: Synchronized wmf-config/db-labs.php: BETA ONLY db-labs (duration: 00m 57s)
  • 18:35 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: wikibase.php, define sharedCacheKeyGroup (duration: 00m 57s)
  • 18:32 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/ProofreadPage/includes/Index/IndexContent.php: ProofreadPage: Fix AbuseFilter UBN T219514 (duration: 00m 57s)
  • 18:17 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/AdvancedSearch/: AdvancedSearch: Fix two UBNs T219455 T219539 (duration: 00m 59s)
  • 18:03 ejegg: updated payments-wiki from 6661655e37 to 4b49bb7333
  • 17:46 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Deploy logging @cee: prefixing bugfix (duration: 03m 24s)
  • 17:43 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Deploy logging @cee: prefixing bugfix
  • 16:39 XioNoX: enable cr2-codfw:xe-5/0/0 (to cr2-eqdfw)
  • 16:36 mutante: wikitech-static - changing [renewalparams] authenticator = to 'apache' from 'standalone' (installer = was already apache) (T214640)
  • 16:36 jbond42: move python3-requests and python3-urllib3 from jessie-wikimedia backports to component/kube2proxy
  • 16:33 XioNoX: disable cr2-codfw:xe-5/0/0 (to cr2-eqdfw)
  • 16:00 akosiaris: poweroff sessionstore2001 for a re-racking
  • 15:15 mutante: wikitech-static - removing acme-setup cron jobs from root's crontab. this was used before the switch to certbot, is unrelated and added to confusion and maybe the problem (T214640)
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 15:07 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 15:06 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:46 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@4deeb04]: Partition htmlCacheUpdate topic, final cleanup stage T219159 (duration: 00m 52s)
  • 14:45 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@4deeb04]: Partition htmlCacheUpdate topic, final cleanup stage T219159
  • 14:32 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@3a8a889]: Partition htmlCacheUpdate topic, step 2 T219159 (duration: 00m 53s)
  • 14:31 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@3a8a889]: Partition htmlCacheUpdate topic, step 2 T219159
  • 14:07 gehel: reindexing changes from '2019-03-26T12:00:00Z' to '2019-03-28T12:00:00Z' into cirrus / elasticsearch - T218878
  • 13:59 gehel: restarting elasticsearch on elastic2050 to validate JVM upgrade
  • 13:57 moritzm: upgrading Java on elasticsearch hosts
  • 13:50 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet
  • 13:49 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
  • 13:22 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@c120b38]: Partition htmlCacheUpdate topic, explicitly exclude htmlCacheUpdate T219159 (duration: 00m 48s)
  • 13:21 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@c120b38]: Partition htmlCacheUpdate topic, explicitly exclude htmlCacheUpdate T219159
  • 13:14 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@17285f8]: Partition htmlCacheUpdate topic, step 1 T219159 (duration: 01m 46s)
  • 13:12 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@17285f8]: Partition htmlCacheUpdate topic, step 1 T219159
  • 12:20 moritzm: removing php 7.0 packages from snapshot1005-1007/1009, dumps are only using 7.2 (T218193)
  • 12:13 jbond42: move git from jessie-wikimedia backports repo to components/ci
  • 12:02 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "SDC: Enable both new-style and old-style Wikibase federation on Commons" (T219450) (gerrit:499756) (duration: 00m 57s)
  • 11:54 moritzm: upgrading snapshot1005-1007/1009 to component/php72 (T218193)
  • 11:53 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Revert T212597
  • 11:51 ladsgroup@deploy1001: Synchronized dblists: Revert T212597 (duration: 00m 58s)
  • 11:27 ladsgroup@deploy1001: Synchronized dblists: T212597 (duration: 00m 56s)
  • 11:01 godog: test copying prometheus metrics on bast3002
  • 10:54 gehel: restarting elasticsearch-psi on elastic20[35,36,53] (shards stuck in recovery) - T218878
  • 10:22 gehel: restarting elasticsearch on elastic20[34,36,50] (shards stuck in recovery) - T218878
  • 10:15 addshore@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/Wikibase/lib: T219452 Revert: Use enableModuleContentVersion() for Wikibase\lib\SitesModule (gerrit:499738) (duration: 01m 06s)
  • 10:11 gehel: restarting elasticsearch-omega on elastic2050 (shards stuck in recovery) - T218878
  • 09:56 gehel: restarting elasticsearch-omega on elastic2031 (shards stuck in recovery) - T218878
  • 09:42 gehel: restarting elasticsearch on elastic20[28,29,41] (shards stuck in recovery) - T218878
  • 09:37 gehel: restarting elasticsearch-psi on elastic20[39,40] (shards stuck in recovery) - T218878
  • 09:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 (duration: 00m 56s)
  • 09:28 gehel: restarting elasticsearch on elastic20[25,27] (shards stuck in recovery) - T218878
  • 09:19 gehel: restarting elasticsearch-omega on elastic20[38,50] (shards stuck in recovery) - T218878
  • 09:14 godog: install rsyslog 8.1901.0-1~bpo8+wmf1 on phab1001 and copper
  • 09:09 gehel: restarting elasticsearch-omega on elastic2050 (shards stuck in recovery) - T218878
  • 09:06 gehel: restarting elasticsearch-psi on elastic20[35,36,53] (shards stuck in recovery) - T218878
  • 09:00 gehel: restarting elasticsearch-psi on elastic2036 (shards stuck in recovery) - T218878
  • 08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 (duration: 00m 55s)
  • 08:43 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2007 after upgrade (duration: 00m 57s)
  • 08:38 gehel: retry shard allocation on elasticsearch codfw all clusters (curl -k -XPOST 'https://localhost:9243/_cluster/reroute?pretty&explain=true&retry_failed') - T218878
  • 08:37 gehel: retry shard allocation on elasticsearch codfw (curl -k -XPOST 'https://localhost:9243/_cluster/reroute?pretty&explain=true&retry_failed')
  • 08:33 elukey: move hadoop yarn configuration from hdfs back to zookeeper - T218758
  • 08:32 marostegui: Upgrade pc2007
  • 08:31 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2007 for upgrade (duration: 00m 56s)
  • 08:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2009 after upgrade (duration: 00m 57s)
  • 08:12 marostegui: Upgrade pc2009
  • 08:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2009 for upgrade (duration: 00m 57s)
  • 08:10 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 08:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 07:32 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2008 after upgrade (duration: 00m 57s)
  • 07:22 marostegui: Upgrade pc2008
  • 07:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2008 for upgrade (duration: 00m 57s)
  • 07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clean up old unused entries (duration: 01m 04s)
  • 06:27 marostegui: Deploy schema change on s3 codfw, lag will be generated on s3 codfw.
  • 05:39 marostegui: Restart apache on phab1001 - phabricator is down
  • 02:50 chaomodus: restarted pdfrender on scb1004 in order to attempt to address flapping errors
  • 01:45 XioNoX: add AS specific policy-statements to cr2-eqsin (but don't apply them yet) - T211930
  • 01:20 XioNoX: progressive jnt push to standardize cr*
  • 01:15 XioNoX: remove sandbox-out6 filter from all routers
  • 00:56 XioNoX: jnt push to standardize asw*
  • 00:32 XioNoX: jnt push to standardize mr1-*
  • 00:21 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/api/ApiStashEdit.php: Ic357dbfcd9ab / T203786 (duration: 00m 57s)

2019-03-27

  • 23:46 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Fix: Pass database name to the NameTableStore constructor (duration: 00m 57s)
  • 23:34 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Load WikibaseLexemeCirrusSearch on Wikidata T216206 (gerrit:499400) (duration: 00m 58s)
  • 23:25 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Load WikibaseLexemeCirrusSearch on test.wikidata.org T216206 (gerrit:499399) (duration: 00m 59s)
  • 22:51 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 31s)
  • 22:51 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 22:47 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 04s)
  • 22:47 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 22:45 krinkle@deploy1001: Synchronized wmf-config/profiler.php: I8c7f8c / T176916 (duration: 00m 59s)
  • 22:36 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 34s)
  • 22:35 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 22:30 niharika29@deploy1001: Finished deploy [scholarships/scholarships@9db232d]: Update wikimania-scholarships; includes fix for broken privacy policy link (duration: 00m 02s)
  • 22:30 niharika29@deploy1001: Started deploy [scholarships/scholarships@9db232d]: Update wikimania-scholarships; includes fix for broken privacy policy link
  • 22:21 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 31s)
  • 22:21 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
  • 21:59 chaomodus: restarting proton1001 to upgrade ram
  • 21:58 chaomodus: restarting proton1002 to upgrade ram
  • 21:57 chaomodus: restarting proton2001 in order to upgrade ram
  • 21:54 chaomodus: restarting proton2002 in order to upgrade ram
  • 21:25 dcausse@deploy1001: Synchronized wmf-config/Wikibase.php: T219448 (duration: 00m 55s)
  • 21:25 eileen: civicrm revision changed from 67b8405b60 to 7560af93df, config revision is 5a0cbb3c7d (was actually before the process control one)
  • 21:24 eileen: process-control config revision is e1bc772c89
  • 21:17 chaomodus: restarted proton on proton1001 in response to memory exhaustion and cpu peg
  • 21:07 milimetric@deploy1001: Finished deploy [analytics/refinery@fdd21a4]: non-deploy changes and two new oozie jobs (duration: 11m 48s)
  • 20:55 milimetric@deploy1001: Started deploy [analytics/refinery@fdd21a4]: non-deploy changes and two new oozie jobs
  • 20:29 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update WikimediaEditorTasks config for DB location split (duration: 00m 57s)
  • 20:23 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Update DB utils to handle counts and suggestion DBs in different locations (duration: 00m 58s)
  • 20:14 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 20:14 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Fix: Use READ_LOCKING when evaluating whether to update targets_passed (duration: 00m 58s)
  • 20:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 20:03 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 19:48 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 19:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 19:43 herron: removed queued wikidata notification messages for a***a@w**gm**ster.** on mx1001 to address gmail excessive volume rate limiting
  • 19:32 jijiki: restarting pdfrender on scb1001
  • 19:30 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 19:27 marxarelli: (resent; originally @ 1916) dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.23
  • 19:23 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 19:18 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.23 (duration: 01m 45s)
  • 19:14 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 18:48 thcipriani: restarting gerrit process
  • 18:12 jynus: update grants on db1115 for new provisioning hosts on codfw T218336
  • 18:10 elukey: interface::rps applied to all the mc10XX hosts - T203786
  • 17:41 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:41 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 17:10 ema: fermium: /usr/local/sbin/disable_list wikimetrics T211835
  • 16:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T214075 SDC: Enable Wikidata federation on Commons (duration: 00m 57s)
  • 16:38 elukey: mc20XX and mc1022 have interface::rps enabled - T203786
  • 16:28 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/GlobalPreferences/includes/GlobalPreferencesFactory.php: Hot-fix T219380 GlobalPreferences: Allow modifiedPrefs to be set even if no UI control (duration: 00m 58s)
  • 16:18 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC: Use feature flag for enabling depicts in UW (duration: 00m 57s)
  • 16:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Add feature flag for enabling depicts in UW (duration: 00m 57s)
  • 15:56 jbond42: bastion reboots complete
  • 15:56 ariel@deploy1001: Finished deploy [dumps/dumps@88ddd76]: ability to use lbzip2 for meta-history compression (duration: 00m 03s)
  • 15:56 ariel@deploy1001: Started deploy [dumps/dumps@88ddd76]: ability to use lbzip2 for meta-history compression
  • 15:44 jbond42: rebooting bast2001.wikimedia.org in 5 minutes
  • 15:44 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:42 jbond42: rebooting bast2002.wikimedia.org in 5 minutes
  • 15:38 jbond42: rebooting bast1002.wikimedia.org in 5 minutes
  • 15:34 jbond42: rebooting bast4002.wikimedia.org in 5 minutes
  • 15:30 jbond42: rebooting bast5001.wikimedia.org in 5 minutes
  • 15:24 jbond42: rebooting iron.wikimedia.org in 5 minutes
  • 15:22 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:21 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:19 elukey: slowly rolling out interface::rps to all the mcXXXX nodes - T203786
  • 14:52 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 14:45 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:44 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:13 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:12 marostegui: Sanitize hywwiki on db1124:3313 T212625
  • 14:11 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:05 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/498417
  • 13:38 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 13:35 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:11 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 57s)
  • 12:42 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 58s)
  • 12:41 Amir1: scap sync-file dblists
  • 12:30 Amir1: mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=mediawikiwiki hyw wikipedia hywwiki hyw.wikipedia.org
  • 12:25 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:23 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:15 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 11:47 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 11:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 11:37 mdholloway: created wikimedia_editor_tasks_entity_description_exists table on testwikidatawiki
  • 11:28 _joe_: SWAT done
  • 11:24 oblivian@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/WikimediaEvents: SWAT: Backport Use a cookie to persist the seed for php7 a/b test to .22 T216676 (duration: 00m 58s)
  • 11:20 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle rule for The Art and Feminism Edit-a-thon in Taiwan (T219113) (gerrit:498770) (duration: 00m 59s)
  • 11:14 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Clean the throttles up (T219311) (gerrit:499287) (duration: 00m 57s)
  • 11:10 dcausse: elasticsearch search cluster: setting cluster.routing.allocation.disk.watermark.flood_stage to 100% on omega/psi/chi@eqiad (T219364)
  • 11:08 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle rule for Czech editathon (T219291) (gerrit:499231) (duration: 00m 58s)
  • 11:06 dcausse: elasticsearch search cluster: setting "index.blocks.read_only_allow_delete" to null on all indices in omega/psi/chi@omega (T219364)
  • 11:04 mutante: re-enabled puppet on logstash1007 through 1011 - then on logstash*
  • 11:00 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 10:57 godog: upgrade rsyslog to 8.1903.0-3~bpo8+wmf1 on cobalt to test imfile file rotation fix - T214176
  • 10:53 mutante: enabling and running puppet on logstash1007
  • 10:49 mutante: disabling puppet on logstash* via cumin
  • 10:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3312 (duration: 00m 58s)
  • 10:20 godog: upgrade rsyslog to 8.1903.0-3~bpo8+wmf1 on phab1001 to test imfile file rotation fix - T214176
  • 09:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3312 (duration: 00m 56s)
  • 09:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1074 (duration: 00m 57s)
  • 09:41 marostegui: Upgrade db2092
  • 09:06 vgutierrez: puppet reenabled in acme-chief clients - T207295
  • 09:01 marostegui: Deploy schema change on db1074, this will generate lag on labsdb hosts for s2
  • 09:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1074 (duration: 00m 57s)
  • 08:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1076 (duration: 00m 54s)
  • 08:33 vgutierrez: disabling puppet in acme-chief clients to safely get rid of old TLS material - T207295
  • 08:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 (duration: 00m 57s)
  • 08:17 godog: bounce rsyslog on phab* - apache access logs stopped at ~6.30 today
  • 08:09 godog: bounce rsyslog on cobalt - apache access logs stopped at ~6.30 today
  • 08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1122 (duration: 00m 57s)
  • 07:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 (duration: 00m 58s)
  • 06:57 SMalyshev: depooled wdqs1005 to catch up
  • 06:56 SMalyshev: repooled wdqs1004
  • 06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3312 (duration: 00m 58s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Change one parsercache key on codfw - T210725 (duration: 00m 57s)
  • 05:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 (duration: 01m 10s)
  • 00:56 SMalyshev: depooled wdqs1004 to catch up
  • 00:55 SMalyshev: repooled wdqs1006
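
The 11:06 and 11:10 entries above (T219364) reset "index.blocks.read_only_allow_delete" and raise the flood-stage watermark. A minimal Python sketch of the JSON bodies such changes would typically carry to Elasticsearch's index settings and cluster settings APIs follows. The helper names are hypothetical, and the log does not say whether the cluster setting was applied as transient or persistent, so "transient" is an assumption here.

```python
import json


def flood_stage_body(value="100%", scope="transient"):
    """Body for PUT /_cluster/settings.

    `scope` is an assumption: the log does not record whether the
    change was made as a transient or a persistent setting.
    """
    return {scope: {"cluster.routing.allocation.disk.watermark.flood_stage": value}}


def clear_read_only_body():
    """Body for PUT /_all/_settings.

    Python None serializes to JSON null, which resets the index block
    to its default (unset) state.
    """
    return {"index.blocks.read_only_allow_delete": None}


# Print the requests rather than sending them; no cluster is contacted.
print("PUT /_cluster/settings", json.dumps(flood_stage_body()))
print("PUT /_all/_settings", json.dumps(clear_read_only_body()))
```

The block is cleared explicitly in the 11:06 entry because, in the Elasticsearch 6.x line in use here, an index marked read-only after crossing the flood stage stays that way until the setting is reset by hand.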

2019-03-26

  • 23:37 SMalyshev: repooled wdqs2003
  • 23:12 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: T216206 : sync noop labs config: Actually load WBCS-Lexeme extension before trying to use it (duration: 00m 57s)
  • 22:12 gehel: freezing and unfreezing writes to elasticsearch codfw
  • 21:47 SMalyshev: depool wdqs2003 to catch it up
  • 21:32 ebernhardson: manually thaw search.svc.codfw.wmnet:9643
  • 21:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikimediaEditorTasks on testwikidatawiki (duration: 00m 57s)
  • 21:22 mdholloway: created new db tables for WikimediaEditorTasks in x1
  • 21:00 SMalyshev: depooled wdqs1006 to see if it'd catch up better
  • 20:19 marxarelli: correction: group0 to 1.33.0-wmf.23
  • 20:15 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.0
  • 20:08 ejegg: updated payments-wiki from f42910460b to 6661655e37
  • 19:58 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.23 and rebuild l10n cache (duration: 37m 59s)
  • 19:20 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.23 and rebuild l10n cache
  • 19:18 marxarelli: scap clean failure due to T218783. train is rolling without cleanup
  • 19:17 jynus: reloading db2095 mariadb instances to reload and check filters
  • 19:13 jynus: reloading db2094 mariadb instances to reload and check filters
  • 19:07 dduvall@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 10s)
  • 19:04 jynus: reloading db1125 mariadb instances to reload and check filters
  • 18:49 marxarelli: branch 1.33.0-wmf.23 was cut successfully (T206677)
  • 18:24 jynus: reloading db1124 mariadb instances to reload and check filters
  • 18:21 marxarelli: starting branch cut for 1.33.0-wmf.23 (T206677)
  • 18:09 thcipriani: gerrit back on version 2.15.12, upgrade complete.
  • 18:05 thcipriani: restarting gerrit on cobalt for update to 2.15.12
  • 18:05 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 18:05 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on cobalt (duration: 00m 15s)
  • 18:04 thcipriani@deploy1001: Started deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on cobalt
  • 18:03 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on gerrit2001 only (duration: 00m 11s)
  • 18:03 thcipriani@deploy1001: Started deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on gerrit2001 only
  • 18:01 thcipriani: starting gerrit 2.15.12 upgrade
  • 17:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:45 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 17:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 17:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:43 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 17:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 17:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:39 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:39 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:38 arlolra: Updated Parsoid to f58c3d1 (T219023)
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:38 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:33 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:33 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:33 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:31 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:31 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:31 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@395a214]: Updating Parsoid to f58c3d1 (duration: 06m 51s)
  • 17:21 arlolra@deploy1001: Started deploy [parsoid/deploy@395a214]: Updating Parsoid to f58c3d1
  • 17:14 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:13 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 17:12 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:12 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:12 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:06 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:03 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:03 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:03 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:59 otto@deploy1001: scap-helm eventgate- upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-, clusters: staging]
  • 16:59 otto@deploy1001: scap-helm eventgate- upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-, clusters: staging]
  • 16:58 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 16:58 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:57 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:57 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:57 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:57 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:31 gilles@deploy1001: Finished deploy [performance/asoranking@9a1e5ef]: (no justification provided) (duration: 00m 52s)
  • 16:30 gilles@deploy1001: Started deploy [performance/asoranking@9a1e5ef]: (no justification provided)
  • 16:07 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:05 robh: decom of labtestvirt200[12] started via T218023
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.16 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:44 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:44 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:44 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.16 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:43 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 52 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:40 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:40 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:40 otto@deploy1001: scap-helm eventgate-analytics upgrade --help [namespace: eventgate-analytics, clusters: staging]
  • 15:34 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:34 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:34 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:32 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:32 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:31 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:31 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:31 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:31 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:20 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:20 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 15:08 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 15:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:01 jbond42: rolling update of passenger on puppet masters
  • 13:35 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:06 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:58 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 11:42 Amir1: EU SWAT is done
  • 11:40 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/lib/maintenance/populateSitesTable.php --wiki=wikimaniawiki --force-protocol https (T217730)
  • 11:39 Amir1: wikiadmin@db1078.eqiad.wmnet(wikimaniawiki)> DELETE FROM sites; and site_identifiers
  • 11:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wmgWikibaseSiteGroup for wikimaniawiki (T217730, gerrit:498440) (duration: 00m 49s)
  • 11:22 elukey: temporarily install ifstat on mc1022 + tmux session to log in/out bandwidth usage every 1s for T203786
  • 11:20 ladsgroup@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle rule for Wikimedia Hackathon 2019 (T213869, gerrit:498949), try II (duration: 00m 49s)
  • 11:11 ladsgroup@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle rule for Wikimedia Hackathon 2019 (T213869, gerrit:498949) (duration: 00m 51s)
  • 10:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3312 (duration: 00m 49s)
  • 09:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3312 (duration: 00m 50s)
  • 09:54 marostegui: Upgrade db2071
  • 09:42 marostegui: Upgrade db2070
  • 09:15 jijiki: Restarting pdfrender on scb1001
  • 09:09 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1004.eqiad.wmnet
  • 09:05 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1003.eqiad.wmnet
  • 08:09 marostegui: Deploy schema change on s2 codfw master, this will generate lag on codfw s2
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1119 (duration: 00m 49s)
  • 06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 (duration: 00m 50s)
  • 06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 (duration: 00m 52s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 (duration: 00m 51s)
  • 06:02 marostegui: Deploy schema change on db1106, this will generate lag on s1 on labs hosts

2019-03-25

  • 23:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T219234 Turn on Elastica logging channel (duration: 00m 51s)
  • 22:32 krinkle@deploy1001: Synchronized docroot/wikipedia.org/speed-tests/Banksy.enwiki.872156204: T185446 (duration: 00m 49s)
  • 21:44 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T216206 Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables, sub-part b (duration: 00m 49s)
  • 21:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216206 Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables, sub-part a (duration: 00m 50s)
  • 21:40 XioNoX: apply transport-in4 filter to cr1/2-eqiad - T190090
  • 21:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218715 Enable WBCS on Testcommons too (duration: 00m 50s)
  • 20:32 ebernhardson: T218994 set various deprecation channels on all six cirrus elasticsearch clusters to ERROR
  • 19:54 dcausse: elasticsearch search cluster: SET "logger.org.elasticsearch.common.logging.DeprecationLogger" to "ERROR" to psi/omega@eqiad (T218994)
  • 19:48 dcausse: elasticsearch search cluster: SET "logger.org.elasticsearch.deprecation.index.query.functionscore.ScoreFunctionBuilder" to "ERROR" to chi/psi/omega@eqiad (T218994)
  • 19:40 volans: restart icinga on icinga1001 to reset modified attributes
  • 19:37 dcausse: morning SWAT done
  • 19:33 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] switch all wikis to eqiad (elastic 6.5.4) (duration: 00m 50s)
  • 19:21 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T192254 (duration: 00m 49s)
  • 19:13 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: T218260 (duration: 00m 49s)
  • 19:06 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@1cbf290]: Roll update to mjolnir-bulk-daemon es6 handling of super_detect_noop (duration: 03m 27s)
  • 19:02 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@1cbf290]: Roll update to mjolnir-bulk-daemon es6 handling of super_detect_noop
  • 18:46 dcausse@deploy1001: Synchronized wmf-config/flaggedrevs.php: revert T217507 (duration: 00m 49s)
  • 18:43 ebernhardson: restart mjolnir-kafka-msearch-daemon across cirrus elasticsearch servers
  • 18:41 dcausse@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 18:32 dcausse@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/FlaggedRevs/: T218949: Fix reject changes when user is partially blocked (duration: 00m 51s)
  • 18:27 dcausse@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: T192135 (duration: 00m 50s)
  • 18:15 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: T211622: Enforce 8 char password length requirements for non-privileged users (duration: 00m 50s)
  • 17:24 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@281aaf8]: New build for blazegraph and updater plus GUI updates (duration: 10m 31s)
  • 17:24 elukey: restart pdfrender on scb1004
  • 17:14 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@281aaf8]: New build for blazegraph and updater plus GUI updates
  • 17:11 ebernhardson: restart mjolnir-kafka-msearch-daemon on relforge100[12]
  • 17:10 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218878: [cirrus] switch low volume wikis to eqiad (elastic 6.5.4) (duration: 00m 49s)
  • 16:56 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@2fb5038]: Ship new logging support code via new simplified virtualenv deployment (duration: 09m 52s)
  • 16:47 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@2fb5038]: Ship new logging support code via new simplified virtualenv deployment
  • 16:28 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ba09eb5]: Ship new logging support code via new simplified virtualenv deployment (duration: 09m 10s)
  • 16:19 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ba09eb5]: Ship new logging support code via new simplified virtualenv deployment
  • 16:19 hashar: updating Jenkins plugins and restarting
  • 16:16 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@6eda7d8]: Ship new logging support code via new simplified virtualenv deployment (duration: 02m 38s)
  • 16:13 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@6eda7d8]: Ship new logging support code via new simplified virtualenv deployment
  • 15:48 XioNoX: remove 2nd AS7568 router in Equinix Singapore
  • 15:21 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ddf26d0]: Ship new logging support code (duration: 01m 29s)
  • 15:20 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ddf26d0]: Ship new logging support code
  • 15:00 jbond42: updating passenger on rhodium
  • 14:29 andrewbogott: updating slapd indexes on seaborgium, serpens, ldap-eqiad-replica01, ldap-eqiad-replica02 for 498396
  • 13:52 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 13:52 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 13:52 ema: cp1076: repool varnish-fe, frontend misses served by cp-ats T213263
  • 13:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
  • 13:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=nginx
  • 13:41 ema: cp1076: depool varnish-fe and point it to cp-ats T213263
  • 13:28 mutante: planet - manually updating en version since new monitoring check warned it wasn't current (T203208)
  • 13:17 mutante: mwmaint1002 - manually running tor_exit_node cron command and test with PHP 7.2
  • 12:48 mutante: reloading icinga config
  • 12:15 Lucas_WMDE: EU SWAT finished
  • 12:08 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Move 0.1% of anonymous users to php7 T212828 (duration: 00m 49s)
  • 12:07 moritzm: installing openssl1.0 security updates on stretch
  • 12:00 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Remove $wgAbuseFilterRuntimeProfile" (T191039, gerrit:498814) (duration: 00m 51s)
  • 11:48 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove $wgAbuseFilterRuntimeProfile (T191039, gerrit:486470) (duration: 00m 49s)
  • 11:46 ema: cp-ats-codfw: upgrade trafficserver to 8.0.3-1wm1
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/Wikibase/repo: SWAT: Revert "OutputPageBeforeHTML: do nothing for non entity pages" (T218907, gerrit:498354) (duration: 01m 06s)
  • 11:26 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2003.codfw.wmnet
  • 11:23 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2004.codfw.wmnet
  • 11:23 godog: switch codfw prometheus from prometheus2003 to prometheus2004
  • 11:19 ema: cp-ats-eqiad: upgrade trafficserver to 8.0.3-1wm1
  • 11:18 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switching to use of the local proxy for search in php7 (duration: 00m 50s)
  • 11:16 oblivian@deploy1001: Synchronized wmf-config/LabsServices.php: switching to use of the local proxy for search in php7 (duration: 00m 50s)
  • 11:09 ema: trafficserver 8.0.3-1wm1 uploaded to stretch-wikimedia
  • 10:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1083 (duration: 00m 48s)
  • 10:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 10:40 gehel: disable deprecation warnings on elasticsearch eqiad - T218994
  • 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546, gerrit:498800) (duration: 00m 49s)
  • 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546, gerrit:498800) (duration: 00m 49s)
  • 10:27 moritzm: installing Java security updates on Hadoop/Druid test cluster
  • 10:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 10:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 10:07 moritzm: installing ntfs-3g security updates
  • 10:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1083 (duration: 00m 49s)
  • 09:42 moritzm: uploaded openjdk 8u212-b01-1~deb8u1 to apt.wikimedia.org/jessie-wikimedia/main
  • 09:34 marostegui: Upgrade db2062
  • 09:24 hashar: contint1001: manually compressing Zuul log files sudo -u zuul gzip --best /var/log/zuul/*.log.????-??-??
  • 09:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083+ (duration: 00m 49s)
  • 09:18 marostegui: Upgrade db2055
  • 09:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 (duration: 00m 49s)
  • 09:10 mutante: contint1001 - restarting zuul
  • 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 (duration: 00m 49s)
  • 08:08 vgutierrez: reenabling puppet in openldap servers
  • 08:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1118 (duration: 00m 49s)
  • 07:58 vgutierrez: disable puppet and downtime host in icinga for labtestservices2001 - T218022
  • 07:40 vgutierrez: disable puppet in production openldap servers before merging https://gerrit.wikimedia.org/r/498776
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1118 (duration: 00m 49s)
  • 06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1118 after mysql upgrade (duration: 00m 50s)
  • 06:45 marostegui: Stop MySQL on db1118 for upgrade
  • 06:44 marostegui: Deploy schema change on s1 codfw master, this will generate lag on codfw
  • 06:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1118 for schema change and upgrade (duration: 00m 54s)
  • 04:31 chaomodus: restarted pdfrender on scb1003 to try to help flapping
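
The 09:24 entry above compresses Zuul logs selected by the shell glob *.log.????-??-??, where each ? stands for exactly one character. A small Python sketch of which names that pattern selects follows. The file names are hypothetical, and fnmatch only approximates shell globbing (for example, it treats a leading dot like any other character).

```python
from fnmatch import fnmatchcase

# Glob taken from the 09:24 log entry; each "?" matches exactly one character.
ROTATED_LOG_PATTERN = "*.log.????-??-??"


def is_rotated_log(filename):
    """Return True if the name ends in a date-like suffix such as .log.2019-03-24."""
    return fnmatchcase(filename, ROTATED_LOG_PATTERN)


# Hypothetical directory listing, for illustration only.
names = ["zuul.log", "zuul.log.2019-03-24", "debug.log.2019-03-24.gz"]
selected = [n for n in names if is_rotated_log(n)]
print(selected)  # ['zuul.log.2019-03-24']
```

The live log and files that are already compressed fall outside the pattern, so only rotated, uncompressed files are selected. Note that ? is not restricted to digits, so the pattern would also match a non-date suffix of the same shape.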

2019-03-24

  • 15:00 jijiki: Restart pdfrender on scb1002 and scb1004

2019-03-23

  • 13:02 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix WikimediaEditorTasks Beta Cluster DB config, take 2 (duration: 00m 50s)
  • 12:36 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix WikimediaEditorTasks Beta Cluster DB config (duration: 00m 52s)

2019-03-22

  • 22:13 bd808: Restarted uwsgi-striker on labweb1002
  • 22:12 bd808: Restarted uwsgi-striker on labweb1001
  • 20:14 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:14 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 20:14 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 20:04 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 19:59 ejegg: updated payments-wiki-staging from 31647bc97e to f42910460b
  • 19:57 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:57 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 19:57 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 19:55 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:55 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 19:55 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --set main_app.extra_kafka_conf= [namespace: eventgate-analytics, clusters: staging]
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:52 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --set main_app.extra_kafka_conf={} [namespace: eventgate-analytics, clusters: staging]
  • 19:46 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:46 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:46 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:39 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:39 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:36 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:36 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:36 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:53 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:41 krinkle@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/Collection/: I2c4f5d / T217835 (duration: 00m 52s)
  • 18:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:21 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:16 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:16 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:16 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:13 tzatziki: removing 5 files for legal compliance
  • 18:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:06 jijiki: Restart ferm on db2096
  • 15:58 James_F: UBN hot-deploy for T218918: Only load latest revision in MessageCache::loadFromDB
  • 15:26 gehel: restarting elasticsearch on elastic1046 for logging configuration change - T218994
  • 14:34 mutante: scandium - apt-get remove --purge php* ; apt autoremove ; letting puppet reinstall php 7.2 one more time using mediawiki::profile::php now
  • 14:33 gehel: upgrading to elasticsearch-curator 5.6.0 on all elasticsearch nodes (including logstash) - T218991
  • 11:22 ema: lvs1002: bounce pybal to clear backends health icinga warning T218133
  • 11:18 ema: lvs1005: bounce pybal to clear backends health icinga warning T218133
  • 10:24 mutante: scandium - apt autoremove
  • 10:20 mutante: scandium - manually removing all php* packages to let puppet reinstall 7.2 instead of 7.0
  • 10:05 ema: cp2005: repooled, serving traffic via ATS T213263
  • 10:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 10:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
  • 09:48 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
  • 09:48 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
  • 09:47 ema: cp2005: depool varnish-fe in preparation of traffic switch to ATS T213263
  • 09:42 moritzm: rebooting pool counters in codfw to pick up SSBD-enabled qemu
  • 09:04 elukey: start tcpdump on mc1022 to gather traffic for analysis
  • 06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1094 (duration: 00m 50s)
  • 06:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 49s)
  • 06:05 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2096 after onsite maintenance (duration: 00m 51s)
  • 01:31 bd808: labweb: upgraded mariadb packages installed on labweb100[12]
  • 01:19 bd808@deploy1001: Finished deploy [striker/deploy@b4bcd08]: Update python wheels (duration: 01m 00s)
  • 01:18 bd808@deploy1001: Started deploy [striker/deploy@b4bcd08]: Update python wheels
  • 00:54 bd808: Striker down following upgrade. scap3 did not rebuild venv as expected. Manually resolved, but now having mysql library issues.
  • 00:47 Krinkle: krinkle@mwmaint1002 Fixing corrupt 'log_params' field of kawiki.logging row where log_id=1021367; T93110
  • 00:36 bd808@deploy1001: Finished deploy [striker/deploy@c4726e3]: Django upgrade and various bug fixes (T192487, T182142, T176325, T217932) (duration: 01m 15s)
  • 00:34 bd808@deploy1001: Started deploy [striker/deploy@c4726e3]: Django upgrade and various bug fixes (T192487, T182142, T176325, T217932)
  • 00:32 James_F: SWAT done, 12 minutes ago.
  • 00:20 jforrester@deploy1001: Finished scap: SWAT: Full scap for i18n rebuild for 498259 and 498113 (duration: 24m 49s)

2019-03-21

  • 23:57 gtirloni: downtimed systemd check in labweb1001/1002 (T218935)
  • 23:56 jforrester@deploy1001: Started scap: SWAT: Full scap for i18n rebuild for 498259 and 498113
  • 23:53 gtirloni: downtimed systemd check in labweb1001 (T210818)
  • 23:32 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/ContentTranslation/api/ApiQueryContentTranslationSuggestions.php: SWAT T218902 CX: Return API error on anonymous suggestions queries (duration: 00m 51s)
  • 23:08 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT T217730 Add wikimaniawiki to another special group in Wikibase client (duration: 00m 49s)
  • 22:33 jijiki: Restarting pdfrender on scb1003
  • 22:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 22:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 22:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 22:14 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
  • 22:02 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable WikimediaEditorTasks on the Beta Cluster (duration: 00m 49s)
  • 21:56 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Add WikimediaEditorTasks labs config to InitializeSettings-labs.php (duration: 00m 47s)
  • 21:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add WikimediaEditorTasks default config to InitializeSettings.php (duration: 00m 49s)
  • 21:53 jijiki: Restarting pdfrender on scb1004
  • 21:52 mholloway-shell@deploy1001: Synchronized wmf-config/extension-list: Add WikimediaEditorTasks to extension-list (duration: 00m 50s)
  • 21:45 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 21:39 XioNoX: Ping offload - replace test IP with text-lb.codfw IP on cr1/2-codfw - T190090
  • 21:11 XioNoX: remove peering sessions to AS7385 on cr4-ulsfo
  • 21:08 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:08 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:08 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:55 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:55 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:55 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1006.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:27 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:27 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1006.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1005.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:24 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:24 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:24 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1004.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1001.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:22 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:22 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:22 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1001.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:21 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1002.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:03 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:03 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:03 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:45 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T213483 Disable RDF output of mediainfo Wikibase entities (duration: 00m 49s)
  • 19:40 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T213483 Read wmgWikibaseEntityTypesWithoutRdfOutput value (duration: 00m 50s)
  • 19:39 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T213483 Set default wmgWikibaseEntityTypesWithoutRdfOutput value (duration: 00m 51s)
  • 18:49 gehel: resetting archived settings on elasticsearch cirrus eqiad - T218879
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:36 sbisson@deploy1001: Synchronized php-1.33.0-wmf.22/languages/Language.php: SWAT: languages: Partial revert of I8287118cf8ec01326ead9 (gerrit:498116) (duration: 00m 50s)
  • 18:30 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 18:25 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable Welcome survey on viwiki (gerrit:498166) (duration: 00m 49s)
  • 18:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:17 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 18:16 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable logging for CitationUsage and CitationUsagePageLoad (gerrit:496857) (duration: 00m 49s)
  • 18:13 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:11 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable reader trust survey v2 (gerrit:494552) (duration: 00m 50s)
  • 18:08 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:56 bblack: everything back to normal for lvs1002/lvs1005 (high-traffic2 @ eqiad)
  • 17:55 bblack: restarting pybal on lvs1002
  • 17:54 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:54 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 17:54 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 17:49 reedy@deploy1001: Synchronized php-1.33.0-wmf.22/includes/user/User.php: Iab2492 (duration: 00m 51s)
  • 17:43 bblack: restarting pybal on lvs1005
  • 17:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable EntitySourceBasedFederation on TestCommons (duration: 00m 50s)
  • 17:37 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 17:35 bblack: disabled puppet on lvs1002 + lvs1005 for new service rollout
  • 17:28 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 17:27 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC: Add test-commons.wikimedia.org to wgCrossSiteAJAXdomains (duration: 00m 49s)
  • 17:11 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 17:07 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Depicts on TestCommons, with related config (duration: 00m 50s)
  • 17:03 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 17:03 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 17:02 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:02 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:39 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 16:38 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 16:38 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 16:38 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 16:38 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:38 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:29 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:29 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2096 for onsite maintenance (duration: 00m 50s)
  • 16:01 marostegui: Poweroff db2096 for onsite maintenance T218336
  • 15:20 moritzm: rebooting flerovium/furud for kernel updates
  • 14:35 moritzm: restarting jenkins on releases* after Java update
  • 14:18 gtirloni: downtimed labtestweb2001 (T218881)
  • 14:11 vgutierrez: re-enabling puppet in acme-chief clients - T218862
  • 14:09 arturo: T218024 disabled icinga checks for labtestweb2001
  • 14:07 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 13:58 vgutierrez: update acme-chief to version 0.15 in acmechief1001 - T218862
  • 13:54 vgutierrez: disabling puppet in acme-chief clients - T218862
  • 13:48 akosiaris: reboot oresrdb2001
  • 13:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3317 (duration: 00m 51s)
  • 13:37 elukey: upgrade openjdk-8 on an-worker1080 and restarted hadoop daemons
  • 13:28 moritzm: installing Java security updates on notebook hosts
  • 13:22 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.22
  • 13:18 gtirloni: downtimed cloudcontrol*, cloudservices*, labcontrol*, labweb* (T210818)
  • 13:06 moritzm: installing Java security updates on stat hosts
  • 12:40 arturo: T216497 remove python-cliff from jessie-wikimedia/openstack-mitaka-jessie
  • 12:35 jijiki: Pooling mw1339 back
  • 12:33 jijiki: Pooling mw1290 back
  • 12:08 arturo: T216497 add python-cliff to jessie-wikimedia/openstack-mitaka-jessie
  • 12:02 vgutierrez: uploaded acme-chief 0.15 to apt.wikimedia.org (buster) - T218862
  • 11:54 elukey: restart yarn node managers on an-worker10[82,89,92] - shutdown after a long yarn failover and only now downtime is expired
  • 11:36 mutante: gerrit2001 (not the master prod server) - scheduled downtime and rebooting for upgrade
  • 11:04 zeljkof: EU SWAT finished
  • 11:04 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for LMU Edit-a-thon (T217929) (duration: 00m 57s)
  • 10:57 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet
  • 10:52 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
  • 10:46 elukey: restart hadoop yarn resource managers on an-master100[1,2] to pick up new settings
  • 10:23 moritzm: rebooting labtestcontrol2001 for kernel update
  • 10:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3317 (duration: 00m 56s)
  • 09:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 58s)
  • 09:42 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=cxserver,cluster=scb,name=scb.*
  • 09:42 jijiki: Depool scb* in codfw from serving cxserver, finishing its migration to k8s - T213195
  • 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 after mysql upgrade (duration: 00m 56s)
  • 09:27 moritzm: rolling reboot of maps servers in codfw for kernel update
  • 09:17 marostegui: Upgrade and reboot db1086
  • 08:53 marostegui: Upgrade db1086
  • 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 for upgrade (duration: 00m 56s)
  • 08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1086 (duration: 00m 57s)
  • 08:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 56s)
  • 08:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1079 (duration: 00m 56s)
  • 08:01 vgutierrez: deploying directory based certificates in acme-chief clients - T207295
  • 07:35 _joe_: rolling restart of php-fpm to pick up some changes
  • 07:34 marostegui: Deploy schema change on db1079, this will generate lag on labsdb:s8
  • 07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1079 (duration: 00m 57s)
  • 07:03 elukey: restart pdfrender on scb1002
  • 06:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1101:3317 (duration: 00m 56s)
  • 06:24 marostegui: Run wmcs-wikireplica-dns on cloudcontrol1003 to get dbproxy1011 back
  • 06:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101:3317 (duration: 01m 10s)
  • 06:12 marostegui: Upgrade and reboot dbproxy1011
  • 06:04 marostegui: Run wmcs-wikireplica-dns on cloudcontrol1003 to drain dbproxy1011
  • 00:09 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/includes/parser/BlockLevelPass.php: SWAT T218817 Unbreak parser line counting for long wikitext pages I22eebb70a I55a2c4c0 I41a45266d (duration: 00m 56s)
  • 00:08 twentyafterfour: deploying phabricator upgrade
  • 00:01 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Move FundraisingTranslateWorkflow load to after Translate I73452ae8 (duration: 00m 56s)

2019-03-20

  • 23:49 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/resources/lib/ooui/oojs-ui-core.js: SWAT T218722 T218830 Bring forward UBN OOUI fix (duration: 00m 57s)
  • 23:28 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/497948/ (duration: 00m 56s)
  • 23:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/490648/ (duration: 00m 56s)
  • 22:29 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T214075 Enable federation of Wikidata items and properties on Test Commons (duration: 00m 57s)
  • 21:37 XioNoX: apply transit-in4 term offload-ping4 with test IP to cr1/2-codfw - T190090
  • 21:34 XioNoX: apply transit-in4 term offload-ping4 with test IP to cr2-codfw
  • 21:00 XioNoX: apply icmp redirect on cr1-codfw:xe-5/0/2 (to cr4-ulsfo) for test IP 208.80.154.225 - T190090
  • 20:24 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.22 (duration: 01m 46s)
  • 20:23 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.22
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 20:13 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 20:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:07 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:07 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:38 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.22
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:04 zfilipin@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.22 and rebuild l10n cache (duration: 38m 29s)
  • 18:50 jijiki: restarting pdfrender on scb1003
  • 18:49 ottomata: hitting eventgate-analytics in eqiad with ab
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:39 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:37 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:37 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:37 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:26 zfilipin@deploy1001: Started scap: testwiki to php-1.33.0-wmf.22 and rebuild l10n cache
  • 16:44 XioNoX: disable lldp on asw2-a-eqiad:ge-8/0/10
  • 16:25 chasemp: mkdir /srv/dumps/xmldatadumps/public/other/rook for T218587 (fyi apergos)
  • 15:55 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
  • 15:52 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:35 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098:3317 (duration: 00m 50s)
  • 15:33 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:24 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:24 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:23 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:23 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:22 bawolff@deploy1001: Synchronized wmf-config/wikitech.php: Adjust account stuff at wikitech 4adc89bce4 (duration: 00m 48s)
  • 15:20 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
  • 15:20 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:10 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 15:09 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 15:09 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
  • 15:08 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 14:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098:3317 (duration: 00m 56s)
  • 14:35 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 03s)
  • 14:02 moritzm: rebooting oresrdb2002 for kernel update
  • 13:48 godog: take a snapshot of prometheus data on prometheus1004
  • 13:44 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 05s)
  • 13:37 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 08s)
  • 13:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:29 otto@deploy1001: scap-helm eventgate-analytics finished
  • 13:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 11:51 akosiaris: re-enable puppet across fleet
  • 11:45 Amir1: EU SWAT is done
  • 11:44 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add wikimania as a special group to wikidata sitelinks (T217730) (duration: 00m 50s)
  • 11:40 ladsgroup@deploy1001: Synchronized dblists/wikidataclient.dblist: SWAT: Add wikimaniawiki to wikidataclient.dblist (T217730) (duration: 00m 50s)
  • 11:34 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Advanced Mobile Contributions mode for ar,id,es and test wikis (T217643) (duration: 00m 50s)
  • 11:34 akosiaris: disable puppet across fleet to avoid alert spam storm
  • 11:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Partially revert "Enable musical notation datatype in wikidata" (T218535) (duration: 00m 50s)
  • 11:16 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Increased maxSerializedEntitySize from 2500 to 3000 (T217739) (duration: 01m 47s)
  • 11:03 akosiaris: restart gerrit for testing https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497727/
  • 10:28 akosiaris: restart gerrit for merge of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497561/
  • 10:26 godog: reimage prometheus1003 with stretch - T205870
  • 10:20 marostegui: Repool dbproxy1010 and running wmcs-wikireplica-dns script
  • 10:12 marostegui: Reboot dbproxy1010 for upgrade
  • 09:45 vgutierrez: updated acme-chief to version 0.14 in acmechief[12]001
  • 09:32 marostegui: Deploy schema change on s7 codfw master, lag will appear on codfw
  • 09:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 (duration: 00m 48s)
  • 08:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1087 (duration: 00m 48s)
  • 08:55 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1003.eqiad.wmnet
  • 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 (duration: 00m 48s)
  • 08:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 48s)
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1092 (duration: 00m 48s)
  • 08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1092 (duration: 00m 48s)
  • 08:20 ema: cp2009, cp1071 (cp-ats): reboot for kernel upgrades
  • 07:32 elukey: pool kafka1001 in pybal's eventbus service after yesterday's network maintenance
  • 06:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool databases in row A - T187960 (duration: 00m 49s)
  • 00:48 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/includes/Title.php: SWAT: Improve Caching in Title::loadRestrictions() (duration: 00m 51s)

2019-03-19

  • 22:20 otto@deploy1001: Finished deploy [eventlogging/analytics@9aea626]: fix for production error where mw api is returning html instead of json schemas (duration: 00m 04s)
  • 22:20 otto@deploy1001: Started deploy [eventlogging/analytics@9aea626]: fix for production error where mw api is returning html instead of json schemas
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 21:50 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 21:36 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:36 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:36 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 21:23 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 21:07 cdanis: cdanis@wikitech-static.wikimedia.org: apt install sshguard
  • 21:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:06 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:58 XioNoX: disable down ports with no description on switches
  • 20:44 cdanis: enabling puppet on contint1001
  • 19:54 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 19:52 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 19:47 XioNoX: disable asw2-a<->asw-a link
  • 19:44 cdanis: icinga failed over to icinga1001 successfully
  • 19:43 XioNoX: remove forced failover on cr1/cr2-eqiad
  • 19:36 cdanis: failing over icinga to icinga1001
  • 19:35 XioNoX: enable cr2-eqiad:ae1
  • 19:29 ariel@deploy1001: Finished deploy [dumps/dumps@da66149]: move maxretries to config (duration: 00m 03s)
  • 19:29 ariel@deploy1001: Started deploy [dumps/dumps@da66149]: move maxretries to config
  • 19:09 ejegg: updated CiviCRM from a2316be94f to 3bfc7a762e
  • 19:09 gtirloni: rebooted labmon1001
  • 19:02 XioNoX: disable cr2-eqiad:ae1
  • 18:46 XioNoX: failover cr2-eqiad:ae1 VRRP master to cr1
  • 18:17 XioNoX: starting pybal on lvs1002
  • 18:11 XioNoX: stopping pybal on lvs1002
  • 18:09 XioNoX: starting pybal on lvs1001
  • 18:01 XioNoX: stopping pybal on lvs1001
  • 18:01 jijiki: restart pdfrender on scb1003
  • 17:56 XioNoX: shutdown scb1001 for uplink move
  • 17:47 Lucas_WMDE: Updated the Wikidata property suggester with data from last Monday's JSON dump and applied the T132839 workarounds (T216270)
  • 17:33 hasharAway: contint1001 / CI going for a quick scheduled maintenance -network cable being moved-
  • 17:33 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0 (duration: 01m 50s)
  • 17:31 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0
  • 17:30 mdholloway: mobileapps deploy failed for group default3, retrying
  • 17:24 tzatziki: changing email for User:St3f
  • 17:18 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0 (duration: 03m 47s)
  • 17:16 addshore: started "foreachwikiindblist wiktionary extensions/Cognate/maintenance/populateCognatePages.php --batch-size 1000" in a screen on mwdebug1002 (catching up cognate after x1 readonly time)
  • 17:14 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0
  • 16:45 vgutierrez: uploaded acme-chief 0.14 to apt.wikimedia.org (buster) - T218685 T218418 T207295
  • 16:30 elukey: stop eventlogging's mysql kafka consumers on eventlog1002, eventlogging's db replication on db1108 to ease db1107's maintenance
  • 16:29 elukey: stop eventlogging's mysql kafka consumers on eventlog1002, eventlogging's db replication on db1108 to ease db1107's maintenance
  • 16:15 bstorm_: downtimed labstore1003 for network moves so it doesn't page
  • 16:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:08 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org,service=pdns_recursor
  • 16:02 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org,service=pdns_recursor
  • 16:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3, take #2 (duration: 21m 01s)
  • 15:58 tzatziki: changing password for User:St3f
  • 15:57 XioNoX: enable pybal on lvs1006
  • 15:55 XioNoX: disable pybal on lvs1006
  • 15:54 XioNoX: enable pybal on lvs1005
  • 15:52 XioNoX: disable pybal on lvs1005
  • 15:50 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:50 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:50 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:49 XioNoX: enable pybal on lvs1004
  • 15:45 XioNoX: disable pybal on lvs1004
  • 15:40 mobrovac@deploy1001: Started deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3, take #2
  • 15:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3 (duration: 12m 27s)
  • 15:28 mobrovac@deploy1001: Started deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3
  • 15:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s2 read only OFF - T187960 (duration: 00m 26s)
  • 15:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s2 database master on read only - T187960 (duration: 00m 48s)
  • 15:12 XioNoX: eqiad A7 servers uplink move - T187960
  • 14:46 moritzm: rebooting icinga1001 for kernel update
  • 14:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool databases in row A - T187960 (duration: 00m 48s)
  • 14:41 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Reapply I49a18d from gerrit for consistency (duration: 00m 49s)
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:31 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 14:31 otto@deploy1001: scap-helm eventgate-analytics install -n production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 14:28 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:28 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:28 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 13:19 akosiaris: start zuul/zuul-merger
  • 13:12 akosiaris: unfirewall gerrit, put service back in action
  • 11:31 moritzm: installing php5 security updates
  • 09:08 akosiaris: start nagios-nrpe-server on proton1002, failed due to fork() failed with error 12, bailing out...
  • 07:25 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T218279)
  • 07:20 twentyafterfour@deploy1001: Synchronized wmf-config/CommonSettings.php: Temporarily disable account creation on wikitech (duration: 00m 51s)
  • 06:47 akosiaris: stop zuul and zuul-merger on contint1001
  • 03:45 kart_: Started manual run of unpublished ContentTranslation draft purge script (T218279)
  • 02:12 krinkle@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/EventLogging/includes/ApiJsonSchema.php: If280a4056a (duration: 00m 48s)
  • 02:11 krinkle@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/EventLogging/includes/RemoteSchema.php: If280a4056a (duration: 00m 51s)
  • 00:14 reedy@deploy1001: Synchronized php-1.33.0-wmf.21/tests/phpunit/includes/: Replace wgUser with RequestContext::getUser in User::getBlockedStatus (duration: 01m 00s)
  • 00:12 reedy@deploy1001: Synchronized php-1.33.0-wmf.21/includes/user/User.php: Replace wgUser with RequestContext::getUser in User::getBlockedStatus (duration: 00m 49s)

2019-03-18

  • 23:54 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494551/ (duration: 00m 49s)
  • 23:45 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494551/ (duration: 00m 48s)
  • 23:33 maxsem@deploy1001: Synchronized php-1.33.0-wmf.21/includes/EditPage.php: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/497347/ (duration: 00m 49s)
  • 23:25 twentyafterfour: running puppet on phab1001 to get out of degraded state
  • 23:23 XioNoX: renumber Telia transit in eqsin
  • 23:14 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/497317/ (duration: 00m 49s)
  • 23:07 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/496515/ (duration: 00m 48s)
  • 22:18 greg-g: gjg@phab1001:~$ sudo /srv/phab/phabricator/bin/auth strip --all-types --user Barras # per request/verification from foks
  • 19:57 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable block disables login on wikitech (duration: 00m 48s)
  • 19:56 bawolff@deploy1001: Synchronized wmf-config/wikitech.php: Adjust ldap config (duration: 00m 48s)
  • 16:17 volans: restarting pdfrender on scb1003
  • 16:15 volans: restarting pdfrender on scb1004
  • 15:48 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=cxserver,cluster=scb,name=scb.*
  • 15:45 jijiki: Depool scb* from serving cxserver on eqiad - T213195
  • 15:06 papaul: shutting down mw2206 for memtest
  • 14:47 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
  • 14:46 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
  • 14:13 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2003.codfw.wmnet
  • 13:42 ema: cp-ats rolling restart to apply proxy.config.cache.ram_cache.size config change T213263
  • 13:23 mvolz@deploy1001: scap-helm citoid finished
  • 13:22 mvolz@deploy1001: scap-helm citoid cluster codfw completed
  • 13:22 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-codfw-values.yaml stable/citoid [namespace: citoid, clusters: codfw]
  • 13:18 mvolz@deploy1001: scap-helm citoid finished
  • 13:18 mvolz@deploy1001: scap-helm citoid cluster eqiad completed
  • 13:17 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-eqiad-values.yaml stable/citoid [namespace: citoid, clusters: eqiad]
  • 13:04 arturo: T218022 disable icinga checks for labtestservices2001.wikimedia.org
  • 12:54 arturo: T218025 disable icinga checks for cloudnet2001-dev.codfw.wmnet
  • 12:49 mvolz@deploy1001: scap-helm citoid finished
  • 12:49 mvolz@deploy1001: scap-helm citoid cluster staging completed
  • 12:49 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 12:48 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-values-staging.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 11:45 zeljkof: EU SWAT finished
  • 11:45 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable mobile section editing on bnwiki, hewiki, zh_yuewiki (T218375)|gerrit:496696 Enable mobile section editing on bnwiki, hewiki, zh_yuewiki (T218375) (duration: 00m 50s)
  • 10:51 _joe_: testing safety checks for php-fpm on mwdebug2001
  • 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546)|gerrit:497261 Bumping portals to master (T128546) (duration: 00m 48s)
  • 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546)|gerrit:497261 Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:12 vgutierrez: uploaded acme-chief 0.12 to apt.wikimedia.org (buster) - T218543
  • 10:12 volans: restarted irc echo on icinga2001
  • 10:04 _joe_: hot-patching the error in php7.2-fpm config
  • 10:02 volans: running puppet on hosts matching 'C:php::fpm' to apply I004349
  • 10:00 volans: running puppet on failed hosts
  • 09:57 volans: temporarily stop ircecho to avoid spam
  • 09:40 ema: superior-cache-analyzer_3.3.7 uploaded to stretch-wikimedia T213263
  • 09:29 godog: switch to mpm_event for prometheus apache before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/496750
  • 08:58 vgutierrez: uploaded acme-chief 0.11 to apt.wikimedia.org (buster) - T207295
  • 08:52 moritzm: restarting ferm on sessionstore, was stuck in resolving one of the -a records, which were only merged in a subsequent step (T215883)
  • 08:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 (duration: 00m 48s)
  • 08:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 00m 48s)
  • 08:34 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
  • 08:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 08:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 08:31 ema: cp2002: repool varnish-fe to resume ATS testing T213263
  • 08:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1101 (duration: 00m 48s)
  • 08:22 moritzm: armed keyholder on neodymium
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
  • 07:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1101 (duration: 00m 49s)
  • 07:02 marostegui: Stop db1101 to upgrade mysql and kernel
  • 07:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101 (duration: 00m 48s)
  • 06:33 marostegui: Deploy schema change on s8 codfw master (db2045), this will generate lag on s8 codfw
  • 06:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 48s)
  • 06:08 marostegui: Deploy schema change on x1 master (db1069) with replication - T218397
  • 06:04 marostegui: Deploy schema change on db1121 - lag will appear on labsdb:s4
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 01m 04s)
  • 03:58 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T218279)
  • 02:00 kart_: Started manual run of unpublished ContentTranslation draft purge script (T218279)

2019-03-17

  • 11:51 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=labswiki --force --sysop Ladsgroup
  • 08:49 elukey: restart pdfrender on scb1004

2019-03-16

  • 10:00 chasemp: stop apache on cobalt for maintenance
  • 00:19 andrewbogott: restarting slapd on seaborgium

2019-03-15

  • 22:37 shdubsh: temporarily stop ircecho on icinga2001
  • 18:00 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend: SWAT: iOS: Fix mobile editor|gerrit:496827 iOS: Fix mobile editor T218069 T218062 T218352 T211490 T218062 T211491 T172877 (duration: 00m 54s)
  • 17:53 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 17:53 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 17:53 ema: depool cp2002's varnish-fe for the weekend T213263#5027366
  • 17:25 arturo: acmechief2001 - armed keyholder
  • 17:22 arturo: cumin2001 - armed keyholder
  • 17:21 andrewbogott: updating puppet compiler facts
  • 17:13 mutante: netmon2001 - armed keyholder for rancid
  • 17:12 mutante: netmon1002 - armed keyholder for rancid
  • 17:04 arturo: arm keyholder in deploy2001
  • 17:03 arturo: arm keyholder in sarin
  • 17:02 arturo: arm keyholder in labpuppetmaster1002
  • 17:01 arturo: arm keyholder in deploy1001
  • 17:00 XioNoX: clean up rigel switch port
  • 17:00 arturo: arm keyholder in acmechief1001
  • 16:58 arturo: arming keyholder in cumin1001
  • 16:09 moritzm: upgrading deployment-deploy01 to component/php72
  • 15:59 akosiaris: puppetmaster1001 rm /var/run/confd-template/.citoid*.err to remove old stale confd files that resulted from merging https://gerrit.wikimedia.org/r/494213
  • 15:54 moritzm: rebooting labtestservices2003 for kernel update
  • 15:47 andrewbogott: enabling puppet on seaborgium to apply new acme cert
  • 15:47 moritzm: rebooting labtestservices2002 for kernel update
  • 15:42 moritzm: rebooting labtestcontrol2003 for kernel update
  • 15:38 moritzm: rebooting labtestnet2002 for kernel update
  • 15:11 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,cluster=cache_upload,name=cp2015.codfw.wmnet
  • 15:10 ema: cp2015: repool ATS with proxy.config.cache.ram_cache.size 1G T213263
  • 15:07 moritzm: rebooting graphite2003 for kernel security update
  • 15:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,cluster=cache_upload,name=cp2015.codfw.wmnet
  • 15:04 ema: cp2015: test ATS depool T213263
  • 14:45 mutante: tools tools-sgebastion-07 - dpkg-reconfigure locales and adding ko_KR.EUC-KR for Korean users by request and as done in the past on former tools bastion
  • 14:43 moritzm: rebooting etherpad1001 to pick up SSBD-enabled qemu
  • 14:31 mutante: tools-sgebastion-07 - generating locales for user request in T130532
  • 13:50 moritzm: rolling reboot of ores in codfw for SSBD/L1TF kernel update
  • 13:47 akosiaris@deploy1001: scap-helm cxserver finished
  • 13:47 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 13:47 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 11:16 godog: reenable prometheus@k8s on prometheus2004 with mod_proxy connection limits - T217715
  • 10:31 akosiaris: add a 10s bucket to cxserver prometheus-statsd exporter mappings
  • 10:31 akosiaris@deploy1001: scap-helm cxserver finished
  • 10:31 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 10:31 akosiaris@deploy1001: scap-helm cxserver finished
  • 10:31 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 10:31 akosiaris@deploy1001: scap-helm cxserver finished
  • 10:31 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 10:30 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/citoid [namespace: cxserver, clusters: staging]
  • 10:03 akosiaris@deploy1001: scap-helm citoid finished
  • 10:03 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
  • 10:03 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
  • 10:03 akosiaris@deploy1001: scap-helm citoid finished
  • 10:02 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
  • 10:02 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
  • 10:02 akosiaris: add a 10s bucket to citoid prometheus-statsd exporter mappings
  • 10:02 akosiaris: remove prometheus-statsd-exporter from zotero pods
  • 10:02 akosiaris@deploy1001: scap-helm citoid finished
  • 10:02 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 10:02 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 10:01 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-values-staging.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 10:00 akosiaris@deploy1001: scap-helm zotero finished
  • 10:00 akosiaris@deploy1001: scap-helm zotero cluster staging completed
  • 10:00 akosiaris@deploy1001: scap-helm zotero upgrade --install -f zotero-values-staging.yaml staging stable/zotero [namespace: zotero, clusters: staging]
  • 09:58 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml staging stable/zotero [namespace: zotero, clusters: staging]
  • 09:53 akosiaris@deploy1001: scap-helm zotero finished
  • 09:53 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 09:53 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
  • 09:53 akosiaris@deploy1001: scap-helm zotero finished
  • 09:53 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 09:52 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
  • 09:42 godog: bounce grafana-server on grafana1001
  • 09:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103 (duration: 00m 50s)
  • 09:28 godog: correction, prometheus2004
  • 09:27 godog: temporarily disable read queries to prometheus@k8s on prometheus2003
  • 09:19 jiji@cumin1001: conftool action : set/weight=12; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
  • 09:18 jiji@cumin1001: conftool action : set/weight=15; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
  • 09:17 jijiki: Ramp up cxserver k8s traffic to 50% - T213195
  • 08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 (duration: 00m 50s)
  • 08:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 (duration: 00m 47s)
  • 08:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 49s)
  • 07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 (duration: 00m 49s)
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
  • 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
  • 07:01 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
  • 06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 (duration: 00m 48s)
  • 06:04 marostegui: Upgrade db1091
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 50s)
  • 04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 01:25 ejegg: re-enabled ingenico audit parser
  • 01:25 ejegg: updated fundraising CiviCRM from 41efa14fb0 to a2316be94f

2019-03-14

  • 22:54 ejegg: temporarily disabled Ingenico WX audit parsing
  • 22:05 cdanis: cdanis@icinga2001.wikimedia.org ~ % sudo systemctl restart icinga.service
  • 21:58 cdanis: cdanis@icinga2001.wikimedia.org ~ % sudo systemctl restart nsca.service
  • 21:01 crusnov@deploy1001: Finished deploy [netbox/deploy@090a0c3]: Another minor bugfix release for ganeti-netbox script (duration: 00m 56s)
  • 21:00 crusnov@deploy1001: Started deploy [netbox/deploy@090a0c3]: Another minor bugfix release for ganeti-netbox script
  • 20:26 thcipriani: gerrit live on 2.15.11
  • 20:24 thcipriani: restarting gerrit for 2.15.11
  • 20:23 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt (duration: 00m 02s)
  • 20:23 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt
  • 20:22 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 04s)
  • 20:22 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only
  • 20:17 ejegg: updated CiviCRM from b4e3cf16cc to 41efa14fb0
  • 20:17 thcipriani: gerrit back to 2.15.8
  • 20:15 thcipriani: restart gerrit on cobalt
  • 20:14 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on cobalt (duration: 00m 07s)
  • 20:14 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on cobalt
  • 20:14 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 10s)
  • 20:13 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on gerrit2001 only
  • 20:13 bstorm_: Placed labstore1006 back in rotation for NFS and rsync
  • 20:11 crusnov@deploy1001: Finished deploy [netbox/deploy@c6cf7d6]: Minor bugfix release for ganeti-netbox script (duration: 00m 54s)
  • 20:10 crusnov@deploy1001: Started deploy [netbox/deploy@c6cf7d6]: Minor bugfix release for ganeti-netbox script
  • 20:03 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/extension.json: Hot-deploy I19414dc31 to fix dependencies on mw.Uri (duration: 00m 49s)
  • 19:37 XioNoX: set protocols bgp group Anycast4 multihop ttl 193 on cr1/2-esams - T209989
  • 19:25 XioNoX: merged Juniper BFD Icinga check
  • 19:12 thcipriani: gerrit back up
  • 19:08 thcipriani: restarting gerrit on cobalt for 2.15.11 upgrade
  • 19:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt (duration: 00m 11s)
  • 19:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt
  • 19:05 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 11s)
  • 19:05 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only
  • 19:02 XioNoX: set protocols bgp group Anycast4 multihop ttl 193 on cr1/2-eqiad - T209989
  • 18:53 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/ParsoidBatchAPI/includes/ApiParsoidBatch.php: SWAT Another deprecation fix via I4936d0ce03 (duration: 00m 49s)
  • 18:37 XioNoX: set protocols bgp group Anycast4 multihop ttl 190 on cr1-codfw - T209989
  • 18:31 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T216730 Enable musical notation datatype on Wikidata (duration: 00m 48s)
  • 18:29 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/modules/help/: SWAT Ib13cf88d GrowthExperiments log fix for closes (duration: 00m 49s)
  • 18:22 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT T217436 Add default user config for rollback confirmation (duration: 00m 48s)
  • 18:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T217436 Set up exceptions for rollback confirmation (duration: 00m 49s)
  • 18:08 tzatziki: change email for KStineRowe (WMF) on officewiki, collabwiki, SUL
  • 18:05 mforns@deploy1001: Finished deploy [analytics/aqs/deploy@13203f1]: Deploying AQS for node10 upgrade (duration: 19m 40s)
  • 17:59 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/ParsoidBatchAPI/includes/ApiParsoidBatch.php: Hot-deploy I2842dfea to reduce deprecation spam after T206675 deploy of wmf.21 (duration: 00m 49s)
  • 17:45 mforns@deploy1001: Started deploy [analytics/aqs/deploy@13203f1]: Deploying AQS for node10 upgrade
  • 17:43 mforns: Deploying AQS using scap (node10 upgrade)
  • 17:32 arlolra: Updated Parsoid to f3e2209 (T213950)
  • 17:24 arlolra@deploy1001: Finished deploy [parsoid/deploy@8cf4107]: Updating Parsoid to f3e2209 (duration: 07m 09s)
  • 17:17 arlolra@deploy1001: Started deploy [parsoid/deploy@8cf4107]: Updating Parsoid to f3e2209
  • 17:15 jijiki: Pool mw1280 back - T218006
  • 17:12 jijiki: Depool mw2206 - T215415
  • 16:51 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:51 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:51 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:50 crusnov@deploy1001: Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229 (duration: 00m 50s)
  • 16:49 crusnov@deploy1001: Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229
  • 16:46 crusnov@deploy1001: Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229 (duration: 00m 30s)
  • 16:45 crusnov@deploy1001: Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229
  • 16:32 XioNoX: add default deny to mr1-* junos-host policies - T218234
  • 16:30 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/lib/includes/Store/Sql/TermSqlIndex.php: gerrit:496481 TermSqlIndex, track calls to getTermsOfEntities (duration: 00m 50s)
  • 16:22 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:22 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:22 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:08 arturo: reimaging cloudvirt1015 again
  • 16:04 akosiaris: reboot one final time all sessionstore[12]00[123] servers
  • 16:02 arturo: T216497 drop python-dogpile.cache from jessie-wikimedia/openstack-mitaka-jessie
  • 14:57 marostegui: Start replication on db2070 after testing url_notes
  • 14:53 mutante: analytics-tool1003 - stopping idle screen session
  • 14:43 marostegui: Stop replication on db2070 to test the url_notes (will alert only on IRC)
  • 14:21 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:21 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:21 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --set main_app.version=v1.0.3-wmf0 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 14:09 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:09 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:09 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 13:54 godog: take a snapshot of data on prometheus2004
  • 13:50 arturo: reimaging cloudvirt1015
  • 13:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1081 into API (duration: 00m 48s)
  • 13:15 arturo: T216497 drop libpulse0 from jessie-wikimedia/openstack-mitaka-jessie
  • 13:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 into API (duration: 00m 49s)
  • 13:10 arturo: T216497 drop python-mysqldb from jessie-wikimedia/openstack-mitaka-jessie
  • 13:10 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.21
  • 12:50 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:49 jiji@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:42 jijiki: Ramp up k8s cxserver traffic to 8% - T213195
  • 12:22 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:21 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
  • 12:17 jijiki: Send ~4% of cxserver traffic to eqiad k8s - T213195
  • 12:14 zeljkof: EU SWAT finished
  • 12:13 kartik@deploy1001: Synchronized wmf-config: SWAT: gerrit:496418 Revert "Correct the enable context detection configuration" (duration: 00m 56s)
  • 12:12 arturo: T216497 drop some packages from jessie-wikimedia/openstack-mitaka-jessie: qemu-XXX
  • 12:06 arturo: T216497 drop some packages from jessie-wikimedia/openstack-mitaka-jessie: libvirt*, librados2, librbd1, because they induce the resolver to conflict with those included in stretch
  • 12:02 kartik@deploy1001: Synchronized wmf-config: SWAT: Revert gerrit:496412 Fix content detection config (duration: 00m 56s)
  • 11:58 kartik@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 11:45 kartik@deploy1001: Synchronized php-1.33.0-wmf.21/skins/MinervaNeue: SWAT: gerrit:496364 Ensure page-actions icons are `display:block` (T218182) (duration: 00m 57s)
  • 11:15 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:493672 Enable ExternalGuidance to all Wikipedias (T216129) (duration: 00m 57s)
  • 10:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 00m 57s)
  • 10:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 10:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 10:50 ema: cp2002: pool varnish-fe to resume ATS testing T213263
  • 10:44 moritzm: installing libsdl1.2 security updates for jessie
  • 10:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 58s)
  • 09:54 hashar: ci: live hacked job https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/ in attempt to capture 'core' files from hhvm | https://gerrit.wikimedia.org/r/#/c/integration/config/+/496392/ | T216689
  • 09:02 mutante: ms-be2037 - down for a couple of hours, no SAL or ticket, powercycling
  • 08:44 marostegui: Deploy schema change on s4 codfw master (db2051), this will generate lag on codfw
  • 08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1088 (duration: 00m 53s)
  • 08:21 marostegui: Upgrade s3 codfw master (db2043) there will be lag on s3 codfw
  • 08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1088 (duration: 00m 55s)
  • 07:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1088 (duration: 00m 55s)
  • 07:48 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:48 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 07:48 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 07:42 marostegui: Upgrade db1088
  • 07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1088 (duration: 00m 54s)
  • 07:22 kartik@deploy1001: Finished deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386) (duration: 03m 50s)
  • 07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1098 (duration: 00m 55s)
  • 07:18 kartik@deploy1001: Started deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386)
  • 07:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:16 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 07:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 07:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:16 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 07:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 07:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 07:15 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 07:15 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098 (duration: 00m 55s)
  • 06:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098 (duration: 00m 54s)
  • 06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 55s)
  • 06:50 marostegui@deploy1001: sync-file aborted: More traffic to db1097 (duration: 00m 00s)
  • 06:46 akosiaris@deploy1001: scap-helm cxserver finished
  • 06:46 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 06:46 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 06:40 marostegui: Upgrade mysql on dbstore2002
  • 06:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098:3317 (duration: 00m 55s)
  • 06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1098:3317 (duration: 00m 55s)
  • 06:08 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:04 marostegui: Upgrade MySQL on db1098
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098 (duration: 00m 56s)
  • 04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 01:39 ejegg: updated fundraising CiviCRM from 5c45e4c24d to b4e3cf16cc

2019-03-13

  • 23:48 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/skins/MinervaNeue/: Remove unnecessary parameter from getHistoryPageAction (duration: 00m 56s)
  • 23:45 catrope@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: Fix builder class definition for WBCS (duration: 00m 56s)
  • 23:41 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend/: Fix animation when visual section editing enabled on mobile only (T218167) (duration: 00m 58s)
  • 23:39 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/WikibaseCirrusSearch/: Fix hook return values (duration: 00m 58s)
  • 23:30 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/: Instrumentation fixes (T217802) (duration: 00m 57s)
  • 22:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disabling api-request logging to eventgate-analytics for group0 wikis until we solve T218268 (duration: 00m 56s)
  • 21:11 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:11 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 21:11 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 21:10 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:10 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 21:09 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics finished
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 20:58 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
  • 20:35 arlolra@deploy1001: Finished deploy [parsoid/deploy@e2e44bc]: Updating Parsoid to ea80d1b (duration: 06m 38s)
  • 20:28 arlolra@deploy1001: Started deploy [parsoid/deploy@e2e44bc]: Updating Parsoid to ea80d1b
  • 20:25 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262) (duration: 03m 35s)
  • 20:24 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disabling api-request logging to eventgate-analytics for group1 wikis to investigate possible outage (duration: 00m 56s)
  • 20:21 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262)
  • 20:14 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262) (duration: 01m 49s)
  • 20:03 herron: increased index.mapping.total_fields.limit to 1350 on index logstash-2019.03.13
  • 19:46 jijiki: Pooling mw2206 - T215415
  • 19:26 herron: performing rolling restart of eqiad logstash instances
  • 18:51 jijiki: Depool mw1280 and mw2206 due to hardware issues - T215415 T218006
  • 18:44 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging to eventgate-analytics for group1 wikis (duration: 00m 58s)
  • 18:30 robh: thumbor1004 memtest in progress via T215411
  • 18:29 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 18:29 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
  • 18:28 ema: cp2002: depool varnish-fe after 1 hour ATS experiment T213263
  • 18:09 bstorm_: rebooting labstore1006 T217473
  • 18:07 bstorm_: downtime labstore1006 for troubleshooting T217473
  • 17:57 XioNoX: set interface description on fasw-c-codfw:ge-0/0/47
  • 17:43 XioNoX: s/29073/202425/ on AMS-IX
  • 17:34 XioNoX: add missing sandbox1-b-eqiad interface to ospf(3) passive on cr1/2-eqiad
  • 17:19 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
  • 17:19 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
  • 17:18 ema: cp2002: pool varnish-fe for user traffic, routed through ATS backends T213263
  • 17:05 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:05 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 17:05 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 17:01 otto@deploy1001: scap-helm eventgate-analytics finished
  • 17:01 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 17:01 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 16:59 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 16:56 robh: mw2206.codfw.wmnet is being powered down for firmware update, relying on auto depool function from clean shutdown for mw api server via T215415
  • 16:42 robh: mw2206.codfw.wmnet is being powered down for firmware update, relying on auto depool function from clean shutdown for mw api server via T215415
  • 16:36 addshore: SWAT done
  • 16:36 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/includes/api/ApiMain.php: SWAT: T214080 T212529 ApiMain.php api/request logging event changes gerrit:496197 (duration: 00m 57s)
  • 16:32 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:32 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:32 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:19 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:19 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:19 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:16 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:16 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:16 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
  • 16:15 jijiki: Depool thumbor1004 to investigate memory issues - T215411
  • 16:04 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:04 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 16:04 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:04 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 16:04 akosiaris@deploy1001: scap-helm cxserver finished
  • 16:04 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:52 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:52 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml eqiad stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 15:52 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:52 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 15:40 akosiaris: do the first deploy of cxserver in eqiad/codfw T213195
  • 15:39 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:39 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 15:39 akosiaris@deploy1001: scap-helm cxserver install -n production -f cxserver-eqiad-values.yaml stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 15:39 akosiaris@deploy1001: scap-helm cxserver finished
  • 15:39 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 15:39 akosiaris@deploy1001: scap-helm cxserver install -n production -f cxserver-codfw-values.yaml stable/cxserver [namespace: cxserver, clusters: codfw]
  • 14:27 ema: cp2002: depool varnish-fe in preparation of pointing it to ATS T213263
  • 14:13 marostegui: Upgrade db2074 (sanitarium master)
  • 13:42 akosiaris: upgrade kubestage to kubernetes 1.11.8
  • 13:42 akosiaris: upgrade neon to kubernetes 1.11.8
  • 13:28 akosiaris: upgrade kubestage1002 to kubernetes 1.11.8
  • 13:24 godog: take a snapshot of prometheus@k8s data on prometheus2004
  • 13:13 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.21 (duration: 01m 43s)
  • 13:12 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.21
  • 11:34 marostegui: Test snapshot db1117:3325 to dbstore1001 - T210292
  • 10:55 marostegui: Upgrade db2057
  • 10:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1085 (duration: 00m 56s)
  • 09:52 mutante: ms-be1035 - sudo systemctl reset-failed
  • 09:45 ema: cp1071: upgrade trafficserver to 8.0.3~rc0 for testing purposes
  • 09:41 marostegui: Deploy schema change on db1085 with replication, there will be lag on labsdb:s6
  • 09:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1085 (duration: 00m 55s)
  • 09:06 moritzm: installing PHP 7.0 security updates
  • 08:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 (duration: 00m 55s)
  • 08:58 marostegui: Upgrade mysql and kernel on db2050
  • 08:51 ema: cp3030: wipe frontend cache to get rid of large objects T216006
  • 08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3316 (duration: 00m 55s)
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1093 (duration: 00m 55s)
  • 08:09 moritzm: upgrading job runners in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 (duration: 00m 54s)
  • 07:26 moritzm: upgrading remaining app servers in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1096 (duration: 00m 58s)
  • 07:13 marostegui: Test snapshot dbstore1001:3311 to dbstore1001 - T210292
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 55s)
  • 06:58 marostegui: Upgrade MySQL and kernel on db2036
  • 06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1096 (duration: 00m 55s)
  • 06:40 marostegui: Stop MySQL on db1096 for upgrade
  • 06:24 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:21 marostegui: Testing snapshotting on db1117:3321 to > dbstore1001 - T210292
  • 06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096 (duration: 01m 07s)
  • 04:11 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)

2019-03-12

  • 23:33 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend/includes/specials/SpecialMobileOptions.php: SWAT: Fix: undefined locals in SpecialMobileOptions.setJsConfigVars()|gerrit:495907Fix: undefined locals in SpecialMobileOptions.setJsConfigVars() T218098 (duration: 00m 57s)
  • 20:49 shdubsh: manually upgrade prometheus-icinga-exporter to 0.5 on standby icinga
  • 19:48 eileen: civicrm revision changed from 977b9bfcf1 to 5c45e4c24d, config revision is f930677e97
  • 19:31 herron: restarted citoid on scb1003
  • 19:16 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging to eventgate-analytics for group0 wikis (duration: 01m 01s)
  • 19:14 arturo: T216497 manually delete libpam-systemd and libsystemd0 230-7~bpo8+2 from jessie-wikimedia/openstack-mitaka-jessie
  • 19:09 arturo: T216497 manually delete systemd 230-7~bpo8+2 from jessie-wikimedia/openstack-mitaka-jessie
  • 19:07 robh: rebooting thumbor1004 for memory troubleshooting via T215411
  • 17:11 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Increase APC cache for PropertyInfoLookup from 15 to 20s (duration: 00m 55s)
  • 17:10 addshore@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Increase APC cache for PropertyInfoLookup from 15 to 20s (duration: 00m 57s)
  • 17:02 jbond42: rolling update of debdeploy
  • 16:57 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 53s)
  • 16:43 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Double on server cache for PropertyInfoStore (duration: 00m 55s)
  • 16:42 addshore@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Double on server cache for PropertyInfoStore (duration: 00m 57s)
  • 16:29 moritzm: upgraded buster installation image to daily build from 12th of March (T213527)
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 15:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:42 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:41 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:39 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:38 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org,service=pdns_recursor
  • 15:37 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:33 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 15:33 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
  • 15:28 otto@deploy1001: scap-helm eventgate-analytics finished
  • 15:28 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:28 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:26 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_recursor
  • 15:23 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_^Ccursor
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics finished
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:02 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 15:00 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:26 mutante: phab1002 - reboot
  • 13:43 marostegui: Upgrade MySQL and kernel on db2094 (inactive sanitarium)
  • 13:27 marostegui: Deploy schema change on s6 codfw, lag will be generated on s6 codfw
  • 13:24 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.21
  • 12:41 arturo: T215605 include python-mwclient .deb in openstack-mitaka-jessie/jessie-wikimedia in install1002
  • 12:23 jynus: testing snapshotting on db1117:3325 -> dbstore1001 T210292
  • 12:23 zfilipin@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.21 and rebuild l10n cache (duration: 34m 25s)
  • 12:09 moritzm: upgrading mw1238-mw1258 to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 11:59 mutante: analytics-tool1004 - start superset service
  • 11:48 zfilipin@deploy1001: Started scap: testwiki to php-1.33.0-wmf.21 and rebuild l10n cache
  • 11:47 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 [keeping static files] (duration: 01m 40s)
  • 11:45 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 [keeping static files] (duration: 01m 35s)
  • 11:42 arturo: T215605 include python-oath .deb in stretch-wikimedia thirdparty/oath
  • 11:41 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.16 (duration: 12m 41s)
  • 11:39 elukey: raise mysql's max_user_connection to 1000 for the Analytics user on labsdb1012
  • 11:36 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
  • 11:36 ema: cp1077: repool varnish-be after service restart T217893
  • 11:35 arturo: delete wrong stretch-wikimedia `thirdparty` component in install1002
  • 11:12 zeljkof: EU SWAT finished
  • 11:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:495842 Add campaign prefix for EG tag (T216123) (duration: 00m 49s)
  • 11:11 moritzm: upgrading API servers/job runners servers in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
  • 10:32 marostegui: Deploy schema change on db1082, lag will happen on s5 on labs
  • 10:29 gtirloni: re-enabled puppet on serpens and seaborgium
  • 10:19 gtirloni: updated slapd to version 2.4.47 on seaborgium (T217280)
  • 10:17 moritzm: upgrading API servers/job runners servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
  • 10:14 gtirloni: upgrading seaborgium to slapd 2.4.47
  • 09:39 jynus: stop db1114 and restart it empty
  • 09:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 (duration: 00m 48s)
  • 08:57 elukey: restart memcached on mc1019 to apply new settings - T217731
  • 08:50 ema: cp1077 depooled again T217893
  • 08:49 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
  • 08:48 moritzm: upgrading app servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
  • 08:48 ema: restart varnish-be on cp1077 T217893
  • 08:47 moritzm: upgrading app servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
  • 08:46 ema: cp1077 repooled T217893
  • 08:46 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
  • 08:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 for schema change (duration: 00m 48s)
  • 08:34 jynus: deploy core replica events to db1118
  • 08:15 ema: cp1099: ferm.service failed to resolve prometheus1003.eqiad.wmnet. ferm restarted T202966
  • 07:18 marostegui: Deploy schema change on db2052 (s5 codfw master), this will generate lag on codfw T71127 T51199
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113 after schema change and upgrade (duration: 00m 49s)
  • 07:09 marostegui: Upgrade mysql and kernel on db1113
  • 06:40 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113 for schema change and upgrade (duration: 00m 50s)
  • 04:04 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 02:40 ejegg: updated payments-wiki from f1a89d7045 to 7a312e371a

2019-03-11

  • 17:55 addshore@deploy1001: Synchronized wmf-config/interwiki-labs.php: BETA ONLY https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/495723/ (duration: 00m 48s)
  • 17:43 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/495721/ (duration: 00m 49s)
  • 17:23 arturo: T215605 copy python-oath from jessie-wikimedia/thirdparty to stretch-wikimedia/thirdparty in reprepro
  • 17:03 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 17:02 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 16:31 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 16:31 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 15:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1097 (duration: 00m 48s)
  • 15:16 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix syntax for MediaInfo depicts config (beta only) (duration: 00m 49s)
  • 14:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 49s)
  • 14:43 moritzm: upgrading mw canaries to PHP 7.2.16
  • 14:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 48s)
  • 14:25 hashar: contint1001: stopping zuul-merger (it is cpu or IO starving the server)
  • 14:21 moritzm: upgrading mwdebug servers to PHP 7.2.16
  • 14:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1097 (duration: 00m 47s)
  • 14:09 moritzm: importing build of PHP 7.2.16 for component/php72 (T216712)
  • 13:58 marostegui: Upgrade mysql on db1097
  • 13:28 arturo: disable active checks in icinga for labtestvirt200[12] (T218023)
  • 13:04 moritzm: upgrading mwdebug2002 to php 7.2.16
  • 12:23 gtirloni: updated slapd to version 2.4.47 on serpens (T217280)
  • 12:05 gtirloni: updating slapd on serpens/codfw to test possible fix for memory leaks
  • 10:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097 for upgrade and schema change (duration: 00m 48s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546)|gerrit:495650 Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546)|gerrit:495650 Bumping portals to master (T128546) (duration: 00m 49s)
  • 09:56 moritzm: installing chromium security updates on remaining proton hosts
  • 09:44 moritzm: installing chromium security updates on proton1001
  • 09:44 elukey: roll restart of aqs on aqs100* to pick up new druid settings
  • 08:02 marostegui: Upgrade pc1010 (spare)
  • 07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 after upgrade (duration: 00m 48s)
  • 07:32 marostegui: Upgrade MySQL and kernel on pc2010 (spare)
  • 07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1099 after upgrade (duration: 00m 48s)
  • 06:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1099 after upgrade (duration: 00m 48s)
  • 06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1099 after upgrade (duration: 00m 52s)
  • 06:38 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
  • 06:37 marostegui: Power cycle mw1280 - server down
  • 06:35 marostegui: Upgrade mysql and kernel on db1099
  • 06:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 for upgrade (duration: 03m 01s)
  • 06:03 effie: Restarting pdfrender on scb1003
  • 06:02 marostegui: Upgrade MySQL on dbstore1004 (s2, s3, s4)
  • 04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
  • 03:30 kartik@deploy1001: Finished deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878) (duration: 04m 01s)
  • 03:26 kartik@deploy1001: Started deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878)

2019-03-10

  • 22:35 gtirloni: toolforge stretch: increased nscd group TTL from 60 to 300sec (T217280)
  • 07:14 _joe_: restarting pdfrender on scb1004

2019-03-08

  • 19:25 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 50s)
  • 19:21 moritzm: installing php updates on netmon1002
  • 18:20 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 49s)
  • 17:30 robh: decom in progress for rdb100[123478] via T209181
  • 16:48 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@acf2694] (stretch): UBN geoshapes services on maps1004.eqiad.wmnet (T217898) (duration: 00m 22s)
  • 16:47 mbsantos@deploy1001: Started deploy [kartotherian/deploy@acf2694] (stretch): UBN geoshapes services on maps1004.eqiad.wmnet (T217898)
  • 16:23 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@cc302de] (stretch): UBN geoshapes services on maps2004.codfw.wmnet (T217898) (duration: 00m 24s)
  • 16:22 mbsantos@deploy1001: Started deploy [kartotherian/deploy@cc302de] (stretch): UBN geoshapes services on maps2004.codfw.wmnet (T217898)
  • 16:19 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@d71df87] (stretch): UBN geoshapes services (T217898) (duration: 02m 00s)
  • 16:17 mbsantos@deploy1001: Started deploy [kartotherian/deploy@d71df87] (stretch): UBN geoshapes services (T217898)
  • 15:45 papaul: OS install on restbase2019 and restbase2020
  • 15:30 gilles@deploy1001: Finished deploy [performance/coal@8766469]: (no justification provided) (duration: 00m 06s)
  • 15:30 gilles@deploy1001: Started deploy [performance/coal@8766469]: (no justification provided)
  • 14:34 arturo: T215605 add prometheus-rabbitmq-exporter v0.4 to stretch-wikimedia
  • 14:16 gilles@deploy1001: Finished deploy [performance/navtiming@f2d8a5f]: (no justification provided) (duration: 00m 05s)
  • 14:15 gilles@deploy1001: Started deploy [performance/navtiming@f2d8a5f]: (no justification provided)
  • 13:09 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
  • 12:47 akosiaris: depooling cp1077 just in case, high mailbox lag https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cache_type=text&var-server=All&var-layer=backend&panelId=13&fullscreen
  • 12:47 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.*
  • 12:07 jbond42: rolling security updates of sqlite3 on jessie and trusty
  • 11:07 moritzm: uploaded tideways 4.0.7-1+wmf1 for component/php72 (T216712)
  • 10:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080, db1110 (duration: 00m 49s)
  • 10:14 marostegui: Reload haproxy on dbproxy1011 to repool labsdb1009
  • 09:51 mutante: temp disabling puppet on icinga to debug an issue with elastic checks
  • 09:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080, db1110 (duration: 00m 49s)
  • 09:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3311,db1096:3315 (duration: 00m 49s)
  • 08:37 marostegui: Reload haproxy on dbproxy1011 to depool labsdb1009
  • 08:31 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
  • 08:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3311,db1096:3315 (duration: 00m 48s)
  • 08:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1076 (duration: 00m 48s)
  • 07:59 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 40s)
  • 07:58 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:57 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 02s)
  • 07:57 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:52 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 01m 18s)
  • 07:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 (duration: 00m 48s)
  • 07:51 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 after mysql upgrade (duration: 00m 49s)
  • 07:35 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 30s)
  • 07:34 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
  • 07:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 after mysql upgrade (duration: 00m 49s)
  • 07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1076 into API after mysql upgrade (duration: 00m 48s)
  • 07:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 after mysql upgrade (duration: 00m 48s)
  • 06:53 marostegui: Stop MySQL on db1076 for upgrade
  • 06:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 for mysql upgrade (duration: 00m 49s)
  • 06:22 marostegui: Deploy schema change on s3 db1077 with replication (lag will happen on s3 labs)
  • 06:21 marostegui: Stop replication on s3 on labsdb1009 and labsdb1011
  • 06:20 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
  • 06:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 (duration: 00m 51s)
  • 00:23 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.20/skins/MinervaNeue/resources/skins.minerva.scripts/toc.js: SWAT: Passing page parameter to TOC toggler|gerrit:495021 Passing page parameter to TOC toggler T217820 (duration: 00m 50s)
  • 00:16 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Cleanup beta cluster config|gerrit:495024 Cleanup beta cluster config T213599; Enable advanced mobile contributions mode on beta cluster|gerrit:495023 Enable advanced mobile contributions mode on beta cluster beta-only (noop) sync (duration: 00m 49s)
  • 00:01 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org,service=pdns_recursor

2019-03-07

  • 23:53 XioNoX: set net.ipv4.ip_local_port_range="32768 60999" on dns2001 and repool server - T209989
  • 23:46 XioNoX: set net.ipv4.ip_local_port_range="49152 65535" on dns2001 - T209989
  • 23:43 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_recursor
  • 23:40 XioNoX: depool dns2001 - T209989
  • 20:44 XioNoX: explicitly disable sampling on non eqiad routers
  • 20:42 thcipriani: restarting gerrit on cobalt for 2.15.11 rollback
  • 20:42 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on cobalt (production) (duration: 00m 07s)
  • 20:41 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on cobalt (production)
  • 20:40 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on gerrit2001 only (duration: 00m 10s)
  • 20:40 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on gerrit2001 only
  • 20:10 thcipriani: restarting gerrit on cobalt for 2.15.11 upgrade
  • 20:10 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on cobalt (production) (duration: 00m 11s)
  • 20:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on cobalt (production)
  • 20:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 12s)
  • 20:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on gerrit2001 only
  • 19:33 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Enable Priority Hints origin trial on ruwiki (duration: 00m 48s)
  • 19:22 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant 'reupload-shared' to mediawiki uploaders and fix T217523 (duration: 00m 49s)
  • 19:12 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Partial Blocks on Arabic Wikipedia T217283 (duration: 00m 50s)
  • 19:04 arlolra: Updated Parsoid to d4e76d5 (T202905)
  • 18:56 arlolra@deploy1001: Finished deploy [parsoid/deploy@766a920]: Updating Parsoid to d4e76d5 (duration: 05m 01s)
  • 18:51 arlolra@deploy1001: Started deploy [parsoid/deploy@766a920]: Updating Parsoid to d4e76d5
  • 18:39 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,name=maps2004.codfw.wmnet
  • 18:32 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@248b8c4] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet (duration: 01m 25s)
  • 18:30 mbsantos@deploy1001: Started deploy [kartotherian/deploy@248b8c4] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet
  • 18:30 mbsantos@deploy1001: Finished deploy [tilerator/deploy@fac7e5e] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet (duration: 03m 46s)
  • 18:26 mbsantos@deploy1001: Started deploy [tilerator/deploy@fac7e5e] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet
  • 18:25 gehel: cleaning kernel-proposed-updates component on reprepro (install1002)
  • 18:15 XioNoX: disable asw2-c-eqiad <-> asw-c-eqiad link - T208734
  • 17:55 gehel: rolling upgrade of kibana on logstash clusters completed - T216052
  • 17:48 gehel: rolling upgrade of kibana on logstash clusters - T216052
  • 17:44 gehel: rolling upgrade of logstash on logstash clusters completed - T216052
  • 17:36 gehel: rolling upgrade of logstash on logstash clusters - T216052
  • 17:34 gehel@deploy1001: Finished deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052 (duration: 00m 07s)
  • 17:34 gehel@deploy1001: Started deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052
  • 17:34 gehel@deploy1001: Finished deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052 (duration: 00m 08s)
  • 17:33 gehel@deploy1001: Started deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052
  • 17:16 gehel: rolling upgrade of elasticsearch on logstash clusters completed - T216052
  • 17:09 ariel@deploy1001: Finished deploy [dumps/dumps@3e25558]: fix broken page-content job retries (duration: 00m 04s)
  • 17:09 ariel@deploy1001: Started deploy [dumps/dumps@3e25558]: fix broken page-content job retries
  • 16:54 cmjohnson1: powering off cp1099 to move to different rack T202966
  • 15:26 gehel: rolling upgrade of elasticsearch on logstash clusters - T216052
  • 14:54 hashar: 1.33.0-wmf.20 seems all good
  • 14:46 marostegui: Reload haproxy on dbproxy1011 to repool labsdb1009
  • 14:15 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.20
  • 13:47 mutante: phab1002 - removing all php-7.2 packages and letting puppet reinstall them after component change
  • 13:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1075 after schema change and mysql upgrade (duration: 00m 55s)
  • 13:41 marostegui: Stop mysql on labsdb1009 for upgrade (this will trigger an haproxy IRC alert)
  • 13:39 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1009
  • 13:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 after schema change and mysql upgrade (duration: 00m 52s)
  • 12:59 zeljkof: EU SWAT finished
  • 12:56 gtirloni: re-enabled puppet on seaborgium/serpens
  • 12:55 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable musical notation datatype on testwikidatawiki (T216730)|gerrit:493010 Enable musical notation datatype on testwikidatawiki (T216730) (duration: 00m 56s)
  • 12:42 ariel@deploy1001: Finished deploy [dumps/dumps@3a25aa0]: handle failed xml content jobs correctly (fix regression) (duration: 00m 05s)
  • 12:42 ariel@deploy1001: Started deploy [dumps/dumps@3a25aa0]: handle failed xml content jobs correctly (fix regression)
  • 12:41 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create an uploader group on mediawiki.org (T217523)|gerrit:494225 Create an uploader group on mediawiki.org (T217523) (duration: 00m 55s)
  • 12:34 zfilipin@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: Restrict local uploads on mediawiki.org, take 2 (T217523)|gerrit:494806 Restrict local uploads on mediawiki.org, take 2 (T217523) (duration: 00m 56s)
  • 12:24 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:492447 Restore bureaucrat rights on hi.wiktionary to default () (duration: 00m 56s)
  • 12:08 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:494477 Enable edittag for ExternalGuidance in CX and VE (T216123) (duration: 00m 57s)
  • 12:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 after schema change and mysql upgrade (duration: 00m 56s)
  • 11:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1075 after schema change and mysql upgrade (duration: 00m 56s)
  • 11:45 gtirloni: temporarily disabled puppet on seaborgium/serpens to try slapd config changes
  • 11:28 gtirloni: updated seaborgium to stretch (T217280)
  • 11:21 mutante: doc.wikimedia.org - back up, manually fixed path to php-fpm.sock to 7.0 - puppet disabled, fix coming
  • 11:18 mutante: doc.wikimedia.org down and being worked on - package downgrade exposed an issue
  • 11:15 marostegui: Stop MySQL on db1075 for upgrade
  • 11:15 mutante: doc1001 - apt-get remove --purge php7.2* (the same packages with 7.0 were previously installed in parallel)
  • 10:58 gtirloni: upgrading seaborgium to Stretch (so it's running the same distro as serpens/codfw)
  • 10:34 moritzm: restarting HHVM/Apache on mediawiki canaries to pick up OpenSSL security update
  • 10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 for schema change and mysql upgrade (duration: 00m 56s)
  • 10:13 moritzm: upgrading mediawiki canaries to component/php72 (T216712)
  • 09:47 moritzm: upgrading mwdebug servers in eqiad to component/php72 (T216712)
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=citoid,cluster=scb,name=scb.*
  • 09:37 akosiaris: ramp up traffic to citoid kubernetes to 100%
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=citoid,cluster=scb,name=scb.*
  • 09:21 moritzm: upgrading mwdebug servers in codfw to component/php72 (T216712)
  • 09:15 elukey: fixed vlan-analytics1-d-eqiad members on asw2-d-eqiad - T205507
  • 09:03 mutante: mw2151 - mkdir /var/run/nutcracker ; chown nutcracker:nutcracker /var/run/nutcracker ; systemctl start nutcracker - runs again - pooling server
  • 08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1122 (duration: 00m 55s)
  • 08:54 mutante: depooled mw2151 - nutcracker failing
  • 08:19 mutante: reloading icinga service
  • 08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1122 (duration: 00m 55s)
  • 07:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1122 into API (duration: 00m 55s)
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1122 (duration: 00m 55s)
  • 07:28 marostegui@deploy1001: sync-file aborted: Repool db1121 (duration: 00m 01s)
  • 07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 56s)
  • 07:12 marostegui: Stop MySQL on db1122 to upgrade
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 for MySQL upgrade (duration: 00m 57s)
  • 06:40 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 06:03 marostegui: Deploy schema change on db1121, this will generate lag on labsdb:s4 - T86342
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 00m 57s)
  • 04:03 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
  • 01:19 twentyafterfour: phabricator update complete
  • 01:17 twentyafterfour: starting phabricator update to tag release/2019-03-07/1 - expect momentary downtime
  • 01:10 twentyafterfour: preparing phabricator upgrade
  • 00:47 aaron@deploy1001: Synchronized php-1.33.0-wmf.20/includes/specials/pagers/ActiveUsersPager.php: f929e2a5069 (duration: 00m 56s)
  • 00:43 aaron@deploy1001: Synchronized php-1.33.0-wmf.20/includes/specials/SpecialActiveusers.php: f929e2a5069 (duration: 00m 56s)
  • 00:28 aaron@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable loading WikibaseCirrusSearch (disabled) on production wikis (duration: 00m 55s)
  • 00:23 aaron@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Run WikibaseCirrusSearch code for search on testwikidatawiki (duration: 00m 56s)

2019-03-06

  • 21:23 XioNoX: test ping-offload with unused IP 208.80.153.225 - T190090
  • 20:30 hashar: 1.33.0-wmf.20 looks fine with group0 and group1
  • 20:14 hashar@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.20 (duration: 01m 43s)
  • 20:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.20
  • 19:51 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/LdapAuthentication/LdapPrimaryAuthenticationProvider.php: Remove calls to no-longer-implemented methods after I2eeaeed1 - T217692 (duration: 00m 58s)
  • 19:14 XioNoX: apply ping-offload redirect to private1-a-codfw - T190090
  • 19:03 gtirloni: increased serpens vCPUs from 4 to 8 (T217280)
  • 18:55 gtirloni: increased seaborgium vCPUs from 4 to 8 (T217280)
  • 18:08 bstorm_: re-enabled puppet after observing the change works well on the partner for labstore2004 and T210818
  • 18:07 joal@deploy1001: Finished deploy [analytics/refinery@fef9181]: Regular analytics weekly deploy train (duration: 31m 02s)
  • 18:04 bstorm_: disabled puppet and downtimed labstore2004 while deploying a change for T210818
  • 17:36 joal@deploy1001: Started deploy [analytics/refinery@fef9181]: Regular analytics weekly deploy train
  • 17:34 sbisson@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Added new throttle rules, removed expired|gerrit:494782 Added new throttle rules, removed expired (duration: 00m 55s)
  • 17:33 sbisson@deploy1001: sync-file aborted: SWAT: Added new throttle rules, removed expired|gerrit:494782 Added new throttle rules, removed expired (duration: 00m 01s)
  • 17:24 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: wgCopyUploadDomains: Changed domain for mehrnews.com|gerrit:492448 wgCopyUploadDomains: Changed domain for mehrnews.com (duration: 00m 56s)
  • 17:17 sbisson@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/GrowthExperiments/extension.json: SWAT: Use schema version where reading is a valid editor_interface|gerrit:494531 Use schema version where reading is a valid editor_interface (duration: 00m 56s)
  • 17:10 elukey@deploy1001: Finished deploy [analytics/superset/deploy@911ad13]: First deploy to new host (duration: 00m 27s)
  • 17:10 elukey@deploy1001: Started deploy [analytics/superset/deploy@911ad13]: First deploy to new host
  • 17:09 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Welcome survey: send all newcomers to variation A (cs, ko)|gerrit:494698 Welcome survey: send all newcomers to variation A (cs, ko) (duration: 00m 56s)
  • 16:53 jbond42: built prometheus-openldap-exporter for stretch
  • 16:51 ema: upgrade ATS to 8.0.2-1wm1
  • 16:23 moritzm: imported conftool 1.0.2-1+deb10u1 for buster-wikimedia
  • 16:10 krinkle@deploy1001: Synchronized php-1.33.0-wmf.20/includes/api/ApiBase.php: I921777 (duration: 00m 58s)
  • 16:05 moritzm: imported scap for buster-wikimedia (T213527)
  • 14:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 (duration: 00m 56s)
  • 13:35 marostegui: Upgrade MySQL on db1123
  • 13:18 jbond42: rolling security updates for file on jessie
  • 13:02 zeljkof: EU SWAT finished
  • 12:41 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change links in cswiki Help Panel (T217391)|gerrit:494668 Change links in cswiki Help Panel (T217391) (duration: 00m 55s)
  • 12:32 oblivian@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/WikimediaEvents: SWAT: Allow directing a sample of users to PHP 7 backport to wmf.19 T216676 (duration: 00m 57s)
  • 12:22 gtirloni: updated serpens to stretch (T217280)
  • 12:22 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle Exception for Art+Feminism event Eindhoven 8th March (T217676)|gerrit:494669 Throttle Exception for Art+Feminism event Eindhoven 8th March (T217676) (duration: 00m 56s)
  • 12:10 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Setting php7 sample rate for anonymous users to 0 (duration: 00m 57s)
  • 11:32 godog: bounce prometheus@k8s on prometheus2004 to test limiting concurrent connections
  • 11:21 gtirloni: updated and rebooted seaborgium (T217280)
  • 11:18 gtirloni: updated and rebooted serpens (T217280)
  • 10:56 marostegui: Deploy schema change on db1123
  • 10:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 (duration: 00m 53s)
  • 10:48 volans: upgraded spicerack to 0.0.20 on cumin[12]001
  • 10:46 volans: uploaded spicerack_0.0.20-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 10:38 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Translate/TranslateUtils.php: Revert "TranslateUtils: Avoid use of deprecated class Revision" - T217689 (duration: 00m 59s)
  • 10:36 hashar: Deploying a hotfix for Translate https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Translate/+/494659/
  • 10:22 ema: lvs100[12],lvs1016: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 09:11 ema: lvs200[123]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 09:05 moritzm: removed debmonitor host entry for ruthenium (T216062)
  • 09:01 mutante: switching noc.wikimedia.org from apache to httpd module (mwmaint2001, then mwmaint1002)
  • 08:48 akosiaris@cumin1001: conftool action : set/weight=12; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
  • 08:48 akosiaris@cumin1001: conftool action : set/weight=15; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
  • 08:48 akosiaris: increase citoid traffic to kubernetes infrastructure to 50% T213194
  • 08:48 akosiaris: increase citoid traffic to kubernetes infrastructure to 50%
  • 08:47 marostegui: Deploy schema change on s3 codfw, this will generate lag on codfw - T86342
  • 08:42 ema: lvs300[12]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090 after MySQL upgrade (duration: 00m 59s)
  • 08:15 marostegui: Stop MySQL on db1090 for mysql upgrade
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090 for MySQL upgrade (duration: 00m 56s)
  • 08:14 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
  • 07:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1105 after MySQL upgrade (duration: 00m 56s)
  • 07:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic db1105 after MySQL upgrade (duration: 00m 56s)
  • 07:34 marostegui: Remove dbstore1002 from tendril and zarcillo T216491
  • 07:09 elukey: raised analytics user's max_user_connection from 10 to 100 on labsdb1012 - T215231
  • 07:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic db1105 after MySQL upgrade (duration: 00m 56s)
  • 06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1105 after MySQL upgrade (duration: 00m 56s)
  • 06:32 marostegui: Stop MySQL on db1105 for MySQL upgrade
  • 06:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105 for MySQL upgrade (duration: 01m 14s)
  • 06:27 marostegui: Add labsdb1012 to tendril and zarcillo - T215231
  • 05:50 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 04:26 eileen: civicrm revision changed from 196493f372 to 4aac68eead, config revision is 8ca90b4c7b
  • 04:00 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
  • 00:55 twentyafterfour: finished US Evening SWAT.
  • 00:41 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494524/ for SWAT refs T217276 (duration: 00m 55s)
  • 00:23 twentyafterfour@deploy1001: Synchronized wmf-config/mobile.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494271/ for SWAT refs T212253 (duration: 00m 56s)
  • 00:12 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/493236/ for SWAT. refs T217080 (duration: 00m 56s)

2019-03-05

  • 23:51 ejegg: updated payments-wiki from 4f2935ad17 to f1a89d7045
  • 21:05 godog: temporarily stop requests to k8s instance on prometheus2004
  • 21:00 herron: restarted apache on grafana1001
  • 20:43 herron: restarted apache on grafana1001
  • 19:56 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/LdapAuthentication/: Stop referring to the now-killed AuthPlugin class - T217692 (duration: 00m 57s)
  • 17:44 godog: bounce uwsgi on graphite1004
  • 17:25 herron: restarting uwsgi-graphite-web on graphite1004
  • 16:54 moritzm: imported logstash 1:5.6.14-1 to thirdparty/elastic56
  • 16:52 herron: restarting uwsgi-graphite-web on graphite1004
  • 16:43 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:43 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics -f eventgate-analytics-staging-values.yaml [namespace: eventgate-analytics, clusters: staging]
  • 16:20 herron: restarting uwsgi-graphite-web on graphite1004
  • 15:53 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.20
  • 15:35 hashar@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674 (duration: 51m 03s)
  • 14:52 gtirloni: reprepro added bdsync_0.10-1+deb9u1 T209527
  • 14:44 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:42 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:42 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:42 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
  • 14:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:41 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:41 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-codfw-values.yaml [namespace: eventgate-analytics, clusters: codfw]
  • 14:40 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
  • 14:35 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.BRPBtKvzZH" --verbose' returned non-zero exit status 1 (duration: 00m 20s)
  • 14:35 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:34 jijiki: Ramp up citoid traffic from k8s to 25% on codfw - T213194
  • 14:34 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.ngh6XIMz8y" --verbose' returned non-zero exit status 1 (duration: 00m 21s)
  • 14:33 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:33 jiji@cumin1001: conftool action : set/weight=5; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
  • 14:27 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.JrfRQw0oDJ" --verbose' returned non-zero exit status 1 (duration: 00m 21s)
  • 14:27 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
  • 14:25 hashar@deploy1001: Pruned MediaWiki: 1.33.0-wmf.14 (duration: 09m 47s)
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:20 otto@deploy1001: scap-helm eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
  • 14:17 hashar@deploy1001: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "hashar"; reason is "Pruned MediaWiki: 1.33.0-wmf.14" (duration: 00m 00s)
  • 14:14 hashar: Applied wmf/1.33.0-wmf.20 local patches # T206674
  • 14:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 T217591 (duration: 01m 50s)
  • 13:31 hashar: Cutting branch wmf/1.33.0-wmf.20 # T206674
  • 13:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 T217591 (duration: 00m 48s)
  • 13:14 ema: lvs500[12]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 13:07 zeljkof: EU SWAT finished
  • 12:58 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wgArticleCountMethod=any for zhwikiversity (T214946) (gerrit:487115) (duration: 00m 49s)
  • 12:45 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Enable edittag for ExternalGuidance in CX and VE" (duration: 00m 48s)
  • 12:24 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert gerrit:493155 (duration: 00m 49s)
  • 11:59 _joe_: upgrading scap everywhere to 3.9.2-1, T217611
  • 11:52 ema: lvs400[56]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 11:45 _joe_: installing new scap version in codfw
  • 11:44 oblivian@deploy1001: Synchronized README: Test deploy for new scap version (duration: 00m 48s)
  • 11:43 _joe_: installing new swat version on deployment servers, T217611
  • 11:22 _joe_: uploading new scap packages, T217611
  • 10:58 ema: lvs4007/lvs5003: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 10:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 47s)
  • 10:55 gilles@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/NavigationTiming/NavigationTiming.config.php: T187299 Fix wiki oversampling config validation (duration: 00m 48s)
  • 10:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 48s)
  • 10:27 jiji@cumin1001: conftool action : set/weight=4; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
  • 10:24 jijiki: Ramp up citoid traffic from k8s to 25% - T213194
  • 10:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 47s)
  • 10:10 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T187299 Oversample navtiming on ruwiki and eswiki (duration: 00m 47s)
  • 10:07 gilles@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/NavigationTiming: T187299 Backport wiki oversampling config syntax change (duration: 00m 48s)
  • 10:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 50s)
  • 09:56 ema: lvs200[456]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
  • 09:31 marostegui: Stop MySQL on db1103:3312 and db1103:3314 for MySQL upgrade
  • 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 and db1103:3314 for mysql upgrade (duration: 00m 47s)
  • 09:26 ema: lvs100[456]: reboot for L1TF kernel/microcode updates T203011
  • 09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 (duration: 00m 47s)
  • 09:16 godog: kibana refresh field list
  • 08:58 mutante: restarting gerrit to pickup change 493963 - disable jgit gc
  • 08:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 47s)
  • 08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1084 (duration: 00m 48s)
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 in API (duration: 00m 48s)
  • 08:32 marostegui: Optimize echo_event table on x1 codfw master (db2034) this will generate lag on x1 codfw - T217591
  • 08:24 akosiaris: T213194 bump percentage of citoid requests reaching eqiad kubernetes cluster to 9%
  • 08:23 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes100.*
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1084 (duration: 00m 49s)
  • 07:47 marostegui: Upgrade MySQL on db1084
  • 07:18 marostegui: Stop MySQL on db1095 (backups host) to upgrade MySQL
  • 07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 47s)
  • 07:08 marostegui: Start transferring data from labsdb1011 to labsdb1012 - T215231
  • 06:56 marostegui: Reboot labsdb1012
  • 06:55 marostegui: Defragment echo_event tables on dbstore1005:3320 T217591
  • 06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1091 (duration: 00m 48s)
  • 06:43 marostegui: Stop MySQL on db2035 (s2 codfw master) to upgrade MySQL
  • 06:41 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 06:18 marostegui: Stop MySQL on dbstore2001 to upgrade MySQL
  • 06:17 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011
  • 06:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 51s)
  • 03:05 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Handle TitleBlacklist errors correctly (T217382) (duration: 00m 49s)
  • 03:03 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
  • 02:59 ejegg: updated payments-wiki from ca7c280f3e to 4f2935ad17
  • 02:27 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Revert hot fix (duration: 00m 46s)
  • 02:21 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Hot fix for T217615 (duration: 00m 47s)
  • 02:05 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:33 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:21 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:18 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
  • 01:15 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 49s)
  • 01:13 tzatziki: changing password for "Force de Mots" and "שרית חייט"
  • 00:46 XioNoX: disable unused ports of restbase1016 on asw-a
  • 00:44 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/WikimediaEvents/: Redact title/create params and drop page_title in EditorJourney schema (T213974) (duration: 00m 49s)
  • 00:40 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES goodfaith on itwiki (T211032) (duration: 00m 47s)
  • 00:17 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/GrowthExperiments/includes/HelpPanel.php: Exclude help panel from main page (T215664) (duration: 00m 48s)
  • 00:12 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES on kowiki (T161628) (duration: 00m 49s)

2019-03-04

  • 23:09 eileen: civicrm revision changed from 316e038a69 to 196493f372, config revision is 8ca90b4c7b
  • 22:15 arlolra: Updated Parsoid to 1660395 (T214099, T202905)
  • 22:05 arlolra@deploy1001: Finished deploy [parsoid/deploy@bdc9e66]: Updating Parsoid to 1660395 (duration: 06m 34s)
  • 21:59 arlolra@deploy1001: Started deploy [parsoid/deploy@bdc9e66]: Updating Parsoid to 1660395
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-codfw-values.yaml [namespace: eventgate-analytics, clusters: codfw]
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 21:58 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics finished
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics -f eventgate-analytics-staging-values.yaml [namespace: eventgate-analytics, clusters: staging]
  • 21:54 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 21:49 ejegg: re-enabled Omnimail unsubscribe processing, disabled recipient repair job
  • 21:46 ejegg: updated Fundraising CiviCRM from 616c58cebe to 316e038a69
  • 21:19 XioNoX: add bgp sessions to AS137236 on cr1-eqsin
  • 21:14 XioNoX: re-enable bgp to AS13489 on cr2-eqiad
  • 20:44 reedy@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/Echo/: T217487 (duration: 00m 53s)
  • 20:23 niharika29@deploy1001: Finished deploy [scholarships/scholarships@2ef7463]: Remove outdated translations (duration: 00m 02s)
  • 20:23 niharika29@deploy1001: Started deploy [scholarships/scholarships@2ef7463]: Remove outdated translations
  • 20:17 niharika29@deploy1001: Finished deploy [scholarships/scholarships@2ef7463]: Deploy new version of app with new translations + fix broken privacy policy link (duration: 00m 02s)
  • 20:17 niharika29@deploy1001: Started deploy [scholarships/scholarships@2ef7463]: Deploy new version of app with new translations + fix broken privacy policy link
  • 20:01 sbisson@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Enables maplink for geocoordinate Wikibase statements display on clients (gerrit:494289) (duration: 00m 48s)
  • 20:00 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reader demographics survey (gerrit:494292) (duration: 00m 49s)
  • 19:52 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable help panel for user and user talk NS (gerrit:493616) (duration: 00m 49s)
  • 19:47 sbisson@deploy1001: Synchronized tests/loggingTest.php: SWAT: Add eventbus analytics logging alongside with kafka logging. (part 2) (gerrit:490668) (duration: 00m 48s)
  • 19:46 sbisson@deploy1001: Synchronized wmf-config/: SWAT: Add eventbus analytics logging alongside with kafka logging. (part 1) (gerrit:490668) (duration: 00m 51s)
  • 19:41 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@20badb3]: Updater and Blazegraph group to report metric domain plus GUI updates (duration: 11m 07s)
  • 19:35 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable GrowthExperiments Homepage on testwiki (gerrit:494223) (duration: 00m 49s)
  • 19:30 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@20badb3]: Updater and Blazegraph group to report metric domain plus GUI updates
  • 19:03 bstorm_: dumps.wikimedia.org is now running off labstore1007 T217473
  • 18:25 bstorm_: disabled notifications for high load on labstore1007 while failed over T217473
  • 18:23 vgutierrez: restarting pybal on lvs5002 - T213121
  • 18:16 XioNoX: push lvs5002 changes on cr2-eqsin - T213121
  • 16:54 hashar: contint1001: cleaned all Docker containers, compress /var/log/zuul/ files
  • 16:52 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001.*
  • 16:43 marostegui: Restart MySQL on db1112 for addshore
  • 16:33 jynus: enabling gtid replication on clouddb1002
  • 16:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T217365: Enable VE section editing on mobile for Beta Cluster, part II (duration: 00m 48s)
  • 16:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T217365: Enable VE section editing on mobile for Beta Cluster, part I (duration: 00m 51s)
  • 16:18 moritzm: installing ldb security updates
  • 16:13 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001
  • 16:13 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001
  • 16:13 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
  • 15:55 jijiki: Running puppet on scb* and kubernetes* - T213194
  • 15:44 jijiki: Disabling puppet on scb* and kubernetes* - T213194
  • 15:22 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: no-op: Remove unused legacy EventBus config settings (duration: 00m 49s)
  • 15:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 after changing index on logging table (duration: 00m 51s)
  • 14:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 and db1100 after changing index on logging table (duration: 00m 49s)
  • 14:20 elukey: update puppet compiler's facts
  • 14:20 marostegui: Change indexes on logging table on db1100 (s5) and db1097:3314 (commonswiki) - T217397
  • 14:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097:3314, db1100 to change indexes on logging table (duration: 00m 50s)
  • 13:57 gehel: restarting blazegraph on wdqs eqiad
  • 12:23 moritzm: testing component/php72 on mw2224
  • 11:04 akosiaris@deploy1001: scap-helm citoid finished
  • 11:04 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
  • 11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
  • 11:04 akosiaris@deploy1001: scap-helm citoid finished
  • 11:04 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
  • 11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
  • 11:04 akosiaris@deploy1001: scap-helm citoid finished
  • 11:04 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 10:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More weight to db1089 (duration: 00m 48s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:494191) (duration: 00m 50s)
  • 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:494191) (duration: 00m 50s)
  • 09:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 00m 48s)
  • 09:27 ariel@deploy1001: Finished deploy [dumps/dumps@932bf7e]: make misc dumps failure message nicer (duration: 00m 09s)
  • 09:27 ariel@deploy1001: Started deploy [dumps/dumps@932bf7e]: make misc dumps failure message nicer
  • 09:22 godog: temporarily stop prometheus on prometheus2004 to take a snapshot
  • 08:45 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Undo enabling Priority Hints origin trial on ruwiki (duration: 00m 49s)
  • 08:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 (duration: 00m 49s)
  • 08:38 gilles@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 08:29 marostegui: Change logging indexes on db1089 to leave the indexes exactly like the ones on tables.sql - T217397
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 - T217397 (duration: 00m 49s)
  • 07:48 ema: cp3032/cp3042: restart varnish-be due to mbox lag
  • 07:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 for schema change (duration: 00m 49s)
  • 07:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 (duration: 00m 53s)
  • 07:33 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1010
  • 07:17 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
  • 07:13 marostegui: Remove dbstore1002 from tendril and zarcillo - T216491
  • 07:05 marostegui: Upgrade MySQL on db2088 and db2091
  • 06:46 marostegui: Stop MySQL on dbstore1002 for decommission T210478 T172410 T216491 T215589
  • 06:38 marostegui: Stop MySQL on labsdb1010 for mysql upgrade
  • 06:34 gtirloni: downtimed cloudstore1008/9 (T209527)
  • 06:13 marostegui: Upgrade MySQL on db2041 db2049 db2056 db2095
  • 06:06 marostegui: Run analyze table logging on db2038 and db2059 - T71222
  • 06:05 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094:3314 for schema change (duration: 01m 11s)
  • 05:18 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)

2019-03-03

  • off: restarted icinga on icinga2001, stale status file, too many open files
  • 10:44 elukey: restart pdfrender on scb1003

2019-03-02

  • 12:12 gtirloni: labstore1006 started nfsd T217473

2019-03-01

  • 20:45 ejegg: turned off fundraising omnimail process unsubscribes job
  • 19:40 XioNoX: pre-configure asw-a8 ports on asw2-a8-eqiad - T187960
  • 19:32 XioNoX: pre-configure asw-a7 ports on asw2-a7-eqiad - T187960
  • 19:29 XioNoX: pre-configure asw-a6 ports on asw2-a6-eqiad - T187960
  • 19:17 XioNoX: pre-configure asw-a5 ports on asw2-a5-eqiad - T187960
  • 18:53 robh: notebook1003 has unusually high load recently (23) and seemed to lag in reporting to icinga. no hardware failures, pinged about it in #wikimedia-analytics
  • 16:33 jbond42: rolling security update of bind9 packages on jessie and trusty
  • 15:38 ema: trafficserver_8.0.2-1wm1 uploaded to stretch-wikimedia
  • 15:02 akosiaris: restore proton config values
  • 14:33 hashar: Updating all debian-glue Jenkins jobs to properly take into account the BUILD_TIMEOUT parameter # T217403
  • 13:24 moritzm: removed sca* hosts from debmonitor database
  • 12:49 akosiaris: lower max_render_queue_size: to 20 for proton on proton100{1,2}
  • 12:32 akosiaris: restart proton1002, OOM showed up
  • 12:31 akosiaris: restart proton on proton1001, counted 99 chromium processes left running since at least Jan 30
  • 11:47 jbond42: rebooting labsdb1005.codfw.wmnet
  • 11:17 jbond42: rebooting labstore2004.codfw.wmnet
  • 11:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1094 (duration: 00m 50s)
  • 08:52 godog: temporarily stop prometheus instances on prometheus2004 to take a snapshot
  • 07:44 oblivian@deploy1001: Synchronized README: Test deploy for new scap configuration (duration: 00m 48s)
  • 07:39 oblivian@deploy1001: Synchronized README: noop sync to test opcache-manager (duration: 00m 47s)
  • 07:31 oblivian@deploy1001: Synchronized README: Test deploy for new scap configuration (duration: 00m 46s)
  • 07:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 after mysql upgrade (duration: 00m 47s)
  • 07:23 _joe_: installed php 7.2 compatible packages on deploy1001,2001
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 after mysql upgrade (duration: 00m 47s)
  • 06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 after mysql upgrade (duration: 00m 46s)
  • 06:48 marostegui: Deploy schema change on s4 codfw, lag will appear on s4 codfw - T86342
  • 06:43 marostegui: Stop MySQL on db1094 for mysql upgrade
  • 06:40 _joe_: upgrading php extensions on deploy* to versions compatible with php7.2
  • 05:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 51s)
  • 00:12 XioNoX: pre-configure asw-a3 ports on asw2-a3-eqiad - T187960
  • 00:09 thcipriani@deploy1001: Synchronized README: noop sync to test opcache-manager in scap 3.9.1-1 (duration: 00m 48s)


Archives

See Server admin log/Archives.