Server Admin Log

From Wikitech

2019-08-19

  • 19:35 ejegg: updated payments-wiki from e3b378f65d to 7b8091ba87
  • 18:57 Urbanecm: Morning SWAT done
  • 18:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Raise rollback limit for all groups (T228708) (duration: 00m 48s)
  • 18:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 26317c7: Fix zhwikisource wgExtraNamespaces entry (T230294) (duration: 00m 48s)
  • 18:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: b21bbc0: Add `WS` and `CAT` as aliases for zhwikisource namespaces (T230548) (duration: 00m 47s)
  • 18:26 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: 0a87e3c: Assign all rights assigned to suppress group to oversight group (T230601) (duration: 00m 48s)
  • 17:56 ebernhardson: freeze cloudelastic writes to let prod clear 30 min backlog
  • 17:23 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2d36896]: Fix Blazegraph dictionary mixup (duration: 18m 18s)
  • 17:17 shdubsh: restarting icinga to disable UI autocomplete
  • 17:04 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2d36896]: Fix Blazegraph dictionary mixup
  • 16:45 onimisionipe: pool elastic2050. mgmt issue has been resolved - T230597
  • 15:39 ejegg: updated payments-wiki from 00eb090dcc to e3b378f65d
  • 13:57 vgutierrez: repooling cp5001
  • 12:51 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2049 from config T230721 (duration: 00m 48s)
  • 12:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2049 from config T230721 (duration: 00m 48s)
  • 12:38 vgutierrez: depooling cp5001 prior to ats-tls deployment
  • 12:02 Urbanecm: EU SWAT done
  • 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert 483691c (T225053) (duration: 00m 48s)
  • 11:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 483691c: Revert "Revert "Switch property terms migration to WRITE_NEW on client wikis"" (T225053) (duration: 00m 48s)
  • 11:15 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:03 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:02 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:00 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:53 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 10:53 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 10:52 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:32 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:22 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:57 jbond42: add mapped ipv6 to conf200* servers https://gerrit.wikimedia.org/r/c/operations/puppet/+/528475
  • 09:26 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 godog: add 100G to graphite1004 / graphite2003 /srv LVs
  • 07:59 onimisionipe: shutdown elastic2050 to prepare for mgmt reset - T230597
  • 07:40 marostegui: Redact napwikisource on db1124 and db2094 - T210762
  • 07:19 moritzm: installing golang-1.11 security updates on buster
  • 07:08 moritzm: installing ffmpeg security updates on buster
  • 06:37 vgutierrez: upgrading acme-chief to version 0.20 on production servers - T229096
  • 06:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir1001.eqiad.wmnet
  • 06:29 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir1001.eqiad.wmnet
  • 06:28 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir1002.eqiad.wmnet
  • 06:27 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir1002.eqiad.wmnet
  • 06:26 moritzm: installing ghostscript security updates on scb/proton/notebook* hosts
  • 06:25 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2001.codfw.wmnet
  • 06:25 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2001.codfw.wmnet
  • 06:24 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
  • 06:22 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
  • 06:21 vgutierrez: rolling upgrade of nginx in ncredir hosts
  • 06:03 moritzm: installing php5 security updates
  • 05:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2067 from config T230705 (duration: 00m 47s)
  • 05:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2067 from config T230705 (duration: 00m 50s)
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2067, will be moved to m1 T230705', diff saved to https://phabricator.wikimedia.org/P8930 and previous config saved to /var/cache/conftool/dbconfig/20190819-054606-marostegui.json
  • 05:29 elukey: reboot cp2004 due to bnx2x crash (kern.log saved into my home on the host if needed)
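The ncredir entries between 06:21 and 06:30 follow a one-host-at-a-time cycle: depool, upgrade nginx, repool, then move to the next host. The Python sketch below shows only that ordering. The `depool`, `upgrade`, and `pool` callables are hypothetical stand-ins (the log shows pooling done via conftool `set/pooled=no|yes`), and nothing here reproduces conftool's real interface.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    hosts: Iterable[str],
    depool: Callable[[str], None],
    upgrade: Callable[[str], None],
    pool: Callable[[str], None],
) -> List[str]:
    """Upgrade hosts one at a time so at most one is out of service.

    Each host is depooled, upgraded, and repooled before the next is touched.
    If upgrade() raises, that host stays depooled, the exception propagates,
    and the remaining hosts are left untouched.
    """
    done: List[str] = []
    for host in hosts:
        depool(host)   # hypothetical: e.g. conftool set/pooled=no
        upgrade(host)  # hypothetical: package upgrade plus service restart
        pool(host)     # hypothetical: e.g. conftool set/pooled=yes
        done.append(host)
    return done


if __name__ == "__main__":
    actions: List[str] = []
    rolling_upgrade(
        ["ncredir2002.codfw.wmnet", "ncredir2001.codfw.wmnet"],
        depool=lambda h: actions.append(f"depool {h}"),
        upgrade=lambda h: actions.append(f"upgrade {h}"),
        pool=lambda h: actions.append(f"pool {h}"),
    )
    print("\n".join(actions))
```

Stopping on the first failure keeps a broken host out of rotation instead of repooling it and continuing.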

2019-08-18

  • 08:28 onimisionipe: running `_cluster/reroute?pretty&explain=true&retry_failed` on eqiad production-search cluster to force allocation of shards
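The 2019-08-18 entry uses the Elasticsearch `_cluster/reroute` API with `retry_failed`, which asks the cluster to retry allocating shards that have exhausted their allocation retries. Below is a minimal standard-library Python sketch of that request. The base URL is a hypothetical placeholder, and the sketch assumes the HTTP endpoint needs no authentication. `pretty` and `explain` are included only because the logged call used them.

```python
import urllib.parse
import urllib.request


def reroute_url(base_url: str, explain: bool = True) -> str:
    """Build a _cluster/reroute URL that retries failed shard allocations."""
    params = {"retry_failed": "true", "pretty": "true"}
    if explain:
        params["explain"] = "true"
    query = urllib.parse.urlencode(params)
    return f"{base_url.rstrip('/')}/_cluster/reroute?{query}"


def retry_failed_allocations(base_url: str) -> bytes:
    """POST the reroute request with an empty body and return the raw response."""
    req = urllib.request.Request(
        reroute_url(base_url),
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

For example, `reroute_url("http://localhost:9200")` yields `http://localhost:9200/_cluster/reroute?retry_failed=true&pretty=true&explain=true`.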

2019-08-16

  • 19:48 sbassett: Deployed security patch for T230576 (ex:MobileFrontend)
  • 18:57 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 16:38 XioNoX: add BGP sessions to Scaleway (AS12876) in esams
  • 16:12 elukey: upload prometheus-druid-exporter 0.7-1 to stretch/buster-wikimedia
  • 15:42 elukey: roll restart of druid broker/historicals to pick up new logging/metrics settings
  • 14:39 onimisionipe: run `bmc-device --cold-reset; echo $?` in elastic2050 hoping it resets mgmt interface - T230597
  • 14:24 gehel: rolling reboot of cloudelastic
  • 13:52 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision (beta): Request labels targeting Beta Wikidata (duration: 00m 50s)
  • 08:18 _joe_: stopping php on phab1003, to restart it with systemd
  • 06:50 _joe_: upgrading envoyproxy across production (http2 CVEs)
  • 02:51 vgutierrez: repooling cp5002, running compress.so experiment

2019-08-15

  • 23:35 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@b4da6e4]: Rollback blazegraph due to T230588 (duration: 09m 48s)
  • 23:25 smalyshev@deploy1001: Started deploy [wdqs/wdqs@b4da6e4]: Rollback blazegraph due to T230588
  • 21:54 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@fce8177]: Weekly deploy (duration: 25m 28s)
  • 21:28 smalyshev@deploy1001: Started deploy [wdqs/wdqs@fce8177]: Weekly deploy
  • 21:27 ebernhardson: finish restarting cloudelastic-chi-eqiad with -XX:NewRatio=3
  • 21:18 ebernhardson: increase cloudelastic indices.recovery.max_bytes_per_sec from 40mbit to 512mbit as these have 10G networking
  • 21:07 ebernhardson: restart cloudelastic1002 with -XX:NewRatio=3 to match cloudelastic1001
  • 20:22 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:37 ema: depool cp5002 during the EU night, running compress.so experiment
  • 19:28 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
  • 19:19 sbassett: Deployed security patch for T230402 (1.34.0-wmf.17)
  • 19:18 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:18 sbassett: Deployed security patch for T229541 (1.34.0-wmf.17)
  • 19:17 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 19:17 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:01 ebernhardson: restart elasticsearch on cloudelastic1001 with -XX:NewRatio=3
  • 18:51 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
  • 17:58 mbsantos@deploy1001: Finished deploy [proton/deploy@fb0b2a5]: Update chromium-renderer to 3f1cc72 (T218220) (duration: 00m 43s)
  • 17:58 mbsantos@deploy1001: Started deploy [proton/deploy@fb0b2a5]: Update chromium-renderer to 3f1cc72 (T218220)
  • 17:47 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@1bd2bea]: Update service-mobileapp-node to 5c1da03 (T230067 T229984) (duration: 05m 53s)
  • 17:41 mbsantos@deploy1001: Started deploy [mobileapps/deploy@1bd2bea]: Update service-mobileapp-node to 5c1da03 (T230067 T229984)
  • 17:11 ejegg: updated payments-wiki from 44eae2d65f to 00eb090dcc
  • 17:02 cstone: civicrm revision changed from 3caf54a0d2 to 9c7b2ffbc9
  • 16:53 reedy@deploy1001: Synchronized docroot/noc/db.php: Use WmfClusters from separate file (duration: 00m 47s)
  • 16:52 reedy@deploy1001: Synchronized src/WmfClusters.php: Move WmfClusters.php (duration: 00m 47s)
  • 16:27 XioNoX: advertise core v4 range (208.80.152.0/22) from eqord - T167841
  • 16:09 ori: Finished messing around with mwdebug1002
  • 16:06 reedy@deploy1001: Synchronized docroot/: phpcs fixes (duration: 00m 47s)
  • 16:05 reedy@deploy1001: Synchronized wmf-config/arclamp.php: phpcs (duration: 00m 47s)
  • 16:04 reedy@deploy1001: Synchronized tests/: phpunit (duration: 00m 47s)
  • 16:03 reedy@deploy1001: Synchronized phpcs.xml: more exclusions! (duration: 00m 47s)
  • 15:40 ebernhardson: unfreeze writes to cloudelastic cluster
  • 15:37 ema: cp5002: re-pool with compress.so cache:false
  • 15:34 herron: performing rolling restarts of eqiad kafka-main brokers for security updates
  • 15:34 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
  • 15:13 ori: Messing around with CommonSettings.php on mwdebug1002 to profile config loading
  • 14:58 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
  • 14:58 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.reboot-wdqs (exit_code=97)
  • 14:56 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
  • 14:52 reedy@deploy1001: Synchronized wmf-config/: phpcs cleanup (duration: 00m 47s)
  • 14:51 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.reboot-wdqs (exit_code=97)
  • 14:51 reedy@deploy1001: Synchronized multiversion/: phpcs cleanup (duration: 00m 47s)
  • 14:50 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
  • 14:50 ema: cp5002 depool due to compress.so crash
  • 14:50 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
  • 14:49 reedy@deploy1001: Synchronized phpcs.xml: remove exclusions (duration: 00m 49s)
  • 14:47 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
  • 14:44 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
  • 14:41 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
  • 14:33 papaul: shutting down db2063 for maintenance
  • 13:17 reedy@deploy1001: Synchronized phpcs.xml: remove excess lines (duration: 00m 46s)
  • 12:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove account creation restrictions (T230304, T230521) (duration: 00m 48s)
  • 12:21 Urbanecm: EU SWAT done
  • 12:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: d036388: Increase default thumb size to 260px on Dutch Wikipedia (T215106) (duration: 00m 48s)
  • 12:16 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/AbuseFilter/extension.json: SWAT: e9422c5: Rearrange config to provide better experience (T191740, T200032, T226987) (duration: 00m 47s)
  • 12:14 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: 7e95f6d: Update AbuseFilter config to keep the status quo (T191740, T200032, T226987) (duration: 00m 49s)
  • 12:04 Urbanecm: EU SWAT is going a few minutes out of its window
  • 12:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:01 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:00 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 12:00 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 11:37 Urbanecm: Run mwscript namespaceDupes.php --wiki=zhwikisource --add-prefix="FIXME" --fix (T230294)
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fe9b6ed: Add Portal namespace on zhwikisource (T230294) (duration: 00m 47s)
  • 11:29 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 377cc53: Add new throttle rule for cawiki editathon (T230313) (duration: 00m 47s)
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove napwikisource from wgProofreadPageNamespaceIds (T230541) (duration: 00m 47s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0d8c516: Fix addition of Hubblesite.org and Spacetelescope.org to commons wgCopyUploadsDomains (T230083) (duration: 00m 48s)
  • 10:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T230533: Add more import sources for napwikisource (duration: 00m 52s)
  • 08:54 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 08:54 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 08:52 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 08:52 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 07:35 ema: cp5002: ats-backend-restart to enable compress plugin
  • 06:38 ema: wdqs1009: restart wdqs-updater.service
  • 00:15 robh: scs-ulsfo offline due to networking issues, rob returning tomorrow with fix T230077
  • 00:03 twentyafterfour: starting phabricator upgrade to 2019-08-14/1 refs T215697
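Several 2019-08-15 entries restart cloudelastic with `-XX:NewRatio=3`. In HotSpot, `NewRatio` is the ratio of old-generation to young-generation size, so a value of 3 gives the young generation about one quarter of the generational heap. The arithmetic is sketched below. The heap size is hypothetical because the log does not state it, the flag applies to generational collectors, and real JVMs additionally round sizes to alignment boundaries.

```python
from typing import Tuple


def generation_sizes(heap_bytes: int, new_ratio: int) -> Tuple[int, int]:
    """Split a heap into (young, old) sizes for a given -XX:NewRatio.

    NewRatio is old:young, so young = heap / (new_ratio + 1).
    Uses integer division and ignores JVM alignment rounding.
    """
    if new_ratio < 1:
        raise ValueError("NewRatio must be >= 1")
    young = heap_bytes // (new_ratio + 1)
    return young, heap_bytes - young


GiB = 1024 ** 3
# Hypothetical 16 GiB heap; the log does not give cloudelastic's heap size.
young, old = generation_sizes(16 * GiB, 3)
print(young / GiB, old / GiB)  # 4.0 12.0
```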

2019-08-14

  • 23:13 ebernhardson: leave cloudelastic writes paused, and dropping from backlog queue, to allow primary clusters to catch up
  • 22:41 eileen: civicrm revision changed from 569e52e23d to 3caf54a0d2, config revision is 1c76e94ac3
  • 22:38 ebernhardson: freeze writes to cloudelastic for real this time
  • 22:03 ejegg: updated fundraising python tools from 827ce3750e to 5c080bac63
  • 22:01 robh: starting scs-ulsfo replacement. There will be icinga errors and they are intentionally being allowed so we know when things don't recover properly T230077
  • 21:37 XioNoX: advertise core v6 range (2620:0:860::/46) from eqord - T167841
  • 21:30 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 21:26 ebernhardson: thaw writes to cloudelastic
  • 21:24 ejegg: updated payments-wiki from 9533f70fab to 44eae2d65f
  • 21:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 21:13 ebernhardson: apply freeze to cloudelastic writes, to determine if backlog processing can catchup while deferring cloudelastic writes
  • 20:49 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 20:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 20:44 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 20:44 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 18:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 17:29 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 16:32 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:32 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:31 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 15:50 ema: cp5002: ats-backend-restart to disable compress plugin while I'm not around
  • 15:45 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 15:41 gehel: powercycling elastic101[789]
  • 15:30 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 14:55 vgutierrez: upgrade nginx to 1.13.9-1+wmf2 in cp3032
  • 14:17 fsero: upgrading envoy package to 1.11.1
  • 14:09 vgutierrez: rolling back nginx upgrade in cp3032
  • 14:01 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 04s)
  • 13:58 reedy@deploy1001: Synchronized static/images/project-logos/: T210752 (duration: 00m 47s)
  • 13:56 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T210752 (duration: 00m 47s)
  • 13:55 reedy@deploy1001: rebuilt and synchronized wikiversions files: T212881
  • 13:53 reedy@deploy1001: Synchronized dblists/: T212881 (duration: 00m 48s)
  • 12:48 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:47 James_F: <sadtrombone> Wiki creation is still not working correctly, unfortunately.
  • Away: We're going to try making a new wiki. T212881
  • 12:20 vgutierrez: rolling upgrade of nginx to 1.13.9-1+wmf2 in the cache cluster
  • 12:17 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 11:20 vgutierrez: repooling cp5002
  • 11:19 tarrow: termbox smoketests finished
  • 11:06 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 10:46 ema: depool cp5002 after crash. See /var/log/trafficserver/crash-2019-08-14-104502.log
  • 10:28 tarrow: Starting smoketest of termbox service on eqiad: T229907
  • 09:40 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 09:20 ema: cp5002: ats-backend-restart to enable compress plugin
  • 08:52 vgutierrez: upgrading nginx to 1.13.9-1+wmf2 in cp1075, cp2001, cp3030 and cp4027 (text) and cp1076, cp2002, cp3034, cp4021 (upload)
  • 08:25 vgutierrez: upgrading nginx to 1.13.9-1+wmf2 in cp5001 (upload) and cp5007 (text)
  • 08:17 vgutierrez: uploaded nginx-1.13.9-1+wmf2 to apt.wikimedia.org (stretch)
  • 08:16 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 08:12 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 08:10 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 07:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2063 from config T230459 (duration: 00m 47s)
  • 07:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2063 from config T230459 (duration: 00m 48s)
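The 2019-08-14 and 2019-08-15 entries advertise the core IPv6 range 2620:0:860::/46 and the core IPv4 range 208.80.152.0/22 from eqord (T167841). Python's standard `ipaddress` module shows what those prefixes cover. The snippet inspects the prefixes only and says nothing about the routing policy behind the change.

```python
import ipaddress

v4 = ipaddress.ip_network("208.80.152.0/22")
v6 = ipaddress.ip_network("2620:0:860::/46")

# A /22 spans four consecutive /24s.
print(v4.num_addresses, v4[0], v4[-1])  # 1024 208.80.152.0 208.80.155.255
print([str(n) for n in v4.subnets(new_prefix=24)])
# ['208.80.152.0/24', '208.80.153.0/24', '208.80.154.0/24', '208.80.155.0/24']

# A /46 spans four /48s.
print([str(n) for n in v6.subnets(new_prefix=48)])
# ['2620:0:860::/48', '2620:0:861::/48', '2620:0:862::/48', '2620:0:863::/48']
```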

2019-08-13

  • 20:43 ejegg: rolled back payments-wiki from 9ed8be0532 to 9533f70fab
  • 20:34 ejegg: updated payments-wiki from 9533f70fab to 9ed8be0532
  • 20:32 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix MachineVision provider config (duration: 00m 47s)
  • 19:48 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 19:23 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@3882ddb]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625 (duration: 00m 58s)
  • 19:22 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@3882ddb]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625
  • 19:19 ppchelko@deploy1001: deploy aborted: Revert on canary (duration: 00m 18s)
  • 19:18 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@f1a562e]: Revert on canary
  • 19:17 ppchelko@deploy1001: deploy aborted: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625 (duration: 01m 30s)
  • 19:15 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@f1a562e]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625
  • 19:03 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 18:50 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 18:41 ebernhardson: set cpufreq scaling_governor to performance on cloudelastic100[1-4] to test any changes to indexing performance
  • 18:38 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable MachineVision on Beta (4/4) (duration: 00m 48s)
  • 18:34 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable MachineVision on Beta (3/4) (duration: 00m 47s)
  • 18:33 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292 (fix perms) (duration: 00m 09s)
  • 18:33 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292 (fix perms)
  • 18:33 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292 (duration: 00m 43s)
  • 18:32 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292
  • 18:32 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292 (duration: 00m 36s)
  • 18:31 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292
  • 18:30 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MachineVision on Beta (2/4) (duration: 00m 48s)
  • 18:27 mholloway-shell@deploy1001: Synchronized wmf-config/extension-list: Enable MachineVision on Beta (1/4) (duration: 00m 48s)
  • 17:44 XioNoX: set target netflow port to 2000 in eqiad
  • 17:11 XioNoX: repool eqsin
  • 17:06 XioNoX: rollback: disable all peering and transit on cr2-eqsin
  • 16:57 XioNoX: reboot cr2-eqsin
  • 16:46 XioNoX: disable all peering and transit on cr2-eqsin
  • 16:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:25 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:25 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:25 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:07 ppchelko@deploy1001: Finished deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint T211026, take 2 (duration: 10m 12s)
  • 15:56 ppchelko@deploy1001: Started deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint T211026, take 2
  • 15:56 ppchelko@deploy1001: Finished deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint T211026 (duration: 07m 35s)
  • 15:49 ppchelko@deploy1001: Started deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint T211026
  • 15:46 XioNoX: fail vrrp master to cr1-eqsin
  • 15:42 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 15:39 bblack: puppet re-enabled on lvs1014, lvs1016, icinga1001
  • 15:35 XioNoX: depool eqsin for cr2-eqsin upgrade
  • 15:32 bblack: disabled puppet on lvs1014, lvs1016, icinga1001 ahead of deploying https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528885/ - T229621
  • 15:32 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 15:30 XioNoX: rollback ospf + bgp changes on cr2-eqord
  • 15:19 XioNoX: restart cr2-eqord - T227886
  • 15:12 XioNoX: disable all peering and transit on cr2-eqord
  • 15:01 XioNoX: increase ospf cost of cr2-eqord<->cr2-eqiad link (+1000)
  • 14:57 ema: cp5002: reboot for kernel upgrade
  • 14:42 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 14:42 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 14:31 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 14:31 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 14:29 XioNoX: rollback: disable all peering and transit on cr2-eqdfw
  • 14:18 XioNoX: reboot cr2-eqdfw for software upgrade - T227886
  • 14:14 XioNoX: disable all peering and transit on cr2-eqdfw
  • 14:04 volans@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:04 volans@cumin2001: START - Cookbook sre.hosts.decommission
  • 13:20 jbond42: rolling update of postgresql-9.6
  • 13:07 jijiki: rolling restart hhvm on api servers in eqiad
  • 12:57 jijiki: Restart hhvm on mw1235
  • 12:17 fsero@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=eqiad
  • 12:08 _joe_: restarted php-fpm on mw1221
  • 12:03 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 12:00 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:56 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 11:56 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 11:49 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 11:44 fsero: recreating cxserver blubber and sessionstore namespace - T228836
  • 11:39 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 11:35 gehel: restart wdqs-blazegraph on wdqs2001
  • 11:34 gehel: restart wdqs-updater on wdqs2001
  • 11:30 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 11:29 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 11:25 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 11:21 fsero: recreating citoid eventgate-analytics eventgate-main mathoid namespace - T228836
  • 11:20 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 11:18 raynor: EU SWAT finished
  • 11:15 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Undeploy editor gender surveys (T227793) (duration: 00m 48s)
  • 11:13 fsero: recreating termbox namespace - T228836
  • 11:06 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 11:04 fsero: resetting net.netfilter.nf_conntrack_tcp_timeout_time_wait to 65 in kubernetes2006
  • 10:59 _joe_: [eqiad] downtiming zotero on icinga for 10 minutes while recreating the deployment with helmfile
  • 10:57 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:57 oblivian@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:56 oblivian@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:49 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 10:44 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 10:39 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:39 _joe_: recreating rbac roles via helmfile
  • 10:32 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:29 _joe_: deleting calico deploy and configmap in kubernetes in eqiad, recreating with helmfile
  • 10:25 jbond42: rolling update of ghostscript
  • 10:23 fsero@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=eqiad
  • 10:10 fsero: initialize_cluster.sh kube-system kubemaster.svc.eqiad.wmnet 6443 - T228836
  • 10:10 fsero: creating tiller in kube-system for helmfile T228836
  • 09:58 vgutierrez: upgrading the rest of cache@upload to 8.0.3-1wm3 - T221594
  • 08:49 marostegui: Stop MySQL on db2057 - T230394
  • 08:48 marostegui: Remove db2057 from tendril and zarcillo T230394
  • 07:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2057 from config T230394 (duration: 00m 47s)
  • 07:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2057 from config T230394 (duration: 00m 48s)
  • 06:59 volans: upgrading spicerack to 0.0.26 on cumin2001
  • 06:49 vgutierrez: Rolling restart of fifo-log-demux and atsmtail services across cache@upload
  • 06:38 vgutierrez: upgrading fifo-log-demux to version 0.5 in cache@upload
  • 06:11 vgutierrez: Upgrading ATS to 8.0.3-1wm3 in cp2002, cp1076, cp3034 and cp4021 - T221594
  • 05:47 marostegui: Stop mysql on db2050 - T230391
  • 05:40 marostegui: Remove db2050 from tendril and zarcillo T230391
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2050 from config, host will be decommissioned T230391', diff saved to https://phabricator.wikimedia.org/P8904 and previous config saved to /var/cache/conftool/dbconfig/20190813-053514-marostegui.json
  • 05:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2050 from config T230391 (duration: 00m 48s)
  • 05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2050 from config T230391 (duration: 00m 48s)
  • 05:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2122 into s7 T228969 (duration: 00m 47s)
  • 05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2122 into s7 T228969 (duration: 00m 49s)
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Provision db2122 into s7 T228969', diff saved to https://phabricator.wikimedia.org/P8903 and previous config saved to /var/cache/conftool/dbconfig/20190813-051019-marostegui.json
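Two 2019-08-13 entries adjust kernel tunables directly: the cpufreq `scaling_governor` on cloudelastic100[1-4] and the `net.netfilter.nf_conntrack_tcp_timeout_time_wait` sysctl on kubernetes2006. The Python sketch below shows where those knobs live, assuming the standard Linux sysfs and procfs layouts. The log does not say how the changes were applied. On a real host, `sysctl -w` or `cpupower frequency-set -g performance` would be the usual tools, and writing these files requires root.

```python
from pathlib import Path
from typing import List


def sysctl_path(name: str, proc_root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its procfs file (dots become slashes).

    Simplification: ignores keys whose components themselves contain dots.
    """
    return Path(proc_root, *name.split("."))


def set_governor(governor: str, cpu_root: str = "/sys/devices/system/cpu") -> List[Path]:
    """Write `governor` to every CPU's cpufreq scaling_governor file.

    Returns the files written; needs root when pointed at the real sysfs.
    """
    written: List[Path] = []
    for path in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        path.write_text(governor + "\n")
        written.append(path)
    return written


print(sysctl_path("net.netfilter.nf_conntrack_tcp_timeout_time_wait"))
# /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait
```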

2019-08-12

  • 23:24 XioNoX: add samplicator to buster-wikimedia repo
  • 21:33 eileen: tools revision changed from 2a56e5e283 to 827ce3750e
  • 20:43 eileen: civicrm revision changed from be5b5a150b to 569e52e23d, config revision is 1c76e94ac3
  • 20:17 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@615004f]: Update service-mobileapp-node to f0a2847 (duration: 05m 05s)
  • 20:12 mbsantos@deploy1001: Started deploy [mobileapps/deploy@615004f]: Update service-mobileapp-node to f0a2847
  • 20:08 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 19:15 mforns@deploy1001: Finished deploy [analytics/refinery@5418d3b]: deploying analytics-refinery up to 5418d3b (duration: 39m 23s)
  • 19:14 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 18:35 mforns@deploy1001: Started deploy [analytics/refinery@5418d3b]: deploying analytics-refinery up to 5418d3b
  • 17:42 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@8579f50]: Updated GUI, New endpoints and New Blazegraph and Updater build (duration: 05m 04s)
  • 17:37 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@8579f50]: Updated GUI, New endpoints and New Blazegraph and Updater build
  • 15:05 jijiki: rolling restart php-fpm on mw122[4-8] - T219150
  • 15:01 ema: cp1076, cp500[12]: restart trafficserver with compress plugin disabled
  • 14:39 jijiki: disable puppet on mw122[4-8]
  • 14:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Account creation throttle to 2 everywhere (T230304) (duration: 00m 47s)
  • 13:51 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 13:18 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:51 ema: cp1076,cp5001,cp5002: ats-backend-restart to disable ATS systemd hardening features
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: More restrictive account creation throttle (T230304) (duration: 00m 47s)
  • 11:34 vgutierrez: restart atsmtail@backend on cp1076
  • 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable global abuse filters on warwiki as an emergency measure (T230304) (duration: 00m 48s)
  • 10:59 vgutierrez: restarting trafficserver in cp5002
  • 10:47 vgutierrez: Upgrade trafficserver to 8.0.3-1wm3 in cp5002 - T221594
  • 10:47 jijiki: Enabling puppet and rolling restarting nginx across the fleet - T224538
  • 10:39 jijiki: Restarting nginx on mwmaint2001.codfw.wmnet,mwmaint1002.eqiad.wmnet,scandium.eqiad.wmnet,snapshot[1005-1009].eqiad.wmnet, deploy2001.codfw.wmnet,deploy1001.eqiad.wmnet
  • 10:28 jijiki: Disable puppet on all servers running a services_proxy - T224538
  • 10:09 marostegui: Remove empty table globalblocks from s3 (where it exists) - T230055
  • 10:07 vgutierrez: Upgrade trafficserver to 8.0.3-1wm3 in cp5001 - T221594
  • 10:01 marostegui: Remove empty table wikidatawiki.globalblocks from s8 - T230055
  • 09:36 jijiki: Disable puppet on mwmaint for 425027
  • 09:36 marostegui: Remove empty table enwikivoyage.globalblocks from s5 - T230055
  • 09:32 marostegui: Stop MySQL on db2043 T230311
  • 09:24 marostegui: Remove empty table testcommonswiki.globalblocks from s4 - T230055
  • 09:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2043 from config T230311 (duration: 00m 47s)
  • 09:22 marostegui: Remove db2043 from tendril and zarcillo T230311
  • 09:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2043 from config T230311 (duration: 00m 48s)
  • 09:06 jijiki: depool and pool back mw1222
  • 08:22 elukey: restart Analytics hadoop HDFS namenodes to pick up new heap settings
  • 08:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s3 codfw weights T220170 (duration: 00m 48s)
  • 08:07 marostegui@cumin1001: dbctl commit (dc=codfw): 'Reorganize s3 codfw weights T220170', diff saved to https://phabricator.wikimedia.org/P8901 and previous config saved to /var/cache/conftool/dbconfig/20190812-080731-marostegui.json
  • 07:46 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2105 as s3 codfw master (duration: 00m 47s)
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2105 to s3 codfw master T230106', diff saved to https://phabricator.wikimedia.org/P8900 and previous config saved to /var/cache/conftool/dbconfig/20190812-074314-marostegui.json
  • 07:34 marostegui: Switchover s3 codfw master db2043 -> db2105 - T230106
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2121 into s7', diff saved to https://phabricator.wikimedia.org/P8899 and previous config saved to /var/cache/conftool/dbconfig/20190812-072617-marostegui.json
  • 07:20 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2121 into s7 T228969 (duration: 00m 47s)
  • 07:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2121 into s7 T228969 (duration: 00m 48s)
  • 05:04 marostegui: Remove math table from s3 - T196055
  • 05:02 marostegui: Remove math table from s1 - T196055

2019-08-11

  • 22:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive (T230304) (duration: 00m 50s)

2019-08-10

  • 01:49 mutante: mwmaint - running (1 of 8, the one for en) refreshLinks maintenance cron manually to verify it works after switching mwscriptwikiset to PHP7.2 (T195392)
  • 00:52 mutante: mwmaint - running update_flaggedrevs_stats - updates the flagged revs statistics table on each wiki
  • 00:47 mutante: mwmaint - running cirrus sanitize jobs maintenance cron

2019-08-09

  • 21:28 mutante: mwmaint - generating new captchas for ConfirmEdit extension by running generatecaptcha maintenance cron command
  • 20:55 mutante: mwmaint - running update_special_pages maintenance cron manually
  • 20:31 mutante: contint1001 - added entry to /etc/fstab for /mnt/docker to survive reboots (line 13: /dev/mapper/contint1001--data-docker /mnt/docker ext4 defaults 0 2)
  • 19:46 mutante: mwdebug1001 - temp stopped puppet, editing nginx config to test making it listen on IPv6 for upstream proxies (529401) (T224538)
  • 19:37 mutante: mwmaint - running cirrussearch maintenance jobs manually (completion indices, sanitize cirrus jobs)
  • 18:14 elukey: add BGP peer for AS 38758 on cr1-eqsin
  • 17:54 mutante: mwmaint - running initsitestats maintenance job - initializes or updates statistics table on all wikis
  • 17:23 elukey: set BGP peer "BrightRidge" on cr2-eqiad
  • 17:19 mutante: mwmaint - running purgeParserCache maintenance cron manually with PHP 7.2 - ..slowly
  • 16:52 mutante: mwmaint - manually running updatePageTriageQueue maintenance cron with PHP 7.2
  • 16:15 arturo: add phamhi to 'wmf' and 'ops' LDAP groups (T228942)
  • 15:48 jijiki: Disable puppet on mw1222 and depool
  • 11:50 ema: root@puppetmaster2001:/srv/private# su -c "export GIT_SSH=/srv/private/.git/ssh_wrapper.sh ; git push ssh://puppetmaster1001.eqiad.wmnet/srv/private master" gitpuppet
  • 11:44 ema: puppetmaster1001: resetting last 3 /srv/private commits due to broken replication
  • 10:38 thcipriani: gerrit restart on cobalt.
  • 09:36 marostegui: Drop math table from s7 T196055
  • 09:04 marostegui: Drop math table from s4 - T196055
  • 08:58 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011 - T196055
  • 08:51 moritzm: upgrading ghostscript on thumbor1001
  • 08:32 marostegui: Stop MySQL on db2069 T230107
  • 08:29 marostegui: Remove db2069 from tendril and zarcillo T230107
  • 08:24 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011 - T196055
  • 07:31 vgutierrez: uploaded trafficserver-8.0.3wm3 to apt.wikimedia.org (stretch) - T220383 T228135
  • 06:19 elukey: powercycle thumbor2004 (no ssh, serial console showing a frozen OS)
  • 05:37 marostegui: Run maintain-views script with --clean to clean up math table views - T196055
  • 02:30 mutante: mwmaint1002 - manually running cleanup_upload_stash maintenance cron to confirm no issues with PHP 7.2 in maintenance/cleanupUploadStash.php
  • 02:24 mutante: mwmaint1002 - manually running purge_expired_userrights maintenance cron to confirm no issues with PHP 7.2 in maintenance/purgeExpiredUserrights.php
  • 02:17 mutante: mwmaint1002 - manually running purge_abusefilter maintenance cron

2019-08-08

  • 23:50 Urbanecm: Evening SWAT done
  • 23:49 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/WikiEditor/modules/jquery.wikiEditor.dialogs.config.js: SWAT: 6dcab39: Follow-up Ia75d685c: Fix the insert file dialog (T230078) (duration: 00m 50s)
  • 23:48 mutante: mwmaint1002 - manually running purge_securepoll maintenance script
  • 23:42 mutante: mwmaint1002 - manually running TranslationNotifications DigestEmailer maintenance cron
  • 22:05 mutante: rolling out new scap version 3.12.0-1 on all of eqiad
  • 22:02 mutante: mwdebug2002 - scap pull to test new scap, nothing to do
  • 22:00 mutante: rolling out new scap version 3.12.0-1 on all of codfw
  • 21:54 mutante: (purge unpublished articles from ContentTranslation older than 455 days)
  • 21:52 mutante: mwmaint1002 - manually running purge_old_cx_drafts maintenance job for ContentTranslation - after switching helper script to PHP 7.2
  • 21:50 mutante: mwmaint1002 - manually running purgeUnusedProjects with PageAssessments extension to confirm no issues after switch to PHP7.2
  • 21:40 mutante: mwmaint1002 - manually running (weekly) echo_mail cron job (user notifications) to confirm it works after switching foreachwikiindblist to use php7.2 (T195392)
  • 21:30 mutante: rolling out new scap package 3.12.0-1 on mw-canary servers via debdeploy (T230144)
  • 21:28 mutante: rolling out new scap package 3.12.0-1 on contint servers
  • 21:22 mutante: built new scap version 3.12.0-1 on boron, imported packages on install1002 (apt.wm.org), copied from stretch to jessie and buster (T230144)
  • 20:33 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:36 thcipriani: restart gerrit on cobalt to pick up new config
  • 19:34 thcipriani: restart gerrit-replica on gerrit2001 to pick up new config
  • 19:27 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:06 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.17
  • 17:52 XioNoX: run /usr/local/sbin/restart-php7.2-fpm on mwdebug1001
  • 17:33 fdans@deploy1001: Finished deploy [analytics/refinery@cef01d3]: deploy analytics refinery, second attempt (duration: 16m 52s)
  • 17:21 XioNoX: add user jbond to network devices
  • 17:16 fdans@deploy1001: Started deploy [analytics/refinery@cef01d3]: deploy analytics refinery, second attempt
  • 16:56 ppchelko@deploy1001: Finished deploy [changeprop/deploy@069d297]: Remove workaround for ORES not supporting eventgate events T228688 (duration: 01m 24s)
  • 16:55 ppchelko@deploy1001: Started deploy [changeprop/deploy@069d297]: Remove workaround for ORES not supporting eventgate events T228688
  • 16:40 fdans@deploy1001: Started deploy [analytics/refinery@cef01d3]: deploying analytics refinery
  • 15:49 XioNoX: set virtual-chassis vcp-snmp-statistics to all VC - T228824
  • 15:13 herron: rebooting fermium (lists) for security updates
  • 15:11 XioNoX: commit synchronize on cr1-codfw - T226422
  • 14:52 XioNoX: continue cr1-codfw:re1 replacement - T226422
  • 13:09 marostegui: Drop table math from s8 T196055
  • 12:15 tarrow: EU midday SWAT done
  • 12:15 tarrow@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/Wikibase/: SWAT: Add hook to invalidate cache entries missing TermboxOption (T228978) (duration: 01m 14s)
  • 12:01 tarrow@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/Wikibase/: SWAT: Split ParserCache on Termbox (T228978) (duration: 01m 21s)
  • 12:00 tarrow: Running SWAT a little over time because of a late start and slow Jenkins
  • 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: dfeb2a9: HD logo for enwikivoyage (T230114) (duration: 00m 56s)
  • 11:44 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: dfeb2a9: HD logo for enwikivoyage (T230114) (duration: 00m 56s)
  • 11:31 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/zhwikisource.png (T229715)
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: be886ad: Add hd variations for zhwikisource project logo (T229715) (duration: 00m 55s)
  • 11:28 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: be886ad: Add hd variations for zhwikisource project logo (T229715) (duration: 00m 56s)
  • 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 9a4494a: Add Hubblesite.org and Spacetelescope.org to commons wgCopyUploadsDomains (T230083) (duration: 00m 57s)
  • 11:05 Urbanecm: Run scap pull on mwdebug1001 to revert local modifications (T207627)
  • 10:53 jijiki: Disable puppet, depool and pool mw1221, mw1222, mw1223 for 529061
  • 10:46 Urbanecm: Set $wgContentHandlers["flow-board"] = $wgContentHandlers["wikitext"]; locally on mwdebug1001 to fix a few bad pages (T207627)
  • 10:43 moritzm: installing exim4 security updates on buster hosts (our exim config is not vulnerable)
  • 09:41 moritzm: installing OpenJDK security updates on WDQS servers
  • 09:30 jbond42: disabling puppet fleet wide
  • 09:26 marostegui: Drop table math from labswiki (wikitech) and labtestwiki T196055
  • 09:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2069 from config T230107 (duration: 00m 55s)
  • 09:19 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2069 from config T230107 (duration: 00m 57s)
  • 08:45 elukey: restart hadoop namenodes on an-master100* to pick up new GC settings (CMS -> G1 switch)
  • 08:44 moritzm: installing OpenJDK security updates on elastic* servers
  • 08:36 marostegui: Remove math table from s5 T196055
  • 08:13 marostegui: Stop MySQL on db2065 to test dbproxy2003
  • 07:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2096 as codfw x1 master T220170 (duration: 00m 57s)
  • 07:39 marostegui: Switchover x1 codfw master db2069 -> db2096 T220170
  • 06:40 _joe_: restarting php-fpm on the application servers to pick up the change
  • 05:54 marostegui: Stop MySQL on db2035 for decommissioning T229784
  • 05:52 marostegui: Remove db2035 from tendril and zarcillo T229784
  • 00:48 mutante: mwdebug2002 - sudo -i restart-php7.2-fpm
  • 00:20 ejegg: re-enabled both recurring charge jobs
  • 00:02 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: hack for Parsoid testing on scandium (duration: 00m 55s)

2019-08-07

  • 23:58 tstarling@deploy1001: Synchronized w/rest.php: Creating rest.php endpoint disabled by default (duration: 00m 55s)
  • 23:46 ejegg: disabled newer recurring charge job to test one at a time on existing recur records
  • 23:22 mutante: elastic2054 - powercycling after it went down unexpectedly and Icinga alerted; this happened before in T227298
  • 23:08 XioNoX: set virtual-chassis vcp-snmp-statistics on asw2-ulsfo - T228824
  • 23:07 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625: Send writes for all non-private wikis to cloudelastic (duration: 01m 02s)
  • 23:03 XioNoX: set virtual-chassis vcp-snmp-statistics on asw-a-codfw - T228824
  • 22:50 ebernhardson: mwmaint start cirrussearch saneitize.php against all non-private group1 wikis for cloudelastic cluster
  • 22:48 mutante: mwmaint1002 - manually running the purgeOldData cron command to verify it with PHP 7.2 for 528730 (T195392)
  • 22:12 jgleeson: switched on all fundraising process-control except ingenico_recurring_charge
  • 21:50 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@a151f4e]: Prepare for eventgate transition T230049 T230048 (duration: 00m 59s)
  • 21:49 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@a151f4e]: Prepare for eventgate transition T230049 T230048
  • 21:25 mutante: restarting gerrit service to apply config change (528769)
  • 21:00 ebernhardson: apply transient logger settings from prod search clusters to cloudelastic
  • 20:34 reedy@deploy1001: rebuilt and synchronized wikiversions files: labswiki back to .17
  • 20:34 jgleeson: updated civicrm from 727a2c193b to be5b5a150b
  • 20:32 reedy@deploy1001: rebuilt and synchronized wikiversions files: labswiki back to .16 temporarily
  • 20:28 jgleeson: switched off fundraising process-control jobs
  • 19:36 brennen@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.17 (duration: 00m 54s)
  • 19:35 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.17
  • 19:16 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert Switch property terms migration to WRITE_NEW on client wikis T225053 (duration: 00m 58s)
  • 18:15 jijiki: Restart hhvm and php-fpm on canary mw hosts
  • 17:54 shdubsh: install2002 add fstab entry for /srv mount - T229997
  • 17:46 shdubsh: install2002 stop nginx and squid for resync /srv to spare disk and restore mount - T229997
  • 17:42 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Retry - Revert "Switch high-traffic jobs to eventgate." (duration: 00m 58s)
  • 16:40 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: JobQueue: Revert switching high-traffic jobs to eventgate (duration: 00m 55s)
  • 16:34 mobrovac@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 16:00 thcipriani: restarting jenkins for update
  • 15:58 jijiki: restart nrpe on stat1004
  • 15:08 _joe_: freeing APCu on mw1270, which has degraded performance
  • 14:24 marostegui: Reboot dbproxy2003 for kernel upgrades
  • 14:16 jbond42: puppet *now* re-enabled
  • 14:16 jbond42: puppet not re-enabled
  • 14:01 jbond42: disable puppet fleet wide for puppetdb restart
  • 13:57 marostegui: Remove labsdb1004 and labsdb1005 from zarcillo database (instance table), as those hosts were decommissioned months ago
  • 13:55 marostegui: Remove labsdb1004 and labsdb1005 from zarcillo database, as those hosts were decommissioned months ago
  • 13:48 marostegui: Apply grants for dbproxy2003 on m3 - T202367
  • 13:22 elukey: roll restart aqs on aqs100[4-9] to pick up new Druid backend settings
  • 11:48 Amir1: EU SWAT is done
  • 11:37 kart_: Updated cxserver to 2019-08-06-100812-production (T227571)
  • 11:33 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Switch property terms migration to WRITE_NEW on client wikis (T225053) (duration: 00m 56s)
  • 11:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:26 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable AMC on all wikipedias (T228916) (duration: 00m 55s)
  • 11:26 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:22 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:09 marostegui: Restart gerrit
  • 10:11 moritzm: deleting poolcounter1001, poolcounter1003, poolcounter2001, poolcounter2002 in Ganeti (T224572)
  • 10:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:03 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 09:14 marostegui: Drop math table from s6 - T196055
  • 08:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2131 into x1 T228969 (duration: 00m 55s)
  • 08:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2131 into x1 T228969 (duration: 00m 56s)
  • 08:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:37 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2130 into s1 - T228969', diff saved to https://phabricator.wikimedia.org/P8877 and previous config saved to /var/cache/conftool/dbconfig/20190807-080059-marostegui.json
  • 07:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:36 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1100 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P8876 and previous config saved to /var/cache/conftool/dbconfig/20190807-073349-marostegui.json
  • 07:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:31 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2130 into s1 T228969 (duration: 00m 56s)
  • 07:27 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2130 into s1 T228969 (duration: 00m 55s)
  • 05:57 marostegui: Stop MySQL on db1071 - T229381
  • 05:55 marostegui: Remove db1071 from tendril and zarcillo - T229381
  • 05:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1071 from config T229381 (duration: 00m 55s)
  • 05:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1071 from config T229381 (duration: 00m 57s)
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1100 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P8875 and previous config saved to /var/cache/conftool/dbconfig/20190807-053903-marostegui.json
  • 00:48 mutante: restarting gerrit to apply config change 528276 to exclude some projects from github replication
  • 00:21 mutante: gerrit2001 - restarting gerrit to apply 528276

2019-08-06

  • 23:51 catrope@deploy1001: Synchronized static/images/project-logos/: Update HD logos for enwikisource and sourceswiki (T229769) (duration: 00m 56s)
  • 23:50 catrope@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/Flow/includes/Import/OptInController.php: Unbreak disabling of Flow beta feature (T229795) (duration: 00m 55s)
  • 23:49 catrope@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Flow/includes/Import/OptInController.php: Unbreak disabling of Flow beta feature (T229795) (duration: 00m 56s)
  • 23:36 mutante: phabricator - added ssingh to acl*sre-team (group 29), WMF-NDA-requests (group 974) and WMF-NDA (group 61) (T229860)
  • 23:25 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update HD logos for enwikisource and sourceswiki (T229769) (duration: 00m 55s)
  • 23:24 catrope@deploy1001: Synchronized static/images/project-logos/: Update HD logos for enwikisource and sourceswiki (T229769) (duration: 00m 56s)
  • 23:19 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch updateBetaFeaturesUserCounts job to eventgate (T228705) (duration: 00m 57s)
  • 23:12 eileen: civicrm revision changed from 2e03f9bb1e to 727a2c193b, config revision is 84b785d41c
  • 22:33 ebernhardson: restart mjolnir-kafka-daemon across all elasticsearch servers
  • 22:25 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9e95ab4]: Deploy latest mjolnir daemon to handle bulk imports via swift (duration: 05m 35s)
  • 22:19 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9e95ab4]: Deploy latest mjolnir daemon to handle bulk imports via swift
  • 21:53 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8e513f6]: Deploy latest mjolnir daemon to handle bulk imports via swift (duration: 16m 35s)
  • 21:36 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8e513f6]: Deploy latest mjolnir daemon to handle bulk imports via swift
  • 21:35 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@860fb33]: Deploy latest mjolnir daemon to handle bulk imports via swift (duration: 01m 50s)
  • 21:34 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@860fb33]: Deploy latest mjolnir daemon to handle bulk imports via swift
  • 21:28 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 20:17 subbu: repooled wtp2019 ( after papaul finished upgrade as part of T221572 )
  • 19:52 papaul: shutting down wtp2019 for firmware upgrade
  • 19:50 herron: disabling puppet on logstash collectors for rolling deploy of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528306/ T166107
  • 19:42 subbu: depooled wtp2019 ( to assist papaul with T221572 )
  • 19:22 thcipriani: gerrit restart on cobalt
  • 19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.34.0-wmf.17
  • 18:38 brennen@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.17 and rebuild l10n cache (duration: 19m 02s)
  • 18:19 brennen@deploy1001: Started scap: testwiki to php-1.34.0-wmf.17 and rebuild l10n cache
  • 18:13 brennen@deploy1001: Pruned MediaWiki: 1.34.0-wmf.14 [keeping static files] (duration: 08m 28s)
  • 17:37 accraze@deploy1001: Finished deploy [ores/deploy@d08fa62]: T229848 (duration: 17m 21s)
  • 17:20 accraze@deploy1001: Started deploy [ores/deploy@d08fa62]: T229848
  • 17:14 volans: uploaded spicerack_0.0.26-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 16:54 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=mathoid,name=codfw
  • 16:52 @: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 16:50 brennen: cutting branch for 1.34.0-wmf.17
  • 16:50 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=citoid,name=codfw
  • 16:50 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=codfw
  • 16:48 @: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 16:47 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=mathoid,name=codfw
  • 16:43 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics,name=codfw
  • 16:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625: Re-sync enable group1 on cloudelastic, job runners are claiming it's not enabled while app servers are sending jobs (duration: 00m 47s)
  • 16:39 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 16:37 @: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 16:36 @: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 16:33 @: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 16:33 @: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 16:33 @: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 16:32 @: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 16:31 @: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
  • 16:19 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625: Turn on cloudelastic writes for group1 (duration: 00m 47s)
  • 16:08 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=citoid,name=codfw
  • 15:13 moritzm: installing bind9 security updates (client-side tools/libs only) for jessie
  • 15:04 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-08-06-conftool.yaml -s all
  • 14:55 moritzm: rebooting mwlog1001 for kernel update
  • 14:55 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo cumin -p99 -b100 'A:all' 'apt-get update'
  • 14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:52 herron: restarting logstash service on logstash1007 to pick up puppet managed log4j2 config
  • 14:50 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-08-06-conftool.yaml -s mw-canary
  • 14:45 cdanis: ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥☕ sudo -E reprepro -C main include buster-wikimedia conftool_1.1.4-2+deb10u1_amd64.changes
  • 14:44 cdanis: ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥☕ sudo -E reprepro -C main include stretch-wikimedia conftool_1.1.4-2_amd64.changes
  • 14:37 cdanis: ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥 sudo -E reprepro -C main include jessie-wikimedia conftool_1.1.4-2+deb8u1_amd64.changes
  • 14:36 marostegui: Start mysql on db1100 after on-site maintenance - T228732
  • 12:30 elukey: roll restart cassandra on aqs for openjdk-8 upgrades
  • 12:06 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:05 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:49 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:49 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:36 Urbanecm: EU SWAT done
  • 11:21 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/AbuseFilter/: SWAT: 8cc96db: Better handling of DNONE (T214674, T228677) (duration: 00m 48s)
  • 11:11 moritzm: rebooting install1002 to pick up MDS-enabled qemu
  • 11:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable EntitySchema in production wikidata (duration: 00m 48s)
  • 10:52 moritzm: rebooting install2002 to pick up MDS-enabled qemu
  • 10:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:07 moritzm: rebooting etherpad1001 to pick up MDS-enabled qemu
  • 10:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:59 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:59 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:58 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:52 @: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
  • 08:39 marostegui: Add db2130 to tendril and zarcillo T228969
  • 08:22 @: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
  • 07:27 marostegui: Stop MySQL on db1100 before powering the host off - T228732
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool for firmware and BIOS upgrade T228732', diff saved to https://phabricator.wikimedia.org/P8869 and previous config saved to /var/cache/conftool/dbconfig/20190806-072720-marostegui.json
  • 07:10 onimisionipe: pool maps1001. Postgres init complete - T229788
  • 05:59 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/CheckUser: Fix T229893 (duration: 00m 47s)
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2127 into s3 T228969', diff saved to https://phabricator.wikimedia.org/P8868 and previous config saved to /var/cache/conftool/dbconfig/20190806-055357-marostegui.json
  • 05:49 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2127 into s3 T228969 (duration: 00m 48s)
  • 05:34 marostegui: Restart wikibugs
  • 05:06 marostegui: Reload haproxy on dbproxy1011 to repool labsdb1010 T222978
  • 03:58 ebernhardson: start importing group[12] to cloudelastic from mwmaint1002
  • 02:08 eileen: civicrm revision changed from 857dcc9461 to 2e03f9bb1e, config revision is 84b785d41c
  • 02:05 MaxSem: Creating local accounts for Community Tech bot on every Wikipedia

2019-08-05

  • 23:34 mutante: mwmaint1002 - remove getJobQueueLengths.php from www-data's crontab (T195392)
  • 23:03 Urbanecm: Evening SWAT done
  • 23:03 urbanecm@deploy1001: Synchronized wmf-config/ProductionServices.php: SWAT: 87b428d: Repoint cloudelastic at LB dns (T220625) (duration: 00m 48s)
  • 21:55 papaul: powering down wtp2011 for BIOS upgrade
  • 21:39 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo debdeploy deploy -u 2019-08-05-conftool.yaml -s all
  • 21:35 cdanis: ❌ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo debdeploy deploy -u 2019-08-05-conftool.yaml -s eqsin
  • 21:29 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin -p99 -b100 'A:all' 'apt-get update'
  • 21:28 mutante: 🔔 scandium - re-enabled icinga notifications for various services
  • 21:27 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo debdeploy deploy -u 2019-08-05-conftool.yaml -s mw-canary
  • 21:25 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕠🍺 sudo -E reprepro -C main include jessie-wikimedia conftool-1.1.4-1/conftool_1.1.4-1+deb8u1_amd64.changes
  • 21:25 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕠🍺 sudo -E reprepro -C main include buster-wikimedia conftool-1.1.4-1/conftool_1.1.4-1+deb10u1_amd64.changes
  • 21:24 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕠 sudo -E reprepro -C main include stretch-wikimedia conftool-1.1.4-1/conftool_1.1.4-1_amd64.changes
  • 21:22 ebernhardson: start importing group0 to cloudelastic from mwmaint1002
  • 20:49 ebernhardson: nuke all search indices on cloudelastic preparing for fresh imports and live updates T220625
  • 20:34 arlolra: Updated Parsoid to 7232dff (T228223)
  • 20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@d3a2937]: Updating Parsoid to 7232dff (duration: 09m 02s)
  • 20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@d3a2937]: Updating Parsoid to 7232dff
  • 20:06 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@e774a05]: Update mobileapps to c713c2e (duration: 04m 51s)
  • 20:01 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@e774a05]: Update mobileapps to c713c2e
  • 19:51 gehel: depool wdqs1005 - T229876
  • 19:35 thcipriani: gerrit restart on cobalt for configuration updates
  • 19:34 bblack: fixing up cloudelastic LVS IPv6 stuff on lvs1014, lvs1016, cloudelastic* - possible monitoring noise
  • 19:33 thcipriani: gerrit restart for gerrit-replica on gerrit2001
  • 18:44 Urbanecm: Morning SWAT done
  • 18:39 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/AbuseFilter/: SWAT: d358f17: Revert "Better handling of DNONE" (T214674, T228677) (duration: 00m 47s)
  • 18:32 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/AbuseFilter/: SWAT: 936a462: Better handling of DNONE (T214674, T228677) (duration: 00m 47s)
  • 18:29 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/: SWAT: 3ee0e84: Temporarily log search to two schemas (duration: 00m 47s)
  • 18:25 Urbanecm: Deployed patch for T207094
  • 18:21 urbanecm@deploy1001: Synchronized dblists/: SWAT: a9e4ed8: Remove related-articles-footer-blacklisted-skins.dblist (T229644, 3/3) (duration: 00m 46s)
  • 18:20 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: a9e4ed8: Remove related-articles-footer-blacklisted-skins.dblist (T229644, 2/3) (duration: 00m 47s)
  • 18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: a9e4ed8: Remove related-articles-footer-blacklisted-skins.dblist (T229644, 1/3) (duration: 00m 49s)
  • 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 254ecc1: Switch testwiki to use kask (only) for sessions (T222099) (duration: 00m 48s)
  • 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e44a6e6: Enable editor gender surveys (T227793) (duration: 00m 48s)
  • 18:06 onimisionipe: reinit postgres on maps1001 - T229788
  • 17:33 jijiki: Pool restbase2009 - T227408
  • 17:28 fsero@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=codfw
  • 16:53 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 16:53 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 16:52 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 16:37 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 16:32 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 16:22 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 16:22 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 16:18 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 16:16 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 16:10 fsero: recreating citoid eventgate-analytics eventgate-main mathoid sessionstore namespaces and redeploying from helmfile T228837
  • 16:06 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 16:04 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:02 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 15:58 Urbanecm: Deploy patch for T200104
  • 15:41 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
  • 15:36 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 15:32 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 15:27 fsero: recreating zotero and termbox namespaces and services from helmfile codfw - T228837
  • 15:26 fsero: recreating zotero and termbox from helmfile codfw - T228837
  • 15:21 marostegui: Add db2127 to tendril and zarcillo (s3) - T228969
  • 15:18 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
  • 14:32 marostegui: Reload haproxy on dbproxy1011 to depool labsdb1010 T222978
  • 14:24 papaul: shut down restbase2009 for battery replacement
  • 14:12 fsero@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=codfw
  • 14:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:07 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:06 jijiki: Depool and restart restbase2009 for maint - T227408
  • 14:05 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:04 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:00 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:57 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:56 fsero: deploying calico controller in codfw via helmfile - T228837
  • 13:42 fsero: deploying tiller in kube-system for helmfile changes - T228837
  • 13:37 volans: run cumin 'A:cumin' 'rm -v /usr/local/sbin/{wmf-upgrade-varnish,wmf-upgrade-and-reboot,wmf-downtime-host,wmf-decommission-host}' T205886
  • 13:28 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 13:16 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 13:01 jbond42: rolling update of openjdk-8 on restbase
  • 12:44 moritzm: restarting cassandra on restbase-dev1004
  • 12:44 moritzm: restarting cassandra on restbase-dev1040
  • 12:33 moritzm: uploaded openjdk-8 u222 for jessie-wikimedia
  • 12:26 Krinkle: mwscript deleteEqualMessages.php --wiki fywiktionary (requested at m:Steward_requests/Miscellaneous)
  • 12:13 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Switch property terms migration to WRITE_NEW on production wikidata (T225053) (duration: 00m 48s)
  • 12:01 Urbanecm: EU SWAT done
  • 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0032b0a: Enable Page Previews as default on hewikivoyage (T222017) (duration: 00m 47s)
  • 11:43 jbond@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
  • 11:43 jbond@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 11:42 jbond@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-restart (exit_code=97)
  • 11:42 jbond@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 11:38 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/MobileFrontend/: SWAT: b7ae4fb: Revert "[AMC] [desktop] [mobile] use AMC by default for desktop users" (T229722) (duration: 00m 49s)
  • 11:33 marostegui: Upgrade MySQL on db2074 db2057 db2050 db2035 db2098
  • 11:29 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Wikibase: SWAT: 3ecaa57: Add only needed entity usages in AddUsagesForPageJob (T226818, T205045) (duration: 01m 12s)
  • 11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 9eb74c2: Define import sources for fawiki (T229717) (duration: 00m 48s)
  • 10:40 jbond42: update java on sessionstore
  • 10:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 46s)
  • 10:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:27 ema: upload fifo-log-demux 0.5 to stretch-wikimedia
  • 10:12 jbond42: rolling update of openjdk on maps servers
  • 09:30 marostegui: Stop MySQL on db2105 to change binlog format
  • 09:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 09:07 arturo: downtime toolschecker for 5 hours
  • 09:05 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 08:56 moritzm: installing vim security updates for jessie (stretch/buster already fixed)
  • 08:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2035 from config T229784 (duration: 00m 46s)
  • 08:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2035 from config T229784 (duration: 00m 47s)
  • 08:43 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 08:32 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8861', previous config saved to /var/cache/conftool/dbconfig/20190805-083254-marostegui.json
  • 08:21 marostegui: Switchover s2 codfw master from db2035 to db2107 - T221533 T220170
  • 07:53 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s2 T228969 (duration: 00m 47s)
  • 07:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Reorganize s2 T228969 (duration: 00m 48s)
  • 07:52 marostegui@deploy1001: sync-file aborted: Reorganize s2 T228969 (duration: 00m 06s)
  • 07:49 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8859', previous config saved to /var/cache/conftool/dbconfig/20190805-074930-marostegui.json
  • 07:45 moritzm: installing unzip regression DLA for jessie
  • 07:43 moritzm: removed orespoolcounter[12]00[12] from debmonitor T227640
  • 07:23 marostegui: Move db2095:3312 from db2063 to db2126 - T228969
  • 05:58 marostegui: Update rack column on zarcillo.servers for the new servers T229683
  • 05:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2124 into s6 T228969 (duration: 00m 46s)
  • 05:29 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2124 into s6 T228969 (duration: 00m 49s)
  • 05:28 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8858', previous config saved to /var/cache/conftool/dbconfig/20190805-052839-marostegui.json

2019-08-04

  • 18:45 krinkle@deploy1001: Synchronized wmf-config/abusefilter.php: labs-only noop - f740f89c594979 (duration: 00m 50s)

2019-08-03

  • 12:02 gilles: purging ruwiki articles on mwmaint1002
  • 11:30 gilles: purging eswiki articles on mwmaint1002
  • 10:01 ema: cp1085: restart varnish-be
  • 09:36 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 T216594 Renew origin trial tokens (duration: 00m 48s)
  • 00:40 ejegg: rolled back fundraising python tools from 493a38f9e0 to 2a56e5e283

2019-08-02

  • 23:58 mutante: scandium - apt-get remove --purge prometheus-hhvm-exporter - not needed here, no HHVM (T228069)
  • 23:16 XioNoX: Make the Level3 link between eqiad-knams primary - T228827
  • 23:06 mutante: mwdebug1001/mwdebug1002 - restart-php7.2-fpm - low opcache
  • 20:48 sbassett: Deployed security patch for T229541
  • 20:14 Urbanecm: Run mwscript deleteEqualMessages.php --wiki=cswiki --delete
  • 19:24 mutante: gerrit2001 - re-enabling puppet, starting as slave for the first time ever, thanks to codfw dbproxy, gerrit service running (T176532)
  • 18:37 mutante: gerrit2001 - disabling puppet, stopping gerrit service
  • 18:36 mutante: adding gerrit2001 to ferm rules on dbproxy for misc
  • 18:14 Lucas_WMDE: recached all WikibaseView messages in ResourceLoader for T229604, cf. https://w.wiki/6kc
  • 17:46 XioNoX: flap NTT link in eqsin
  • 17:42 lucaswerkmeister-wmde@deploy1001: Finished scap: Fix WikibaseView i18n globals (T229604) (duration: 16m 51s)
  • 17:26 XioNoX: add avoid_path to cr1/2-eqsin
  • 17:25 lucaswerkmeister-wmde@deploy1001: Started scap: Fix WikibaseView i18n globals (T229604)
  • 17:19 krinkle@deploy1001: Synchronized docroot/noc/db.php: a75d23ecb1b (duration: 00m 47s)
  • 17:10 krinkle@deploy1001: Synchronized docroot/noc/db.php: ee528e8 (duration: 00m 48s)
  • 16:42 XioNoX: replace rhenium with netflow1001 netflow target + iBGP peer on all routers
  • 15:52 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@250f711]: Fix MCS production crashers (T229521, T229630) (duration: 04m 41s)
  • 15:47 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@250f711]: Fix MCS production crashers (T229521, T229630)
  • 15:14 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 15:12 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 14:14 mforns@deploy1001: Finished deploy [analytics/refinery@b50a939]: deploying refinery up to b50a939 (rollback of cassandra and edit_hourly hive2 actions to unbreak production) (duration: 16m 47s)
  • 13:57 mforns@deploy1001: Started deploy [analytics/refinery@b50a939]: deploying refinery up to b50a939 (rollback of cassandra and edit_hourly hive2 actions to unbreak production)
  • 13:54 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 13:45 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=api_appserver,dc=eqiad,service=nginx,name=mw12[23].*
  • 12:33 marostegui: Restarted wikibugs a few minutes ago as it was not sending anything on IRC
  • 11:56 Amir1: aborted l10nupdate
  • 11:54 Amir1: start of l10nupdate
  • 11:48 ladsgroup@deploy1001: scap sync-l10n completed (1.34.0-wmf.16) (duration: 00m 44s)
  • 11:39 ladsgroup@deploy1001: Finished scap: Rebuilding l10n cache (duration: 05m 06s)
  • 11:34 ladsgroup@deploy1001: Started scap: Rebuilding l10n cache
  • 10:51 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Wikibase: Revert "fix eslint errors in lib after moving submodule files into lib" (duration: 01m 08s)
  • 10:01 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 09:22 marostegui: Compress s7 on labsdb1010 - T222978
  • 09:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert: Switch property terms migration to WRITE_NEW on production wikidata (T225053) (duration: 00m 48s)
  • 09:12 elukey: umount /sys/kernel/debug/tracing on analytics1043
  • 08:57 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 08:56 @: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 08:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db2129 to s6 (duration: 00m 46s)
  • 07:56 marostegui@cumin2001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8852', previous config saved to /var/cache/conftool/dbconfig/20190802-075548-marostegui.json
  • 07:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add db2129 to the config T228969 (duration: 00m 47s)
  • 07:52 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db2129 to the config T228969 (duration: 00m 47s)
  • 07:43 marostegui: Restart hhvm on mw1226
  • 07:40 _joe_: restarting php-fpm on mw1270, with 80 pms - static, apc 6 GB no ttl
  • 07:38 _joe_: disabling puppet on mw1270 for testing of different php settings
  • 07:21 marostegui: Add db2124 to tendril and zarcillo T228969
  • 07:00 _joe_: running systemd-tmpfiles --create nutcracker.conf on scandium
  • 06:46 vgutierrez: upgrading acme-chief to version 0.20 in acme-chief test instances - T229096
  • 05:21 vgutierrez: uploaded acme-chief 0.20 to apt.wikimedia.org (buster) - T229096
  • 05:10 marostegui: Stop MySQL on db2058 for decommissioning T229543
  • 05:06 marostegui: Remove db2058 from tendril and zarcillo T229543

2019-08-01

  • 23:32 Urbanecm: Evening SWAT done
  • 23:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 819073a: Add `autopatrolled` group to az wikisource (T229371) (duration: 00m 49s)
  • 23:29 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: 8aca0eb: Remove the "autoreview" user group from ru.wikipedia (T229596) (duration: 00m 47s)
  • 23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cf01272: Add importing to english wikiquote (T228607) (duration: 00m 48s)
  • 23:10 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: T229614: Pass proper types to eventlogging to resolve eventlogging errors in wmf.16 (duration: 00m 47s)
  • 22:52 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@5ebf93e]: Update mobileapps to 2ee48ab (duration: 04m 34s)
  • 22:47 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@5ebf93e]: Update mobileapps to 2ee48ab
  • 22:17 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/extension.json: T229614: Update eventlogging schema version to resolve eventlogging errors in wmf.16 (duration: 00m 47s)
  • 22:13 mutante: scandium apt-get autoremove
  • 22:13 mutante: scandium apt-get remove --purge wikimedia-lvs-realserver (T228069)
  • 21:48 mutante: scandium - apt-get remove --purge hhvm* (T228069)
  • 21:23 brennen@deploy1001: Synchronized php: group1 and group2 to 1.34.0-wmf.16 (duration: 00m 46s)
  • 21:22 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 and group2 to 1.34.0-wmf.16
  • 20:57 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/Revision/RevisionRenderer.php: T229589 - 3f1b32e (duration: 00m 50s)
  • 20:47 mutante: scandium - turning into an mw appserver
  • 20:46 mutante: puppetmaster: create mcrouter certs for scandium.eqiad.wmnet needed to make it an appserver (https://wikitech.wikimedia.org/wiki/Mcrouter#Generate_certs_for_a_new_host) (T228069)
  • 20:29 bblack: restart pybal on lvs1014
  • 19:57 bblack: lvs1016 - restart pybal for slight LVS config change for cloudelastic - T224324
  • 19:40 brennen@deploy1001: Synchronized php: Revert group1 and group2 back to 1.34.0-wmf.15 (duration: 00m 53s)
  • 19:39 twentyafterfour: finished phabricator database dump
  • 19:34 bblack: lvs1014 - puppetize and restart pybal for cloudelastic LVS - T224324
  • 19:31 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 and group2 to 1.34.0-wmf.15
  • 19:20 brennen: rolling back to wmf.15 on group1 and group2 while we investigate T229575
  • 19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.16
  • 18:52 mutante: scandium (parsoid testing) - added mw application server roles - puppet work / maintenance
  • 18:47 mutante: stat1004 - starting nagios-nrpe-server which got killed again - jbd2/md0-8 invoked oom-killer
  • 18:32 bblack@puppetmaster1001: conftool action : set/pooled=yes; selector: name=^cloudelastic.*
  • 18:30 bblack: lvs1016: puppet re-enabled, pybal restarted, cloudelastic deploy - T224324
  • 18:06 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: 469c42d: Switch testwiki to read sessions from kask, with fallback to redis (T222099) (duration: 00m 55s)
  • 17:42 bblack: disable puppet on lvs1014 + lvs1016 for cloudelastic LVS merge - T224324
  • 17:36 twentyafterfour: running db dump on phab1003 (in tmux). command: sudo ./bin/storage dump --output /srv/dumps/phabricator_db_20190801.sql.gz --compress
  • 16:05 XioNoX: power down msw1-codfw
  • 15:47 XioNoX: start codfw mgmt work - T228112
  • 15:40 brennen@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.16 (duration: 00m 54s)
  • 15:39 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.16
  • 15:16 mholloway-shell@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Wikibase: Do not warn about entity that was not found in WikiPageEntityRevisionLookup (T229482) (duration: 01m 14s)
  • 15:13 mholloway-shell@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Wikibase: Do not warn about entity that was not found in WikiPageEntityRevisionLookup (T229482) (duration: 01m 20s)
  • 14:51 herron: performing rolling restarts of eqiad logstash cluster for security updates
  • 14:38 cdanis@deploy1001: Synchronized wmf-config/CommonSettings.php: Iaaa1238 comment-only no-op change (dbctl to 100% of production!) (duration: 00m 55s)
  • 14:22 cdanis@deploy1001: Synchronized wmf-config/etcd.php: Iaaa1238 dbctl to 100% of production! (duration: 00m 54s)
  • 12:38 jbond42: add cp1008 to canary hosts https://github.com/wikimedia/puppet/blob/production/hieradata/role/common/puppetmaster/frontend.yaml#L22
  • 12:18 marostegui: Rename math table on db1089 (enwiki) - T196055
  • 11:42 Urbanecm: EU SWAT done
  • 11:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c51baa3: Add files.geocollections.info to the wgCopyUploadsDomains whitelist for commonswiki (T229547) (duration: 00m 55s)
  • 11:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 1e4458e: Add nlm.nih.gov to the wgCopyUploadsDomains whitelist for commonswiki (T229470) (duration: 00m 53s)
  • 11:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c164132: Revert "Revert "Switch property terms migration to WRITE_NEW on production wikidata"" (T225053) (duration: 00m 55s)
  • 11:19 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/ExternalGuidance/: SWAT: 9402c36: Provide the messages in the target language of translation (T228019) (duration: 00m 56s)
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: 7db98f3: flaggedrevs.php: Remove useless wgAddGroups/wgRemoveGroups declarations (duration: 00m 55s)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: aa82657: flaggedrevs.php: Allow wikis to remove ability to promote to/demote from autoreview/editor (T229346) (duration: 00m 54s)
  • 10:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2058 from config T229543 (duration: 00m 57s)
  • 10:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2058 from config T229543 (duration: 00m 55s)
  • 10:12 jbond42: rolling upgrade for patch
  • 10:10 _joe_: repooling mw1348 after reimaging as pure-php7
  • 07:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2126 into s2 T228969 (duration: 00m 55s)
  • 07:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2126 into s2 T228969 (duration: 00m 54s)
  • 07:35 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8844', previous config saved to /var/cache/conftool/dbconfig/20190801-073459-marostegui.json
  • 07:29 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1348.eqiad.wmnet
  • 07:27 _joe_: removing mw1348 from rotation - reimaging for T228976
  • 07:10 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8843', previous config saved to /var/cache/conftool/dbconfig/20190801-071022-marostegui.json
  • 07:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1112 (duration: 00m 54s)
  • 06:59 elukey: install python3-docopt manually on lithium to test check_anycast_healthchecker
  • 06:51 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1270.eqiad.wmnet
  • 06:42 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1270.eqiad.wmnet
  • 06:42 _joe_: depooling mw1270 while migrating it to pure-php7
  • 06:28 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1348.eqiad.wmnet
  • 06:19 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1348.eqiad.wmnet
  • 06:18 _joe_: depooling mw1348 while moving it to no hhvm support.
  • 00:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/resources/Resources.php: acfff67 (duration: 00m 54s)
  • 00:32 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/specials/SpecialJavaScriptTest.php: acfff67 (duration: 00m 54s)
  • 00:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/resourceloader/ResourceLoader.php: acfff67 (duration: 00m 55s)
  • 00:28 krinkle@deploy1001: sync-file aborted: composer.json composer.lock dblists debug.json docroot errorpages fc-list fonts images langlist langlist-labs multiversion php php-1.34.0-wmf.13 php-1.34.0-wmf.14 php-1.34.0-wmf.15 php-1.34.0-wmf.16 phpcs.xml phpunit.xml portals private README requirements.txt robots.txt rpc scap setup.py src static test-requirements.txt tests tox.ini typos vendor w wikiversions.json wikiversions-labs.js

2019-07-31

  • 23:34 eileen: civicrm revision changed from 218328b29d to 857dcc9461, config revision is 84b785d41c
  • 23:22 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@db795ec]: Update mobileapps to b8c4166 (duration: 04m 21s)
  • 23:17 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@db795ec]: Update mobileapps to b8c4166
  • 23:14 Urbanecm: Evening SWAT done
  • 23:12 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: Add kask session storage configuration. Use only on testwiki, (ede989e, 862df8d, T222099) (duration: 00m 56s)
  • 21:56 ejegg: updated fundraising python tools from 2a56e5e283 to 493a38f9e0
  • 21:32 XioNoX: set cr1-eqiad's netflow target port to 2100 (nfacctd)
  • 20:58 brennen@deploy1001: Synchronized php: Revert group1 back to 1.34.0-wmf.15 (duration: 00m 53s)
  • 20:55 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 back to 1.34.0-wmf.15
  • 20:48 brennen@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.16 (duration: 00m 54s)
  • 20:47 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.16
  • 20:37 brennen@deploy1001: Synchronized php-1.34.0-wmf.16/skins/MinervaNeue/includes/MinervaHooks.php: Limit Recent Changes disable-table mode to Minerva skin T228280 (duration: 00m 56s)
  • 20:32 mdholloway: mobileapps deploy failed, investigating
  • 20:32 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@7c6ce69]: Update mobileapps to 5eb9068 (duration: 01m 39s)
  • 20:30 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@7c6ce69]: Update mobileapps to 5eb9068
  • 20:01 mbsantos@deploy1001: Finished deploy [proton/deploy@ed6ebd8]: Update chromium-renderer to 529c493 (T227124) (duration: 01m 43s)
  • 19:59 mbsantos@deploy1001: Started deploy [proton/deploy@ed6ebd8]: Update chromium-renderer to 529c493 (T227124)
  • 19:55 ejegg: updated payments-wiki from 70b432d309 to 9533f70fab
  • 18:49 mutante: phab1003 - manually running project_changes.sh to create mail to phabricator-reports@lists (T228575)
  • 17:46 cdanis@deploy1001: Synchronized wmf-config/etcd.php: I45b705c8 disable dbctl on half of canary hosts (duration: 00m 57s)
  • 17:21 volans@deploy1001: Synchronized wmf-config/db-codfw.php: depool db2058, I/O error, T229449 (duration: 00m 54s)
  • 17:15 volans@cumin1001: dbctl commit of MediaWiki config (dc=codfw), diff saved to 'https://phabricator.wikimedia.org/P8841', previous config saved to /var/cache/conftool/dbconfig/20190731-171536-volans.json
  • 16:52 Urbanecm: Morning SWAT done
  • 16:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable MobileWebUIActionsTracking schema with 50% sampling rate (T220016) (duration: 00m 58s)
  • 16:37 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/WikimediaEvents/: SWAT: Improved MobileUIActions tracking schema (T220016) (duration: 00m 54s)
  • 16:26 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/GrowthExperiments/: SWAT: Only set relevant title on mobile skin (T229263, T225659) (duration: 00m 51s)
  • 16:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/: SWAT: Only set relevant title on mobile skin (T229263, T225659) (duration: 00m 56s)
  • 16:14 bblack: deploying VCL for H/2 coalesce 421 responses - T207340
  • 16:12 marostegui: Poweroff pc2010 for on-site maintenance T227552
  • 15:52 mforns@deploy1001: Finished deploy [analytics/refinery@eb2d9b0]: deploying analytics-refinery up to eb2d9b0 (duration: 13m 09s)
  • 15:45 bstorm_: restarting nfs service on labstore1004
  • 15:39 mforns@deploy1001: Started deploy [analytics/refinery@eb2d9b0]: deploying analytics-refinery up to eb2d9b0
  • 15:24 thcipriani: restarting jenkins for update
  • 15:22 ema: cp-ats: upgrade fifo-log-demux to 0.4 and restart atsmtail@backend.service T229414
  • 15:17 brennen@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.34.0-wmf.16
  • 15:15 ema: upload fifo-log-demux 0.4 to stretch-wikimedia T229414
  • 15:03 XioNoX: power down re1:cr1-codfw (backup) - T226422
  • 14:57 godog: ms-be2018 disablepd 1I:1:1 - T225630
  • 14:47 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8838', previous config saved to /var/cache/conftool/dbconfig/20190731-144731-marostegui.json
  • 14:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1112 (duration: 00m 46s)
  • 14:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1078 after upgrade and alter (duration: 00m 47s)
  • 14:28 herron: beginning rolling reboots of codfw logstash hosts for security updates
  • 14:28 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8837', previous config saved to /var/cache/conftool/dbconfig/20190731-142814-marostegui.json
  • 14:18 cdanis@deploy1001: Synchronized wmf-config/etcd.php: I02d66736 expand dbctl to 25% of the fleet (duration: 00m 46s)
  • 14:04 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 14:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1078 after upgrade and alter (duration: 00m 46s)
  • 14:01 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8836', previous config saved to /var/cache/conftool/dbconfig/20190731-140124-marostegui.json
  • 13:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1078 after upgrade and alter (duration: 00m 46s)
  • 13:51 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8835', previous config saved to /var/cache/conftool/dbconfig/20190731-135129-marostegui.json
  • 13:49 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 13:46 ema: cp4021: test fifo-log-demux 0.4 T229414
  • 13:37 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 13:35 herron: beginning rolling restarts of codfw kafka-main brokers for security updates
  • 13:32 jbond42: rolling update of exim
  • 13:31 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 13:27 elukey: roll restart of zookeeper on conf100[4-6] and conf200[1-3] for openjdk upgrades
  • 13:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 for alter and upgrade (duration: 00m 47s)
  • 13:19 marostegui: Upgrade db1078
  • 13:19 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8834', previous config saved to /var/cache/conftool/dbconfig/20190731-131900-marostegui.json
  • 13:15 marostegui: Drop abuse_filter_log.afl_log_id in s3 eqiad - T226851
  • 13:12 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 13:05 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 12:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 12:53 marostegui: Drop abuse_filter_log.afl_log_id from s3 codfw with replication (this will cause lag in s3 codfw) - T226851
  • 12:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 12:22 Amir1: EU SWAT is done
  • 12:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert: Switch property terms migration to WRITE_NEW on production wikidata (T225053) (duration: 00m 47s)
  • 12:06 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Switch property terms migration to WRITE_NEW on production wikidata (T225053) (duration: 00m 47s)
  • 12:05 ladsgroup@deploy1001: sync-file aborted: SWAT: Switch property terms migration to WRITE_NEW on production wikidata (T225053) (duration: 00m 03s)
  • 11:56 jbond42: enable puppet fleet wide https://gerrit.wikimedia.org/r/c/operations/puppet/+/526645 deployed
  • 11:52 kartik@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/ExternalGuidance: SWAT: 526637|Provide the messages in the target language of translation (T228019) (duration: 00m 46s)
  • 11:41 jbond42: disable puppet to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/526645
  • 11:40 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: 526646|Fix typo in name of config (T225055) (duration: 00m 47s)
  • 11:25 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Decrease idwiki MT threshold for publishing (T228971) (duration: 00m 48s)
  • 11:16 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable other statements on Commons (duration: 00m 48s)
  • 10:08 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 10:05 jbond42: rolling back https://gerrit.wikimedia.org/r/q/c9f876e9990fb171f27616515e7d125824d7a6ac
  • 09:56 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 09:49 _joe_: pruning orphaned images on contint1001
  • 08:37 elukey: restart Yarn Resource Managers on an-master100[12] to pick up the new openjdk version
  • 08:06 _joe_: running puppet (and restarting mtail) on all eqiad appservers
  • 08:05 elukey: restart hadoop Namenodes on an-master100[12] to pick up new heap settings and new openjdk
  • 07:40 marostegui: Drop abuse_filter_log.afl_log_id in s1 eqiad - T226851
  • 07:36 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=codfw), diff saved to 'https://phabricator.wikimedia.org/P8833', previous config saved to /var/cache/conftool/dbconfig/20190731-073608-marostegui.json
  • 07:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2125 into s2 T228969 (duration: 00m 47s)
  • 07:34 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2125 into s2 T228969 (duration: 00m 49s)
  • 07:29 elukey: restart-hhvm on mw1290
  • 07:25 marostegui: Add db2125 to tendril and zarcillo T228969
  • 05:44 marostegui: Drop abuse_filter_log.afl_log_id from s1 codfw with replication (this will cause lag in s1 codfw) - T226851
  • 05:39 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify that db2128 is the new sanitarium master (duration: 00m 47s)
  • 05:00 marostegui: Compress s6 on labsdb1010 - T222978
  • 04:00 tstarling@deploy1001: Synchronized php-1.34.0-wmf.16/tests/phpunit/includes/parser/ParserOutputTest.php: T229366 (duration: 00m 46s)
  • 03:59 tstarling@deploy1001: Synchronized php-1.34.0-wmf.16/includes/parser/ParserOutput.php: T229366 (duration: 00m 47s)
  • 02:24 TimStarling: on mwmaint1002 reverted previous change using scap pull
  • 01:08 TimStarling: on mwmaint1002, editing wikiversions.json locally to move wikimania2006wiki to .16, to investigate T229366
  • 00:24 eileen: tools revision changed from 4910f1507c to 2a56e5e283
  • 00:04 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/CentralNotice/: T227711 among others (duration: 00m 47s)
  • 00:01 catrope@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/CentralNotice/: T227711 among others (duration: 00m 48s)
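
The roll-restart-zookeeper runs above appear as separate START and END (PASS) entries. The sketch below pairs those entries to show how long each cookbook run took. It assumes the entry format shown on this page ("HH:MM actor: START - Cookbook name" and "HH:MM actor: END (STATUS) - Cookbook name ..."), that all entries come from a single day, and that runs of the same cookbook by the same person do not overlap. pair_cookbook_runs is a hypothetical helper, not part of cumin or spicerack.

```python
import re
from datetime import datetime

# Matches cookbook entries as they appear on this page, for example
# "13:31 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper"
# "13:37 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)"
COOKBOOK_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}) (?P<actor>\S+): "
    r"(?P<event>START|END \((?P<status>[A-Z]+)\)) - Cookbook (?P<name>\S+)"
)


def pair_cookbook_runs(lines):
    """Pair START/END entries per (actor, cookbook).

    Returns a list of (cookbook_name, status, minutes) tuples in
    chronological order. Entries are assumed to be from one day
    (runs crossing midnight are not handled).
    """
    open_runs = {}
    runs = []
    # The log lists the newest entry first, so walk it in reverse.
    for line in reversed(lines):
        # Drop surrounding whitespace and any leading bullet marker.
        m = COOKBOOK_RE.match(line.strip().lstrip("• "))
        if not m:
            continue
        key = (m["actor"], m["name"])
        stamp = datetime.strptime(m["time"], "%H:%M")
        if m["event"] == "START":
            open_runs[key] = stamp
        elif key in open_runs:
            started = open_runs.pop(key)
            minutes = int((stamp - started).total_seconds() // 60)
            runs.append((m["name"], m["status"], minutes))
    return runs
```

A START without a matching END (such as the 13:49 run above) is simply left unpaired.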

2019-07-30

  • 23:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Enable MobileWebUIActionsTracking schema with 50% sampling rate" (T220016) (duration: 00m 47s)
  • 23:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Specify CentralAuth and OAuth session storage separately from per-wiki session storage (T227097, T227696) (duration: 00m 47s)
  • 23:06 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MobileWebUIActionsTracking schema with 50% sampling rate (T220016) (duration: 00m 48s)
  • 22:26 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 3) - T226331 (duration: 00m 09s)
  • 22:26 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 3) - T226331
  • 22:23 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 2) - T226331 (duration: 00m 10s)
  • 22:23 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 2) - T226331
  • 22:19 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - T226331 (duration: 00m 47s)
  • 22:18 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - T226331
  • 22:18 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - T226331 (duration: 00m 20s)
  • 22:18 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - T226331
  • 22:15 eileen: tools revision changed from 8a464c4f0d to 4910f1507c (reverted pgmysql switch)
  • 22:13 ppchelko@deploy1001: Finished deploy [changeprop/deploy@76b6639]: Report 400 errors by default. T229277 (duration: 01m 29s)
  • 22:11 ppchelko@deploy1001: Started deploy [changeprop/deploy@76b6639]: Report 400 errors by default. T229277
  • 22:02 ppchelko@deploy1001: Finished deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. T229060, take 2, feeds timed out (duration: 01m 03s)
  • 22:00 ppchelko@deploy1001: Started deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. T229060, take 2, feeds timed out
  • 22:00 ppchelko@deploy1001: Finished deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. T229060 (duration: 18m 40s)
  • 21:42 ppchelko@deploy1001: Started deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. T229060
  • 19:39 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.34.0-wmf.15
  • 19:19 mutante: restbase2017 - sudo systemctl start cassandra-b after it had failed for unknown reason
  • 19:19 XioNoX: repool ulsfo
  • 19:13 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.16
  • 18:49 XioNoX: rollback vrrp priority changes on cr4-ulsfo
  • 18:48 XioNoX: rollback bump cr4-ulsfo<->cr1-codfw ospf metric
  • 18:39 XioNoX: restart cr4-ulsfo
  • 18:38 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 18:38 XioNoX: bump cr4-ulsfo<->cr1-codfw ospf metric
  • 18:26 XioNoX: failover VRRP master to cr3-ulsfo
  • 18:25 XioNoX: activate transit BGP groups on cr3-ulsfo
  • 18:25 XioNoX: rollback - bump cr3-ulsfo<->cr2-eqord ospf metric
  • 18:15 XioNoX: restart cr3-ulsfo
  • 18:15 brennen@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.16 and rebuild l10n cache (duration: 18m 23s)
  • 18:14 XioNoX: bump cr3-ulsfo<->cr2-eqord ospf metric
  • 18:07 XioNoX: deactivate transit BGP groups on cr3-ulsfo
  • 18:06 XioNoX: failover VRRP master to cr4-ulsfo
  • 17:56 brennen@deploy1001: Started scap: testwiki to php-1.34.0-wmf.16 and rebuild l10n cache
  • 17:55 brennen@deploy1001: Pruned MediaWiki: 1.34.0-wmf.11 (duration: 07m 40s)
  • 17:53 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@af8b471]: Update mobileapps to ec865a7 (duration: 05m 45s)
  • 17:47 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@af8b471]: Update mobileapps to ec865a7
  • 17:20 XioNoX: depool ulsfo for routers upgrades - T227886
  • 17:15 godog: use wezen.codfw.wmnet instead of syslog.codfw.wmnet for production hosts
  • 17:00 thcipriani: gerrit restart incoming -- gc time increasing causing timeouts
  • 16:46 XioNoX: adding port 9105 to term prometheus in filter labs-in4 - T225296
  • 16:41 cdanis@deploy1001: Synchronized wmf-config/etcd.php: Icf57a2ab enable dbctl on all mw canaries (duration: 00m 47s)
  • 16:37 brennen: cutting 1.34-wmf.16
  • 16:33 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 16:22 godog: bounce rsyslog on centrallog1001 - T199406
  • 15:41 elukey@cumin1001: END (FAIL) - Cookbook sre.kafka.roll-restart-brokers (exit_code=99)
  • 15:28 legoktm@deploy1001: Finished scap: Rebuild l10n cache for SecureLinkFixer message (duration: 18m 51s)
  • 15:21 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 15:18 jijiki: Disable puppet on mw1347 and mw2136, depool and pool back - T219150
  • 15:13 elukey: remove snakebite from buster-wikimedia (not needed anymore)
  • 15:09 legoktm@deploy1001: Started scap: Rebuild l10n cache for SecureLinkFixer message
  • 15:06 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SecureLinkFixer everywhere (T200751) (duration: 00m 47s)
  • 14:48 cdanis@deploy1001: Synchronized wmf-config/etcd.php: I17c55428 dbctl canary on mwdebug*, mw1261, mw1276 (duration: 00m 47s)
  • 14:36 cdanis@deploy1001: Synchronized wmf-config/CommonSettings.php: Ie98a8d9e dbctl canary on mwdebug1001 (duration: 00m 47s)
  • 14:34 cdanis@deploy1001: Synchronized wmf-config/etcd.php: Ie98a8d9e dbctl canary on mwdebug1001 (duration: 00m 47s)
  • 14:33 cdanis@deploy1001: Synchronized docroot/noc/db.php: Ie98a8d9e dbctl canary on mwdebug1001 (duration: 00m 48s)
  • 14:14 fsero: refreshing calico policy from code in eqiad
  • 14:13 fsero: refreshing calico policy from code in codfw
  • 13:38 marostegui: Move db2094:3315 from db2066 to db2128 - T228258
  • 13:14 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 13:13 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 12:36 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8824', previous config saved to /var/cache/conftool/dbconfig/20190730-123630-marostegui.json
  • 12:21 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 12:15 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 12:13 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 12:13 jbond42: while testing some changes on the puppet master, a bad config caused a small blip in catalogue compilation
  • 12:09 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 11:34 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:31 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:30 jijiki: Depool mw1348 and pool back
  • 11:28 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 09:49 elukey: upload python-snakebite to buster-wikimedia (rebuilt for buster from source)
  • 09:31 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:27 elukey: add thirdparty/cloudera to buster-wikimedia and import packages to it (pull from the jessie component)
  • 08:17 marostegui: Stop MySQL on db2038 T227565
  • 08:10 marostegui: Remove db2038 from tendril and zarcillo T227565
  • 08:04 akosiaris: delete orespoolcounter{1,2}00{1,2} T227640
  • 08:04 akosiaris: revoke and deactivate orespoolcounter{1,2}00{1,2} T227640
  • 07:30 godog: bounce hhvm on mw1221
  • 05:36 marostegui: Disable puppet on cumin2001 to investigate a backups issue
  • 05:25 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/jobqueue/jobs/AssembleUploadChunksJob.php: T228929 (duration: 00m 46s)
  • 05:24 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/api/ApiUpload.php: T228929 (duration: 00m 47s)
  • 05:23 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/upload/UploadBase.php: T228929 (duration: 00m 48s)
  • 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s8 read-only T227062 (duration: 00m 24s)
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s8 master eqiad from db1071 to db1104 T227062 (duration: 00m 24s)
  • 05:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s8 on read-only T227062 (duration: 00m 26s)
  • 05:00 marostegui: Starting s8 failover from db1071 to db1104 - T227062
  • 04:48 eileen: civicrm revision changed from 1d57aca19c to 218328b29d, config revision is 3f960c48f6
  • 04:15 marostegui: Start pre-steps for s8 primary master failover - T227062
  • 02:37 eileen: civicrm revision changed from 121feb5d53 to 1d57aca19c, config revision is 3f960c48f6
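
During the ulsfo router upgrades above, the OSPF metric on a router's uplink was bumped before the restart and rolled back afterwards. OSPF computes routes with Dijkstra's shortest-path-first algorithm over link costs, so raising a link's cost steers traffic onto the other router. The sketch below illustrates that effect on a made-up topology: the node names "ulsfo-lan" and "core" and every cost value are hypothetical, chosen only to show the path switching from cr3-ulsfo to cr4-ulsfo, and do not reflect the real network configuration.

```python
import heapq

# Hypothetical link costs; real OSPF metrics on these links are not in the log.
METRICS = {
    ("ulsfo-lan", "cr3-ulsfo"): 1,
    ("ulsfo-lan", "cr4-ulsfo"): 1,
    ("cr3-ulsfo", "cr2-eqord"): 100,
    ("cr4-ulsfo", "cr1-codfw"): 120,
    ("cr2-eqord", "core"): 10,
    ("cr1-codfw", "core"): 10,
}


def build(metrics):
    """Turn {(a, b): cost} into an undirected adjacency map."""
    graph = {}
    for (a, b), cost in metrics.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path(graph, src, dst):
    """Dijkstra's algorithm: return (total_cost, path), or None if unreachable."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, weight in graph[node].items():
            if nbr not in seen:
                heapq.heappush(queue, (cost + weight, nbr, path + [nbr]))
    return None


if __name__ == "__main__":
    print(shortest_path(build(METRICS), "ulsfo-lan", "core"))
    # "bump cr3-ulsfo<->cr2-eqord ospf metric" before restarting cr3-ulsfo:
    drained = dict(METRICS)
    drained[("cr3-ulsfo", "cr2-eqord")] = 10000
    print(shortest_path(build(drained), "ulsfo-lan", "core"))
```

Rolling back the metric restores the original cost, and the cheaper path through cr3-ulsfo is selected again.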

2019-07-29

  • 23:37 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in ams
  • 23:36 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/: Make welcome and discovery tours fully mutually exclusive (T229044) (duration: 00m 48s)
  • 23:26 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in ulsfo
  • 23:22 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in Dallas
  • 22:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/includes/cache/MessageCache.php: T208897 - fa817b0 (duration: 00m 47s)
  • 22:32 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter/: T214674 - bfcaf0c26d6 (duration: 00m 48s)
  • 22:28 XioNoX: roll out anycast DNS and syslog to all network devices - T228190
  • 22:16 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter/: T214674 - 940955e (duration: 00m 48s)
  • 22:05 XioNoX: replace ulsfo network devices' DNS target with 10.3.0.1
  • 22:00 Krinkle: krinkle@deploy1001: Dirty git status on extensions/AbuseFilter and extensions/CheckUser in php-1.34.0-wmf.15
  • 21:43 XioNoX: replace ulsfo network devices' syslog target with syslog.anycast.wmnet
  • 19:22 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@c3ffbee]: Weekly deploy (duration: 11m 42s)
  • 19:10 smalyshev@deploy1001: Started deploy [wdqs/wdqs@c3ffbee]: Weekly deploy
  • 18:23 Urbanecm: Morning SWAT done
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Rename Image-reviewer to image-reviewer on fawiki (T216406) (duration: 00m 47s)
  • 18:19 Urbanecm: Run mwscript migrateUserGroup.php --wiki=fawiki Image-reviewer image-reviewer (T216406)
  • 18:18 XioNoX: switch traffic to the GTT link between Ashburn and Amsterdam (set GTT metric to 820 vs. 1820 before) - T228827
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add several rights to eliminators in fawiki (T176553, 2/2) (duration: 00m 47s)
  • 18:08 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: Add several rights to eliminators in fawiki (T176553, 1/2) (duration: 00m 47s)
  • 18:04 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter: SWAT: Initialize user-defined variables during shortcircuit (T214674) (duration: 00m 49s)
  • 17:37 ejegg: updated payments-wiki config to a7dacbf8e9
  • 17:08 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia python3-anycast-healthchecker
  • 17:05 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia python3-json-logger
  • 17:05 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia anycast-healthchecker
  • 16:47 godog: add anycast syslog to wezen/centrallog1001
  • 16:19 elukey: manually stopped the sre.kafka.roll-restart-brokers cookbook after 4 broker restarts, since the sleep interval between restarts (10 min) is too short.
  • 16:17 elukey@cumin1001: END (ERROR) - Cookbook sre.kafka.roll-restart-brokers (exit_code=97)
  • 15:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Retry - Produce resource_change stream to eventgate-main - T211248 (duration: 00m 46s)
  • 15:34 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 15:30 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce resource_change stream to eventgate-main - T211248 (duration: 00m 47s)
  • 14:35 papaul: shutting down pc2010 for maintenance
  • 13:57 cdanis@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8816', previous config saved to /var/cache/conftool/dbconfig/20190729-135730-cdanis.json
  • 13:30 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 13:28 marostegui: Stop MySQL on pc2010 - T227552
  • 13:23 arturo: T228870 reboot cloudvirt1007.eqiad.wmnet for kernel updates
  • 13:23 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:09 arturo: T228870 reboot cloudvirt1006.eqiad.wmnet for kernel updates
  • 13:09 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:09 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:01 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 12:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2128 into s5 api T221533 (duration: 00m 47s)
  • 12:45 marostegui: Provision db2128 into s5 codfw - T228969
  • 12:44 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2128 into s5 api T221533 (duration: 00m 47s)
  • 12:39 arturo: T228870 reboot cloudvirt1005.eqiad.wmnet for kernel updates
  • 12:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:20 arturo: T228870 reboot cloudvirt1004.eqiad.wmnet for kernel updates
  • 12:20 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:20 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 arturo: T228870 reboot cloudvirt1003.eqiad.wmnet for kernel updates
  • 11:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:57 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:36 arturo: icinga downtime toolschecker for 6h
  • 11:31 arturo: T228870 reboot cloudvirt1002.eqiad.wmnet for kernel updates
  • 11:31 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:14 arturo: T228870 reboot cloudvirt1001.eqiad.wmnet for kernel updates
  • 11:14 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:13 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:11 dcausse: EU SWAT done
  • 11:10 dcausse@deploy1001: Synchronized wmf-config/SearchSettingsForWikidata.php: [cirrus] Use correct factory declaration for EntityFullTextQueryBuilder (duration: 00m 47s)
  • 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 47s)
  • 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 47s)
  • 09:49 marostegui: Add db2128 to tendril and zarcillo - T228969
  • 09:24 elukey@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99)
  • 09:22 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 09:21 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:55 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 08:51 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 08:47 elukey: set mcrouter async behavior for codfw replication to all mw app/api servers (changes will be picked up when puppet runs on the hosts) - T225642
  • 08:35 godog: temp stop puppet on cp hosts to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/525259
  • 08:32 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97)
  • 08:32 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 08:16 marostegui: Drop abuse_filter_log.afl_log_id in s7 eqiad - T226851
  • 07:49 dcausse: elastic@eqiad force recovery of failed shards (eswiki stuck)
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2038 from config T221533 (duration: 00m 46s)
  • 07:29 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2038 from config T221533 (duration: 00m 50s)
  • 07:18 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 06:45 akosiaris: poweroff orespoolcounter{1,2}00{1,2} for removal T227640
  • 06:37 _joe_: restarted php7.2 on mwdebug1002, low opcache
  • 06:36 _joe_: restarted coherence report on netmon1002, it failed earlier this morning
  • 06:31 _joe_: restarting nrpe on restbase-dev1006 T224260
  • 06:30 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 in preparation for Tuesday 30th failover in s8 (duration: 00m 54s)
  • 05:18 marostegui: Drop abuse_filter_log.afl_log_id from s7 codfw with replication (this will cause lag in s7 codfw) - T226851
  • 05:05 marostegui: Remove db1072 from tendril and zarcillo T228956
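
The 16:19 entry notes that a fixed 10-minute sleep between Kafka broker restarts was too short. One common alternative is to keep a minimum sleep but also wait until the restarted hosts report healthy before moving on. The sketch below shows that idea under loud assumptions: rolling_restart, restart and is_healthy are hypothetical names supplied by the caller, and this is not how the sre.kafka.roll-restart-brokers cookbook is implemented.

```python
import time
from typing import Callable, List


def rolling_restart(hosts: List[str],
                    restart: Callable[[str], None],
                    is_healthy: Callable[[str], bool],
                    batch_size: int = 1,
                    min_sleep: float = 0.0,
                    timeout: float = 600.0,
                    poll: float = 5.0) -> None:
    """Restart hosts in batches (hypothetical helper).

    After each batch, sleep at least min_sleep seconds, then poll every
    `poll` seconds until all hosts in the batch are healthy. Raise
    TimeoutError if that takes longer than `timeout` seconds, so the
    rollout stops instead of restarting the next batch on a sick cluster.
    """
    for i in range(0, len(hosts), batch_size):
        batch = hosts[i:i + batch_size]
        for host in batch:
            restart(host)
        time.sleep(min_sleep)
        deadline = time.monotonic() + timeout
        while not all(is_healthy(h) for h in batch):
            if time.monotonic() > deadline:
                raise TimeoutError(f"batch {batch} not healthy after {timeout}s")
            time.sleep(poll)
```

Gating on a health check rather than a fixed sleep lets fast recoveries proceed quickly while slow ones, such as brokers still catching up on replication, hold the rollout.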

2019-07-28

  • 15:13 arturo: disable 1m load average check in icinga for labstore1007 for 24h

2019-07-27

  • 17:39 bd808: Updated profile & images for @wikimediatech twitter account
  • 14:49 godog: bounce rsyslog on wezen / centrallog1001
  • 06:43 elukey: powercycle mw1300 - no ssh, serial com2 stuck with no root login available
  • 00:35 mutante: restbase-dev1006 - starting nagios-nrpe-server
  • 00:33 mutante: wikitech-static - fix /etc/letsencrypt/renewal/wikitech-static.wikimedia.org.conf - removed webroot_map and the line for status.wm.org that caused errors during a renewal dry-run. The dry run now finishes successfully, and we use the "webroot" authenticator instead of "apache". This should resolve what this ticket was about: no more Apache kills/restarts on renewal. (T214640)

2019-07-26

  • 23:51 mutante: restbase-dev1006 - manually booting into PXE to debug boot issue / start Debian installer (T224260)
  • 23:27 mutante: restbase-dev1006 - does not boot - hangs at "attempting to boot from C:" - entering "Legacy BIOS One Time Boot Menu" (T224260)
  • 21:52 mutante: restbase-dev1006 - power reset via mgmt
  • 20:48 mutante: restbase-dev1006 - rebooting from busybox shell where it was idling since a failed reimage attempt
  • 20:22 foks: reset password for Sharons36
  • 18:43 XioNoX: remove lvs100[1-6] switch config from asw2-d-eqiad - T224223
  • 18:33 mutante: deploy2001 - delgroup gerrit-root (follow-up to https://gerrit.wikimedia.org/r/c/operations/puppet/+/525444)
  • 18:32 mutante: deploy1001 - delgroup gerrit-root (follow-up to https://gerrit.wikimedia.org/r/c/operations/puppet/+/525444)
  • 18:20 XioNoX: remove lvs100[1-6] switch config from asw2-c-eqiad - T224223
  • 18:08 XioNoX: remove lvs100[1-6] switch config from asw2-b-eqiad - T224223
  • 18:01 XioNoX: remove lvs100[1-6] switch config from asw2-a-eqiad - T224223
  • 17:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:37 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:05 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Flow/includes/Search/Iterators/TopicIterator.php: T229114 make orderUUID public, as it is needed by other classes for Dumps (duration: 00m 47s)
  • 15:12 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: de08224 (duration: 00m 48s)
  • 15:02 Krinkle: krinkle@deploy1001: php-1.34.0-wmf.15 is still dirty on extensions/CheckUser
  • 14:23 ema: re-enable puppet on cache nodes T229091
  • 14:10 ema: disable puppet on cache nodes T229091
  • 13:41 fsero: sudo -i reprepro --ignore=wrongdistribution include stretch-wikimedia /home/fsero/envoyproxy_1.11.0~wmf1_amd64.changes
  • 13:41 jeh: updated labstore100[67].wikimedia.org performance scaling_governor T225713
  • 13:07 jeh: rebooting labstore1006.wikimedia.org for updates T224228
  • 13:00 Urbanecm: Change user email assigned to SUL user Stansfield (T229004)
  • 12:45 jeh: rebooting labsdb1012.eqiad.wmnet for updates T224228
  • 12:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2123 into s5 vslow T221533 (duration: 00m 50s)
  • 09:32 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2123 into s5 T228969 (duration: 00m 47s)
  • 09:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2123 into s5 T228969 (duration: 00m 48s)
  • 08:42 marostegui: Add db2123 to tendril and zarcillo - T228969
  • 06:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1096 (duration: 00m 47s)
  • 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 47s)
  • 05:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 46s)
  • 05:40 marostegui: Stop MySQL on db1072 to get it ready for decommission - T228956
  • 05:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1096 (duration: 00m 48s)
  • 05:05 marostegui: Stop MySQL on db1096 for upgrade
  • 05:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096 (duration: 00m 49s)
  • 00:53 ejegg: re-enabled dedupe_civicrm_contacts and major_gifts_addresses fundraising jobs
  • 00:51 ejegg: re-enabled donations queue consumer
  • 00:15 ejegg: disabled donations queue consumer
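
The db1096 entries above follow a staged pattern: depool, upgrade, "slowly repool", "more traffic" twice, then "fully repool". The sketch below models that as a ramp of load weights. The ramp fractions, the weight of 200 and the idea that read traffic is split in proportion to weight are all assumptions for illustration; the log does not record the actual values used in db-eqiad.php.

```python
from typing import List, Sequence

# Hypothetical ramp; the log only says "slowly repool", "more traffic"
# and "fully repool", not the fractions used.
DEFAULT_RAMP = (0.1, 0.25, 0.5, 1.0)


def repool_weights(full_weight: int,
                   ramp: Sequence[float] = DEFAULT_RAMP) -> List[int]:
    """Return the weight to apply at each repool stage."""
    if not ramp or ramp[0] <= 0 or ramp[-1] != 1.0:
        raise ValueError("ramp must start above 0 and end at 1.0 (fully repooled)")
    if any(b <= a for a, b in zip(ramp, ramp[1:])):
        raise ValueError("ramp must be strictly increasing")
    # Never emit 0 for a pooled stage: weight 0 would mean depooled.
    return [max(1, round(full_weight * f)) for f in ramp]


def traffic_share(weight: int, other_weights_total: int) -> float:
    """Fraction of reads the host receives, assuming weight-proportional selection."""
    total = weight + other_weights_total
    return weight / total if total else 0.0


if __name__ == "__main__":
    for w in repool_weights(200):
        print(w, f"{traffic_share(w, 300):.0%}")
```

Ramping in stages gives a freshly upgraded replica time to warm its buffer pool before it takes its full share of queries.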

2019-07-25

  • 23:47 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/extension.json: Fix over-eager GrowthExperiments popups (T229045) (duration: 00m 50s)
  • 23:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Revert "Delete Image-reviewer group from commonswiki for good" (T228098) (duration: 00m 47s)
  • 23:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add sju, sjd, and rmf to wmgExtraLanguageNames (T226701) (duration: 00m 47s)
  • 23:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable VisualEditor in namespace Wikipédia on Slovak Wikipedia (T229014) (duration: 00m 48s)
  • 22:34 ejegg: re-enabled donations queue consumer
  • 22:07 bblack: lvs1013 - restart pybal for resolv.conf changes - T228190
  • 22:04 bblack: lvs1014 - restart pybal for resolv.conf changes - T228190
  • 22:02 bblack: lvs1015 - restart pybal for resolv.conf changes - T228190
  • 22:02 ejegg: turned off dedupe_civicrm_contacts fundraising job
  • 21:59 bblack: lvs1016 - restart pybal for resolv.conf changes - T228190
  • 21:47 bblack: primary high-traffic2 lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - T228190
  • 21:46 XioNoX: apply export BGP_Wikimedia_no_dfz to eqiad's Confed_esams - T227808
  • 21:40 ejegg: turned off major_gifts_addresses fundraising job
  • 21:38 bblack: primary high-traffic1 lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - T228190
  • 21:07 bblack: backup lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - T228190
  • 20:54 hashar: Rebasing mediawiki/extensions/MobileFrontend@wmf/1.34.0-wmf.15 for a build/CI related change to package.json https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MobileFrontend/+/525632/
  • 20:37 XioNoX: add prometheus-bird-exporter to stretch-wikimedia repo
  • 20:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:15 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:02 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, feeds timing out. (duration: 05m 34s)
  • 19:53 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, feeds timing out.
  • 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:53 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, take 3 (duration: 03m 14s)
  • 19:49 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, take 3
  • 19:49 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, take 2 (duration: 06m 33s)
  • 19:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:44 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:42 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, take 2
  • 19:42 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016 (duration: 13m 42s)
  • 19:29 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016
  • 19:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:03 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:01 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 18:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:36 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:19 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:19 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:00 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:59 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:59 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:58 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@11d9d4a]: Update service-mobileapp-node to 200a323 (T228938 T228287) (duration: 04m 39s)
  • 17:53 mbsantos@deploy1001: Started deploy [mobileapps/deploy@11d9d4a]: Update service-mobileapp-node to 200a323 (T228938 T228287)
  • 17:51 elukey: powercycle stat1007
  • 17:44 volans: sudo cumin -s30 -b1 -m async 'A:wdqs-all and not A:wdqs-internal and not P{wdqs1009.eqiad.wmnet}' 'run-puppet-agent -e "volans - T228122 - deploying gerrit/524954"' 'systemctl restart wdqs-blazegraph'
  • 17:33 volans: running sudo cumin -s30 -b1 -m async 'A:wdqs-internal' 'run-puppet-agent -e "volans - T228122 - deploying gerrit/524954"' 'systemctl restart wdqs-blazegraph'
  • 17:18 volans: disabled puppet on A:wdqs-all, deploying gerrit/524954 - T228122
  • 17:17 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.rolling-restart-workers (exit_code=0)
  • 17:01 elukey@cumin1001: START - Cookbook sre.hadoop.rolling-restart-workers
  • 16:54 bblack: lvs5001 - restart pybal for resolv.conf change - T228190
  • 16:53 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/WikibaseMediaInfo/resources/statements/: T228807 Fix formatValue abort handling (duration: 00m 48s)
  • 16:52 jijiki: Rolling restart of hhvm across the fleet
  • 16:50 bblack: lvs5002 - restart pybal for resolv.conf change - T228190
  • 16:44 bblack: lvs5003 - restart pybal for resolv.conf change - T228190
  • 16:19 jijiki: Disable puppet on mw* servers for 525156
  • 15:52 jeh: rebooting cloudstore1008.wikimedia.org for updates T224228
  • 15:41 jeh: rebooting cloudstore1009.wikimedia.org for updates T224228
  • 15:41 nuria@deploy1001: Finished deploy [analytics/refinery@f310917]: deploying refinery - migrations to hive2 actions (duration: 13m 40s)
  • 15:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:35 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:35 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:32 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove redundant wgResourceLoaderStorageEnabled override (duration: 00m 50s)
  • 15:27 nuria@deploy1001: Started deploy [analytics/refinery@f310917]: deploying refinery - migrations to hive2 actions
  • 15:09 jeh: rebooting labstore1004.eqiad.wmnet for updates T224228
  • 14:42 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@87b25f2]: Convert oozie actions from hive to hive2 (duration: 00m 19s)
  • 14:42 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@87b25f2]: Convert oozie actions from hive to hive2
  • 14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:02 moritzm: installing Java security updates on Druid servers
  • 13:52 moritzm: installing Java security updates on AQS, Hadoop and Kafka/Jumbo servers
  • 13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 13:38 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 13:35 robh: cloudvirt1015 offline for ram swap via T220853
  • 13:20 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 13:19 fsero: recreating clusterrole deploy from helmfile in staging
  • 13:09 marostegui: Drop abuse_filter_log.afl_log_id in s5 eqiad - T226851
  • 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.15
  • 12:49 marostegui: Drop abuse_filter_log.afl_log_id in s4 codfw (lag will appear on codfw) - T226851
  • 11:53 marostegui: Compress s3 wikis on labsdb1010 - T222978
  • 11:03 arturo: update stretch-wikimedia/thirdparty/kubeadm-k8s on install1002 for T215531 (kubeadm 1.15.1)
  • 10:53 moritzm: rebooting cloudvirt2003-dev
  • 10:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:35 moritzm: rebooting cloudvirt1024 for kernel update
  • 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:21 marostegui: Failover m1 from dbproxy1006 to dbproxy1001 - T227139
  • 08:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:54 moritzm: rebooting cloudvirt2001-dev
  • 08:32 Urbanecm: Password reset for SUL user Strejc
  • 08:04 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad,name=mw128[0-3].*
  • 08:01 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad,name=mw12(6[89]|7[0-5]).*
  • 08:01 _joe_: repooling mw1268-1275 in the appserver cluster
  • 08:00 moritzm: rebooting cloudvirt2001-dev
  • 07:59 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad,name=mw12(7[6-9|8[0-3]).*
  • 07:59 _joe_: repooling mw1276-1283 in the API cluster
  • 07:33 moritzm: rebooting cloudvirt2001-dev
  • 07:23 marostegui: Upgrade MySQL on db1072
  • 07:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:42 elukey: restart kafka* on kafka-jumbo1001 to pick up new openjdk-8 version
  • 06:37 elukey: restart cassandra instances on aqs1004 to pick up new openjdk-8 version
  • 06:34 elukey: add term eventgate to analytics-in4 on cr1/cr2-eqiad - T228882
  • 05:31 twentyafterfour: set phabricator to read-write mode
  • 05:30 marostegui: Failover m3 from db1072 to db1128 - T228243
  • 05:30 twentyafterfour: phabricator set to read-only mode
  • 04:51 marostegui: Start pre-failover steps on m3 T228243
  • 02:02 XioNoX: remove peer AS63541 from cr1-eqsin

2019-07-24

  • 23:46 nuria@deploy1001: Finished deploy [analytics/refinery@7d93398]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues). Try 2 (duration: 13m 34s)
  • 23:43 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Flow: Fix JS error when saving Flow board descriptions (T228818) (duration: 01m 01s)
  • 23:42 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: Fix JS error when saving Flow board descriptions (T228818) (duration: 01m 03s)
  • 23:39 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable homepage for 50% of new users on arwiki (T228120) (duration: 00m 58s)
  • 23:32 nuria@deploy1001: Started deploy [analytics/refinery@7d93398]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues). Try 2
  • 23:30 nuria@deploy1001: Finished deploy [analytics/refinery@834db0a]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues) (duration: 18m 10s)
  • 23:22 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments homepage on arwiki (T228120) (duration: 00m 55s)
  • 23:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Correct typo in arwiki help panel config (T228820) (duration: 00m 57s)
  • 23:12 nuria@deploy1001: Started deploy [analytics/refinery@834db0a]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues)
  • 22:41 thcipriani@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 22:36 thcipriani@: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 22:28 thcipriani@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 21:22 mutante: <+icinga-wm> RECOVERY - Device not healthy -SMART- on restbase-dev1006 is OK: All metrics within thresholds. (T224260)
  • 21:18 cscott@deploy1001: Finished deploy [parsoid/deploy@abd05ab]: Updating Parsoid to df1af404 (T227216, T226523, T226451) (duration: 18m 35s)
  • 21:16 nuria@deploy1001: Finished deploy [analytics/refinery@58e64c1]: deploying refinery 0.0.95 (duration: 03m 54s)
  • 21:12 nuria@deploy1001: Started deploy [analytics/refinery@58e64c1]: deploying refinery 0.0.95
  • 21:03 ppchelko@deploy1001: Finished deploy [restbase/deploy@7911f65]: Store PCS endpoints T222384 (duration: 18m 18s)
  • 21:00 cscott@deploy1001: Started deploy [parsoid/deploy@abd05ab]: Updating Parsoid to df1af404 (T227216, T226523, T226451)
  • 20:45 ppchelko@deploy1001: Started deploy [restbase/deploy@7911f65]: Store PCS endpoints T222384
  • 20:39 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@2e2ce6c]: Update mobileapps to 1751a2e (duration: 04m 20s)
  • 20:38 ppchelko@deploy1001: Finished deploy [changeprop/deploy@bf28187]: Rerender PCS endpoints T222384 (duration: 01m 34s)
  • 20:36 ppchelko@deploy1001: Started deploy [changeprop/deploy@bf28187]: Rerender PCS endpoints T222384
  • 20:35 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@2e2ce6c]: Update mobileapps to 1751a2e
  • 20:12 jeh: redirecting dumps.wikimedia.org back to labstore1007.wikimedia.org T224228
  • 19:43 ejegg: updated fundraising CiviCRM from 875ab97742 to 121feb5d53
  • 19:08 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SecureLinkFixer on group0 wikis - T200751 (duration: 00m 55s)
  • 18:33 cmjohnson1: moving cloudvirt107 to 10G rack T228691
  • 18:19 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/includes/cache/localisation/LocalisationCache.php: 31d99eb381bc (duration: 00m 54s)
  • 18:15 ejegg: updated payments-wiki from a28ad541ed to 70b432d309
  • 18:13 urandom: creating new restbase keyspaces -- T228804
  • 18:12 Krinkle: krinkle@deploy1001: extensions/CheckUser is dirty in php-1.34.0-wmf.15
  • 17:14 XioNoX: rollback failover master VIP of ae2.1202 inet6 away from cr1-eqiad - T226782
  • 17:10 XioNoX: Add mr1-codfw<->cr1/2-codfw vlan/link config on asw-a-codfw - T228112
  • 16:44 jijiki: Rolling puppet-enable and apache reload of jobrunners in codfw
  • 16:12 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
  • 16:12 bblack: re-pooling recdns on dns1001 via confctl - T226782
  • 16:11 bblack: lvs1014 - restore puppet and resolv.conf contents, restart pybal
  • 16:10 bblack: dns1001 - restart recursor and re-enable puppet - T226782
  • 16:07 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/export/XmlDumpWriter.php: T228720 make XmlDumpwriter more resilient to blob store corruption (duration: 00m 55s)
  • 16:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: T228720 make XmlDumpwriter more resilient to blob store corruption (duration: 00m 55s)
  • 15:59 bblack: dns1001 - puppet disable, stop recursor service to kill anycast advert - T226782
  • 15:59 bblack: lvs1014 - puppet disable, remove dns1001 from resolv.conf, restart pybal - T226782
  • 15:58 XioNoX: failover master VIP of ae2.1202 inet6 away from cr1-eqiad - T226782
  • 15:56 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
  • 15:56 bblack: depooling recdns on dns1001 via confctl - T226782
  • 15:56 bblack: depooling recdns on dns1001 via confctl
  • 15:47 jijiki: Rolling puppet-enable and apache reload of jobrunners in eqiad
  • 15:44 jeh: rebooting labstore1007.wikimedia.org for updates T224228
  • 15:42 jijiki: Disable puppet on jobrunners for 525306
  • 15:11 herron: resume ingesting [message] =~ /^SlowTimer/ logs on logstash1007 (as a canary)
  • 15:02 XioNoX: re-enable vc link between asw2-a6 and asw2-a7 - T228823
  • 14:58 jeh: unmounting dumps NFS clients from labstore1007.wikimedia.org T224228
  • 14:54 XioNoX: cleared vc ports stats on asw2-a-eqiad - T228823
  • 14:43 marostegui: Drop abuse_filter_log.afl_log_id in s5 eqiad - T226851
  • 14:40 marostegui: Drop abuse_filter_log.afl_log_id in s5 codfw (lag will appear on codfw) - T226851
  • 14:31 tarrow@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 13:49 robh: rebooting cloudvirt1015 into OS, memory error confirmed. new memory replacement dispatch entered via T220853
  • 13:31 marostegui: Drop abuse_filter_log.afl_log_id in s2 eqiad - T226851
  • 13:25 robh: rebooting cloudvirt1015 into memtest for dell support repair via T220853
  • 13:06 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.15 (duration: 00m 54s)
  • 13:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.15
  • 12:19 marostegui: Stop haproxy on dbproxy1004 and dbproxy1009 (m4 - eventlogging) - T228768
  • 11:23 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable FileImporter source wiki edits (T228851) (duration: 00m 54s)
  • 11:12 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove Content Translation event logging config (part 2/2) (duration: 00m 54s)
  • 11:10 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove Content Translation event logging config (part 1/2) (duration: 00m 59s)
  • 10:04 marostegui: Drop abuse_filter_log.afl_log_id from labswiki (wikitech) and labtestwiki - T226851
  • 09:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1082 (duration: 00m 55s)
  • 08:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 into API after upgrade (duration: 00m 55s)
  • 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1082 after upgrade (duration: 00m 54s)
  • 08:40 marostegui: Stop MySQL on db1082 for upgrade
  • 08:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 for upgrade (duration: 00m 57s)
  • 08:35 marostegui: Drop abuse_filter_log.afl_log_id in s2 codfw (lag will appear on codfw) - T226851
  • 07:58 marostegui: Drop abuse_filter_log.afl_log_id from wikidata in eqiad - T226851
  • 07:21 marostegui: Stop MySQL on db1117:3322 to check dbproxy1013 notifications - T202367
  • 07:10 marostegui: Deploy grants for dbproxy1013 in m2 - T202367
  • 05:00 marostegui: Stop puppet on dbprov2001 to generate s5 mysqldump manually
  • 04:52 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: T227700 (duration: 00m 54s)
  • 04:51 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: T227700 (duration: 00m 54s)
  • 04:50 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: T227700 (duration: 00m 53s)
  • 04:49 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: T227700 (duration: 00m 54s)
  • 04:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: T227700 (duration: 00m 54s)
  • 04:45 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/specials/SpecialGoToInterwiki.php: T227700 (duration: 00m 54s)
  • 04:42 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: T227700 (duration: 00m 54s)
  • 04:40 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/specials/SpecialGoToInterwiki.php: (no justification provided) (duration: 00m 56s)
  • 03:41 tstarling@deploy1001: Synchronized w/fatal-error.php: Adding post-send exception test for T228462 (duration: 00m 54s)
  • 03:39 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Adding DeferredUpdates log channel (T228462) (duration: 00m 56s)
  • 02:01 eileen: payments-wiki revision changed from 224c6b2d7b to a28ad541ed, config revision is 8dcb77cf22

2019-07-23

  • 23:44 eileen: civicrm revision changed from 88e9f24893 to 875ab97742, config revision is 4006d3bdc5
  • 23:43 shdubsh: reverting logstash mitigations and re-enable puppet
  • 23:42 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/diff/DifferenceEngine.php: T228766 Don't double wrap rollback links (duration: 00m 56s)
  • 23:31 mutante: mw1267 - rm -rf /srv/mediawiki/php-1.33.0-wmf.23 ; rm -rf /srv/mediawiki/php-1.32.0-wmf.3 ; scap pull
  • 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
  • 22:36 mutante: rolling out scap 3.11.1-1 on mw-eqiad servers
  • 22:14 mutante: continuing rollout of new scap version 3.11.1-1, starting with kafka-all followed by other cumin-alias groups (T228328)
  • 22:06 herron: puppet temporarily disabled on eqiad/codfw logstash collectors while catching up with backlog. see /etc/logstash/conf.d/01-filter_temp_drops.conf
  • 21:52 herron: logstash - temporarily dropping logs matching [message] =~ /^SlowTimer/ due to UTF-8 parsing errors that are stopping the logstash processing pipeline. will re-enable after logstash has caught up with the backlog
  • 20:59 shdubsh: temporarily disable input-kafka-rsyslog-shipper and drop memcached logs on logstash nodes
  • 20:08 paravoid: asw2-a-eqiad: request virtual-chassis vc-port set interface member 6 vcp-255/1/0 disable
  • 19:58 eileen: process-control config revision is 4006d3bdc5 - disabled drush fill donor totals job
  • 19:49 mutante: mwdebug1002 - restarting hhvm - mw1312 - restarted apache
  • 19:44 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 and 1004
  • 19:40 mutante: restarting hhvm on mw1312
  • 19:28 cdanis: depool all appservers in eqiad A7 cdanis@cumin1001.eqiad.wmnet ~ 🍵 sudo cumin 'mw12[67-83]*' 'depool'
  • 19:11 bblack: repool lvs1013 - T227143
  • 19:10 bblack: repool cp1077 + cp1078 - T227143
  • 19:09 elukey: depool mw1261 for investigation
  • 19:06 herron: restarting logstash on logstash100[789]
  • 18:53 robh: mw1271 had power loss event due to pdu swap via T227143
  • 18:45 mutante: rolling out scap 3.11.1-1 on all mw codfw servers (T228328)
  • 18:43 mutante: rolling out scap 3.11.1-1 on mw canary servers (T228328)
  • 18:13 robh: started depooling servers in a7-eqiad for pdu work via T227143
  • 18:11 cdanis: depool mw1267
  • 18:10 cdanis: cdanis@mw1267.eqiad.wmnet /srv/mediawiki ☕ scap pull
  • 18:09 cdanis: cdanis@mw1267.eqiad.wmnet ~ ☕ sudo apt install python-concurrent.futures
  • 18:08 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/export/XmlDumpWriter.php: T228720 Make XmlDumpwriter resilient to blob store corruption (duration: 00m 54s)
  • 18:07 James_F: Belay that, error on mw1267.
  • 18:06 James_F: Sync error on mw1314.eqiad.wmnet, No module named concurrent.futures
  • 18:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: T228720 Make XmlDumpwriter resilient to blob store corruption (duration: 00m 57s)
  • 18:05 bblack: lvs1013 - disable puppet and stop pybal - T227143
  • 18:04 bblack: depool cp1077 + cp1088 - T227143
  • 18:03 cdanis@deploy1001: Synchronized docroot/noc/db.php: 8def4af1d noc db.php: include readonly status & group loads (duration: 00m 55s)
  • 17:52 moritzm: installing Java security updates on kafka/main and Logstash servers
  • 17:38 ppchelko@deploy1001: Finished deploy [changeprop/deploy@6c5c0a3]: Switch internal events to the new schema T226522, step 2 (duration: 01m 37s)
  • 17:36 ppchelko@deploy1001: Started deploy [changeprop/deploy@6c5c0a3]: Switch internal events to the new schema T226522, step 2
  • 17:00 ppchelko@deploy1001: Finished deploy [changeprop/deploy@894f735]: Switch internal events to the new schema T226522 (duration: 01m 30s)
  • 16:58 ppchelko@deploy1001: Started deploy [changeprop/deploy@894f735]: Switch internal events to the new schema T226522
  • 16:22 godog: pool prometheus1003 - T227139
  • 15:46 robh: side b of a5-eqiad swapping pdu via T227141
  • 15:14 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 15:08 _joe_: uninstalling php-pear, php-mail, php-mail-mime from mw1267 T195364
  • 14:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate T211248, attempt 2 (duration: 13m 08s)
  • 14:39 ppchelko@deploy1001: Started deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate T211248, attempt 2
  • 14:14 robh: a3-eqiad pdu swap taking place now via T227139
  • 13:47 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 13:45 godog: depool restbase1016 restbase1019 restbase1011 restbase1010 prometheus1003 ahead of PDU work - T227139
  • 13:45 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 13:44 moritzm: installing Java security updates on furud/flerovium
  • 13:43 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 13:27 jeh: dumps switching active vps to labstore1006 T224228
  • 13:17 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.15
  • 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:07 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:06 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.15
  • 13:06 marostegui: Drop abuse_filter_log.afl_log_id from s8 codfw (lag will happen on codfw s8) - T226851
  • 12:33 liw@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (duration: 29m 46s)
  • 12:04 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache
  • 12:02 akosiaris: drain kubernetes1001. T227139
  • 12:01 akosiaris: empty ganeti1007 from running instances. T227139
  • 11:59 akosiaris: enable disable poolcounter1003, switchover codfw poolcounters T224572
  • 11:58 tarrow: EU SWAT finished
  • 11:58 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 46s)
  • 11:56 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T214902 Fix missing /termbox in SSRTermboxServerUrl (duration: 00m 44s)
  • 11:54 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.10 (duration: 07m 55s)
  • 11:43 jijiki: restart php-fpm on mwdebug*
  • 11:25 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T214902 Enable termbox on testwikidatawiki (duration: 01m 37s)
  • 11:08 jijiki: enable puppet on jobrunners
  • 10:17 marostegui: Drop abuse_filter_log.afl_log_id from db1096:3316, db1139:3316 and dbstore1005:3316 T226851
  • 10:02 moritzm: installing Java security updates on notebook/stat hosts
  • 09:59 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 09:59 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 09:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:53 marostegui: Drop abuse_filter_log.afl_log_id from s6 codfw with replication (this will cause lag in s6 codfw) - T226851
  • 09:51 akosiaris: enable poolcounter1005, disable poolcounter1001 T224572
  • 09:51 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 47s)
  • 09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool into API db1100 after upgrade (duration: 00m 46s)
  • 09:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool into API db1100 after upgrade (duration: 00m 47s)
  • 09:09 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 47s)
  • 09:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1100 after upgrade (duration: 00m 46s)
  • 08:34 marostegui: Upgrade db1100
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 for upgrade (duration: 00m 53s)
  • 08:08 marostegui: Stop MySQL on db2044 to test dbproxy2002 notifications - T202367
  • 07:31 marostegui: Deploy grants for dbproxy2002 on m2 - T202367
  • 04:52 eileen: civicrm revision changed from d951b07ce3 to 88e9f24893, config revision is f7b7622e27
  • 04:43 marostegui: Failover m1 from dbproxy1001 to dbproxy1006 T227139
  • 00:06 Urbanecm: slwiki updateCollection.php completed (T208984)

2019-07-22

  • 23:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 524952 Increase hewiki rollback limit for patrollers to 50/60 (duration: 00m 48s)
  • 23:54 Urbanecm: Run mwscript importImages.php --wiki=commonswiki --user=Meisam /home/urbanecm/T223052
  • 23:42 Urbanecm: All updateCollation.php runs completed, except the one for slwiki (T208984)
  • 23:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add flood group to ptwiki (T228521) (duration: 00m 47s)
  • 23:39 Urbanecm: Run mwscript updateCollation.php --wiki=slwiktionary --previous-collation=uppercase (T208984)
  • 23:39 Urbanecm: Run mwscript updateCollation.php --wiki=slwikiversity --previous-collation=uppercase (T208984)
  • 23:37 Urbanecm: Run mwscript updateCollation.php --wiki=slwikisource --previous-collation=uppercase (T208984)
  • 23:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix comment in IS.php (noop, T227000) (duration: 00m 46s)
  • 23:34 Urbanecm: Run mwscript updateCollation.php --wiki=slwikiquote --previous-collation=uppercase (T208984)
  • 23:34 Urbanecm: Run mwscript updateCollation.php --wiki=slwikibooks --previous-collation=uppercase (T208984)
  • 23:33 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: Fix "Remove "עמוד" namespace from wgFlaggedRevsNamespaces for hewikisource" (T227000) (duration: 00m 47s)
  • 23:29 Urbanecm: Run mwscript updateCollation.php --wiki=slwiki --previous-collation=uppercase (T208984)
  • 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wgCategoryCollation to uca-sl-u-kn on Slovene projects (sl) (T208984) (duration: 00m 47s)
  • 22:11 mutante: dropped zero.wikimedia.org from DNS (T187716)
  • 21:50 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Further mitigations for T227416 (duration: 00m 46s)
  • 21:38 ppchelko@deploy1001: Finished deploy [restbase/deploy@9a99b17]: Rollback: Switch event production to eventgate T211248 (duration: 13m 01s)
  • 21:35 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Temporary make account creation limits more restrictive" (duration: 00m 47s)
  • 21:27 eileen: civicrm revision is d951b07ce3, config revision is f7b7622e27
  • 21:25 ppchelko@deploy1001: Started deploy [restbase/deploy@9a99b17]: Rollback: Switch event production to eventgate T211248
  • 21:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate T211248 (duration: 16m 14s)
  • 21:21 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 21:20 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 21:19 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 21:17 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 21:05 eileen: civicrm revision changed from f932e56cd2 to d951b07ce3, config revision is f7b7622e27
  • 21:04 ppchelko@deploy1001: Started deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate T211248
  • 20:04 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@0be6045]: Weekly deploy (duration: 18m 42s)
  • 19:46 smalyshev@deploy1001: Started deploy [wdqs/wdqs@0be6045]: Weekly deploy
  • 19:09 ppchelko@deploy1001: Finished deploy [changeprop/deploy@3f8aad2]: Switch revision-score to eventgate T211248 (duration: 01m 31s)
  • 19:07 ppchelko@deploy1001: Started deploy [changeprop/deploy@3f8aad2]: Switch revision-score to eventgate T211248
  • 18:59 elukey: repool scb1001 after pdu maintenance
  • 18:59 herron: repooling kafka1001 T227140
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable help panel for 50% of new users on arwiki (T226729) (duration: 00m 47s)
  • 18:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Trying the last sync again, because it's appearing inconsistently (duration: 00m 47s)
  • 18:15 thcipriani: restarting gerrit due to T224448
  • 18:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments help panel on arwiki (T226729) (duration: 00m 48s)
  • 18:00 elukey: arm keyholder on netmon1002 after power loss
  • 17:35 elukey: depool scb1001 for PDU work T227140
  • 17:22 herron: depooling kafka1001 for PDU work T227140
  • 17:17 nuria@deploy1001: Finished deploy [analytics/refinery@d889893]: deploying refinery jar bump for webrequest/load jobs (duration: 14m 51s)
  • 17:02 nuria@deploy1001: Started deploy [analytics/refinery@d889893]: deploying refinery jar bump for webrequest/load jobs
  • 17:02 jijiki: enable puppet on all jobrunners
  • 16:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T87899 Use wfLoadExtension for Collection rather than deprecated entry point (duration: 00m 47s)
  • 16:48 jforrester@deploy1001: Synchronized wmf-config/extension-list: Load Collection i18n via extension.json directly (duration: 00m 47s)
  • 16:36 jeh: redirecting dumps.wikimedia.org dns to labstore1006 T224228
  • 15:49 jijiki: Rolling depool and pool of mw1293, mw1294, mw1295, mw1296, mw1299 - T219148
  • 15:38 marostegui: Stop mysql and power off pc2010 for on-site maintenance - T227552
  • 15:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Wikibase/lib/WikibaseLib.php: T227814 Wikibase: Define $wgMessagesDirs in WikibaseLib PHP entry point (duration: 00m 48s)
  • 15:27 jijiki: Depool mw1300 and pool back
  • 15:24 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: T228614 XmlDumpWriter: don't load revision text content unless requested to (duration: 00m 48s)
  • 15:17 jijiki: Disable puppet on jobrunners to enable php7_only
  • 14:55 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:53 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:44 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:38 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:30 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:30 ottomata: deploying refactored eventgate chart using eventgate-wikimedia image to eventgate-* services - T226668
  • 14:28 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.14
  • 13:12 kart_: Updated cxserver to 2019-07-17-074415-production (T227553, T216812)
  • 13:07 kartik@deploy1001: scap-helm cxserver finished
  • 13:07 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 13:07 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 13:02 kartik@deploy1001: scap-helm cxserver finished
  • 13:02 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 13:02 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 13:00 kartik@deploy1001: scap-helm cxserver finished
  • 13:00 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 12:59 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 12:58 marostegui: Stop MySQL on db1117:3321 to test dbproxy1014 (replacement for dbproxy1006) on m1 - T202367
  • 12:22 moritzm: installing debian-archive-keyring Stretch update (SUA 164)
  • 11:20 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable wgNamespacesWithSubpages on main NS for kowikiversity (T228481) (duration: 00m 54s)
  • 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable FileImporter source wiki edit and delete, (remove labs customizations) (T225617, T226532) (duration: 00m 54s)
  • 11:13 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Enable FileImporter source wiki edit and delete (T225617, T226532) (duration: 00m 56s)
  • 10:55 jijiki: Enable puppet on jobrunners
  • 10:27 jijiki: Depool and pool mw1300
  • 10:23 jijiki: Disable puppet on jobrunners for 524336 - T219148
  • 10:21 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 10:20 fsero: deploy coredns in staging T226516
  • 09:47 elukey: failover + restart of Hadoop HDFS namenode on an-master1001 to apply GC settings - T228620
  • 09:40 marostegui: Deploy grants on m1 to allow connections from dbproxy1014 - T202367
  • 09:32 elukey: restart hadoop hdfs namenode on an-master1002 to apply new GC settings - T228620
  • 08:33 marostegui: Rename table enwiki.math on db2116 T196055
  • 07:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1134 after schema change T226851 (duration: 00m 51s)
  • 07:54 elukey: sudo -i depool on elastic1046 - broken disk (srv partition not available) - T228606
  • 07:40 elukey: systemctl reset-failed restbase on restbase1007->15 (decommed nodes)
  • 07:27 marostegui: Drop afl_log_id column from enwiki.abuse_filter_log on db1134 T226851
  • 07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1134 for schema change T226851 (duration: 00m 56s)
  • 07:17 moritzm: installing openjdk-11 security updates
  • 06:47 marostegui: Stop MySQL on db2062 to test dbproxy2001 notification T202367
  • 06:23 elukey: restart hadoop-hdfs-namenode on an-master1002 to verify if out-of-the-ordinary GC activity
  • 06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1104 from s8 API (duration: 00m 55s)
  • 05:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1109 into API (duration: 00m 58s)
  • 05:24 marostegui: Compress more tables on labsdb1009 - T222978
  • 04:48 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/extension.json: fixing UBN T228465 (duration: 00m 54s)
  • 04:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/maintenance/loadExitNodes.php: fixing UBN T228465 (duration: 00m 54s)
  • 04:44 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/includes/TorExitNodes.php: fixing UBN T228465 (duration: 00m 56s)
  • 04:17 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: fix UBN bug T227772 (duration: 00m 56s)

2019-07-21

  • 01:06 Urbanecm: Deployed patch for T228574

2019-07-19

  • 22:36 mutante: phab2001 - switching apache to php-fpm and worker instead of mpm-prefork (to match phab1001) (T190568 T137928 T190572)
  • 21:57 eileen: update process-control config, revision is c913a5f261
  • 21:34 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 21:25 eileen: civicrm revision changed from 21d3c5a3fc to f932e56cd2, config revision is 9f7eba2193
  • 19:35 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:35 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 19:34 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:07 eevans@: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 19:02 eevans@: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 17:53 cdanis@deploy1001: Synchronized docroot/noc/db.php: noc: db.php: support ?dc=codfw, and cleanups (duration: 00m 56s)
  • 17:44 XioNoX: change netflow target port to 2055 in eqiad
  • 16:17 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:55 moritzm: rebooting mw2164 for a test
  • 15:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:40 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 15:27 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 15:26 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 15:22 fsero: deploy coredns in staging T226516
  • 15:03 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:42 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Collection/Collection.php: 90eed0fad / T87899 (duration: 00m 54s)
  • 14:35 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/Collection/Collection.php: 66ce154 / T87899 (duration: 00m 56s)
  • 14:29 ariel@deploy1001: Finished deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps (duration: 00m 03s)
  • 14:29 ariel@deploy1001: Started deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps
  • 14:28 Krinkle: krinkle@deploy1001: Untracked file found in php-1.34-wmf.13
  • 14:28 Krinkle: krinkle@deploy1001: extensions/CheckUser is dirty in php-1.34-wmf.13 and php-1.34-wmf.14
  • 13:30 tarrow@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 13:04 moritzm: installing bzip2 security updates on jessie
  • 12:28 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 10:56 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:55 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:53 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:53 fsero: deploying calico from helmfile in staging T227775
  • 10:35 jijiki: enable puppet on jobrunners
  • 10:26 jijiki: disable puppet on jobrunners for 523908
  • 08:37 ariel@deploy1001: Finished deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default (duration: 00m 04s)
  • 08:37 ariel@deploy1001: Started deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default
  • 08:36 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 08:24 gehel: repooling wdqs2004 - T228122
  • 08:22 gehel: repooling wdqs2003 - T228122
  • 08:20 vgutierrez: restart pybal on lvs2003
  • 08:16 vgutierrez: restart pybal on lvs2006
  • 08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1109 into API (duration: 00m 54s)
  • 07:57 moritzm: installing idp1001 T228403
  • 07:38 moritzm: rebooting tungsten for kernel update
  • 07:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:03 elukey: restart php-fpm on mw1330 - op-cache hit ratio low
  • 07:02 jynus: reloading dbproxy1004/9
  • 07:01 elukey: depool wdqs2004 from all services (waiting for maintenance)
  • 06:32 legoktm@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: T225199 (duration: 00m 55s)
  • 06:30 legoktm@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: T225199 (duration: 00m 55s)
  • 06:15 elukey: clear opcache on mwdebug*
  • 05:26 fsero: repool ms-fe2005 - T228196
  • 05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2116 (duration: 00m 55s)
  • 04:11 eileen: I think I didn't push the turn it on commit - tried again process-control config revision is 9f7eba2193
  • 03:03 eileen: process-control config revision is 7598dc1bf9 (jobs reenabled)
  • 01:52 XioNoX: enable outbound sampling on eqiad's router
  • 00:52 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Add even more severe rate limits for eswikiquote and some other, smaller wikis (T227416) (duration: 00m 58s)
  • 00:38 mutante: mwmaint2001 - puppet fails - not removing a bunch of log dirs for maintenance crons
  • 00:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
  • 00:08 eileen: process-control config revision is 7598dc1bf9 - jobs disabled
  • 00:04 mutante: install1002 - exported indices for new scap version - copied back from buster to stretch - upgraded scap version on mw2250 - scap pull now works and starts to rsync (T228482, T228328, T226948)

2019-07-18

  • 23:50 mutante: built new scap version 3.11.1-1 on boron, copied to install1002, imported package with reprepro, copied from stretch to jessie and buster (T228482)
  • 23:22 Lucas_WMDE: Evening SWAT done
  • 23:17 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Configure Citoid+Wikibase integration on Beta (production no-op) (T228411) (duration: 00m 54s)
  • 23:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Set $wgWBRepoSettings[enableRefTabs] in Wikibase.php (T228414) (duration: 01m 16s)
  • 23:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Define settings for Citoid+Wikibase integration (T228414) (duration: 00m 55s)
  • 22:23 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=wdqs1008.eqiad.wmnet
  • 22:16 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 22:00 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 21:49 bd808: Cleaned up stale striker logs on labweb1001 and labweb1002. Logs go to journald now so log rotate is not triggered to rotate out logs from before that change.
  • 21:42 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 21:36 bd808@deploy1001: Finished deploy [striker/deploy@91594df]: Fixes for deprecation warnings and editing Tool models (T228222, T228332) (duration: 01m 13s)
  • 21:34 bd808@deploy1001: Started deploy [striker/deploy@91594df]: Fixes for deprecation warnings and editing Tool models (T228222, T228332)
  • 21:15 mutante: gerrit (cobalt) - scheduled 1h downtime, rebooting for kernel upgrade
  • 21:03 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: T228290 Fix fatal in ChangesListFormatter::getLogTextLinks() (duration: 01m 02s)
  • 20:57 mutante: gerrit2001 - icinga downtime for 1h
  • 20:56 mutante: gerrit2001 - reboot for kernel upgrade
  • 20:51 mutante: gerrit2001 - apt-get upgrade; apt-get autoremove ; puppet agent -tv
  • 19:55 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 19:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T228374 Enable SecureLinkFixer in beta cluster (2/2) (duration: 00m 55s)
  • 19:31 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T228374 Enable SecureLinkFixer in beta cluster (1/2) (duration: 00m 55s)
  • 19:27 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T207750 Revoke editmyuserjsredirect from all users (duration: 00m 54s)
  • 19:25 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 19:21 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 19:20 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 18:45 mutante: contint2001 - had puppet failure in puppet board / dpkg issue due to unfinished zuul install which was done on contint1001 - stopped zuul and zuul-merger, apt-install zuul (was already latest version but needed to finish configure step), apt-get autoremove to remove unused packages, ran puppet. dpkg and puppet happy again
  • 17:45 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/includes/libs/objectcache/RedisBagOStuff.php: 69cd8b0 (duration: 00m 55s)
  • 17:15 Krinkle: krinkle@deploy1001: Pull down https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralAuth/+/523844/ and https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralAuth/+/524276/ (no-op, not deploying)
  • 16:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:29 XioNoX: upgrade Routinator to 0.5.0 in eqiad - T220669
  • 16:24 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/resources/src/mediawiki.misc-authed-ooui/special.movePage.js: e97a284dbe54 (duration: 00m 58s)
  • 16:17 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:06 XioNoX: upgrade Routinator to 0.5.0 in codfw - T220669
  • 16:05 XioNoX: add routinator 0.5.0 to APT
  • 15:54 fsero: depool ms-fe2005 - T228196
  • 15:40 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.34.0-wmf.13 # T228436 T220739
  • 15:19 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 14:46 godog: roll-restart thumbor in codfw - T228086
  • 14:45 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 14:37 liw: all wikis at 1.34.0-wmf.14
  • 14:36 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.14
  • 14:28 bblack: cp hosts: apt autoremove to clean up pkgs on the fleet
  • 14:27 nuria@deploy1001: Finished deploy [analytics/refinery@4f07755]: deploying v0.0.94 of refinery (duration: 00m 20s)
  • 14:26 nuria@deploy1001: Started deploy [analytics/refinery@4f07755]: deploying v0.0.94 of refinery
  • 14:24 godog: repool thumbor2003
  • 14:20 godog: reboot thumbor2003
  • 14:17 jijiki: Depool thumbor2003 for reboot
  • 14:12 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 13:53 moritzm: installing php5 security updates
  • 13:50 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 13:36 jeh: rebooting labstore1005.eqiad.wmnet - T224228
  • 13:34 jbond42: remove mtail 3.0.0~rc24.1-1+wmf1 from stretch-wikimedia
  • 13:30 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.14 (duration: 00m 53s)
  • 13:29 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.14
  • 13:24 jbond42: downgrade cp servers back to 3.0.0~rc5-1~bpo9+1
  • 13:23 liw: promoting 1.34.0-wmf.14 to group1
  • 13:22 godog: temporarily stop ircecho on icinga1001 to avoid spam
  • 13:00 jbond42: rolling upgrade of mtail
  • 12:57 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 12:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 12:53 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 12:51 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 12:34 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 12:26 jbond42: add mtail 3.0.0~rc24.1-1+wmf1 to stretch-wikimedia
  • 11:13 dcausse: EU Swat done
  • 11:08 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert [cirrus] switch search traffic (except completion) to codfw (duration: 00m 56s)
  • 11:02 godog: swift eqiad-prod: put back ms-be1043 sdk1 - T218544
  • 10:51 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 10:43 ema: cp-eqiad: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades T227672
  • 10:37 jijiki: enable puppet on services_proxy hosts - T228063
  • 10:29 godog: reboot wezen.codfw.wmnet - T225713
  • 10:27 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 10:15 jijiki: Disable puppet on services_proxy hosts - T228063
  • 09:33 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 09:26 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 09:09 godog: resume swift ms-be rolling restarts - T225713
  • 09:03 fsero: reuploading missing layers T228196
  • 08:57 hashar: contint1001: stopped zuul, ran apt install to get the new python2.7 copied to Zuul virtualenv, restarted zuul/zuul-merger. That clears a couple Icinga alarms from yesterday
  • 08:56 marostegui: Drop afl_log_id column from enwiki.abuse_filter_log on db2116 T226851
  • 08:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2116 (duration: 00m 55s)
  • 08:18 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 08:14 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 06:56 dcausse: deleting zerowiki elastic indices (eqiad and codfw) T227718
  • 05:22 marostegui: Stop MySQL on db2045, host will be decommissioned T228281
  • 05:18 marostegui: Remove db2045 from tendril and zarcillo T228281
  • 05:16 marostegui: Disable notifications on db2045 T228281
  • 05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2045 from config, will be decommissioned T228281 (duration: 00m 54s)
  • 05:08 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2045 from config, will be decommissioned T228281 (duration: 00m 56s)
  • 04:31 legoktm: running query for T227843 on mwmaint102

2019-07-17

  • 23:51 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Add wmgUseTheWikipediaLibrary (false everywhere, no-op) (duration: 00m 54s)
  • 23:48 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wmgUseTheWikipediaLibrary (false everywhere, no-op) (duration: 00m 53s)
  • 22:35 mutante: reimaging mw2250 after disks have been replaced
  • 22:16 hoo: Manually started the Wikidata RDF dumps on snapshot1008 (due to T228104)
  • 21:42 apergos: started wikidata entity dumps json run on snapshot1008
  • 21:37 nuria: deployment aborted for refinery 0.0.94
  • 21:37 nuria@deploy1001: Finished deploy [analytics/refinery@4f07755]: refinery 0.0.94 (duration: 36m 28s)
  • 21:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/rdbms/loadbalancer: T228104 rdbms: better handle a non-existing defaultGroup in LoadBalancer (duration: 00m 55s)
  • 21:15 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: Clean up accidentally-deployed debugging code for T228290 (duration: 01m 02s)
  • 21:10 otto@deploy1001: Finished deploy [eventstreams/deploy@dbc9bbb]: Fix ?doc to use openapi instead of swagger - T227958 (duration: 02m 52s)
  • 21:07 otto@deploy1001: Started deploy [eventstreams/deploy@dbc9bbb]: Fix ?doc to use openapi instead of swagger - T227958
  • 21:00 nuria@deploy1001: Started deploy [analytics/refinery@4f07755]: refinery 0.0.94
  • 20:35 accraze@deploy1001: Finished deploy [ores/deploy@676f7ba]: T228331 (duration: 24m 59s)
  • 20:10 accraze@deploy1001: Started deploy [ores/deploy@676f7ba]: T228331
  • 19:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/libs/rdbms/loadbalancer: T228104 rdbms: better handle a non-existing defaultGroup in LoadBalancer (duration: 00m 55s)
  • 19:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2181.codfw.wmnet
  • 18:36 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s eqiad
  • 18:28 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s codfw
  • 18:26 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s esams
  • 18:25 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s ulsfo
  • 18:23 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s eqsin
  • 18:20 cdanis: cdanis@mw1261.eqiad.wmnet ~ % sudo -i pool
  • 18:19 cdanis: testing conftool upgrade: cdanis@mw1261.eqiad.wmnet ~ % sudo -i depool
  • 18:15 mutante: mw2181 - sudo: /usr/local/bin/mwscript: command not found on scap pull ??
  • 18:14 mutante: mw2181 - scap pull (T205240)
  • 18:06 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s mw-canary
  • 18:02 cdanis: upgrade to python3-conftool 1.1.1-1 on mwdebug2001
  • 18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include jessie-wikimedia conftool/conftool_1.1.1-1+deb8u1_amd64.changes
  • 18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include buster-wikimedia conftool/conftool_1.1.1-1+deb10u1_amd64.changes
  • 18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include stretch-wikimedia conftool/conftool_1.1.1-1_amd64.changes
  • 17:09 papaul: shutting down restbase2009 for firmware upgrade
  • 17:06 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group[0|1] wikis to 1.34.0-wmf.13"
  • 16:57 dcausse: morning swat done
  • 16:54 dcausse@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CirrusSearch/includes/ElasticaErrorHandler.php: T228283: Log response data JSON on errors (duration: 00m 55s)
  • 16:48 Urbanecm: Deployed patch for T207094
  • 16:47 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 16:40 elukey: execute reprepro clearvanished on install1002 to clear buster-wikimedia|thirdparty/amd-rocm (not used anymore)
  • 16:37 dcausse: reopening morning SWAT
  • 16:24 papaul: shutting down mw2181 for firmware upgrade
  • 16:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:19 jijiki: Depool mw2181 - T205240
  • 16:08 Urbanecm: Morning SWAT done
  • 16:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Raise zh_classicalwiki requirement for autoconfirmed (T228141) (duration: 00m 55s)
  • 16:07 cmjohnson1: powering off cloudvirt1014 for rack move T226188
  • 16:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable partial blocks on dewiki (T228150) (duration: 00m 54s)
  • 16:01 jbond42: copy confd package from stretch-wikimedia to buster-wikimedia
  • 15:47 Urbanecm: Re-syncing patch for T207094 T228284 and wmf.14
  • 15:37 Urbanecm: Deployed patch for T207094 T228284 to wmf.13 and wmf.14
  • 15:15 fsero: restarting swift-container-sync on ms-be* for getting logging configuration T228196
  • 15:11 papaul: shutting down mw2250 for disk replacement
  • 15:10 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:07 hashar: upgrading CI Jenkins # T228142
  • 15:06 papaul: shutting down ms-be2022 for HW troubleshooting
  • 15:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:03 jijiki: Depool mw2269 to reboot it - T227548
  • 15:00 godog: poweroff ms-be2022 - T227667
  • 14:55 moritzm: updated jenkins in thirdparty/ci (stretch) and thirdparty (jessie) to 2.176.2 (T228142)
  • 14:45 fsero: enabling container-sync logging T228196
  • 14:41 otto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:41 otto@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:35 moritzm: restart pybal on lvs2002 (codfw primary) T227778
  • 14:32 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 14:31 gehel: repool maps1004 - T218097
  • 14:11 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.14 (duration: 00m 54s)
  • 14:10 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.14
  • 14:09 moritzm: restarting pybal on backup LVSes in codfw
  • 14:02 liw@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CirrusSearch/includes/Searcher.php: Do not serialize ResultsType instance T228276 (duration: 00m 55s)
  • 13:37 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:26 moritzm: disabled puppet on Icinga hosts in preparation of adding the LDAP replicas/codfw to LVS
  • 13:10 ema: cp-codfw: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades T227672
  • 13:06 ema: prometheus servers: remove varnish-upload_$dc_backend.yaml, replaced by ATS equivalent T227668
  • 12:57 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 12:36 godog: upgrade hp raid firmware on ms-be1 hosts - T141756
  • 12:15 Urbanecm: Running foreachwiki extensions/AbuseFilter/maintenance/normalizeThrottleParameters.php in tmux session on mwmaint1002 (T209565)
  • 12:11 Urbanecm: Ran extensions/AbuseFilter/maintenance/normalizeThrottleParameters.php for cawiki and viwiki (T209565)
  • 11:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 11:30 mlitn@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/WikibaseMediaInfo: [WikibaseMediaInfo] Revert "Add Wikidata links to statement UI elements" (duration: 00m 56s)
  • 11:16 dcausse: reindexing wikidata (elastic@eqiad) T227136
  • 11:08 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T227136: [cirrus] switch search traffic (except completion) to codfw (duration: 00m 54s)
  • 10:53 moritzm: re-enabled icinga1001 in meta monitoring
  • 10:41 godog: install updated linux-image-4.9.0-9-amd64 on ms-be hosts
  • 10:30 godog: start rolling reboot of ms-be eqiad hosts - T225713
  • 10:30 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 10:23 moritzm: rebooting icinga1001 for kernel update
  • 10:20 moritzm: disabled icinga1001 in meta monitoring
  • 10:18 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:08 moritzm: rebooting lithium for kernel update
  • 10:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:33 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 09:33 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 09:23 moritzm: rebooting grafana1001 to pick up MDS-enabled qemu
  • 09:21 ema: cp-ats: upgrade fifo-log-demux to 0.3 T227668
  • 09:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool and clarify db2045 status T227862 (duration: 00m 55s)
  • 09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:19 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:15 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 09:07 ema: upload fifo-log-demux 0.3 to stretch-wikimedia T227668
  • 08:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:36 jijiki: Disable puppet on thumbor* in eqiad, depool and pool back to apply 523728 - T224572
  • 08:17 jijiki: Pool mw1239 - T227867
  • 07:48 godog: swift eqiad-prod: put back ms-be1043 sdk1 - T218544
  • 07:46 ema: cp-esams: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades T227672
  • 07:33 moritzm: reimaging sarin for some tests
  • 06:59 elukey: apply mcrouter async replication to mw2224 - T225642
  • 06:25 elukey: reboot analytics1072 as attempt to clear the megacli's config (and add a new disk)
  • 06:20 elukey: sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to reset opcache
  • 05:26 marostegui: Stop MySQL on db1065 for decommissioning - T227560
  • 05:24 marostegui: Remove db1065 from tendril and zarcillo - T227560
  • 03:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: T227772 (duration: 00m 54s)
  • 03:42 tstarling@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: T227772 (duration: 00m 56s)
  • 03:00 tstarling@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Permissions/PermissionManager.php: (no justification provided) (duration: 00m 54s)
  • 02:58 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/Permissions/PermissionManager.php: (no justification provided) (duration: 00m 57s)
  • 00:50 mutante: wikitech-static commented out cert renewal cron job out of caution - still needs fixing but continue tomorrow
  • 00:12 mutante: wikitech-static - adding (undocumented!) option webroot-map to certbot config to use webroot authenticator with different document roots per domain while using the config file and not cli params (T214640)
  • 00:01 mutante: wikitech-static certbot --dry-run renew (T214640)
  • 00:01 mutante: wikitech-static changing certbot renewalparams: authenticator = webroot (changed from standalone), install = apache (unchanged) (T214640)

2019-07-16

  • 23:53 RoanKattouw: Deployed patch for T207094
  • 23:27 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/skins/MinervaNeue/: Do not load main menu icons in critical path (T227929) (duration: 00m 55s)
  • 23:26 catrope@deploy1001: Synchronized php-1.34.0-wmf.13/skins/MinervaNeue/: Do not load main menu icons in critical path (T227929) (duration: 00m 56s)
  • 23:26 mutante: wikitech-static - current status with method 'standalone' is that it's broken on cert renewal and gets fixed by restarting apache, which makes no sense since the previous fixes were the straight opposite and the ticket claims the fix was moving back from apache to standalone (T214640)
  • 23:26 fsero: repool ms-fe2005 T228196
  • 23:23 mutante: wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me -> T204840#5243222 i previously did the opposite change in T214640#4907685 to fix it) and that takes down apache during the renewal (T214640)
  • 23:20 mutante: wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me) and that takes down apache during the renewal
  • 23:17 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/GrowthExperiments/: Don't use timestamp in help panel questions in Flow (T212433) (duration: 00m 56s)
  • 23:09 mutante: wikitech-static got ssl config files in sync with the repo, the difference was really just that space on one line each though (T225258)
  • 22:35 fsero: uploading only blobs on docker-registry-codfw from a backup on ms-fe2005 T228196
  • 22:29 mutante: wikitech-static the diff between the ssl config files in the repo and on server were just a space at the end of the ServerAdmin line .... T225258
  • 22:28 fsero: depooling ms-fe2005 for swift upload for registry T228196
  • 22:26 mutante: wikitech-static: ran certbot renew with --dry-run to confirm cert renewal works, and it was fine. 2 minutes later apache errors appeared, which were fixed by restarting apache2 (T214640)
  • 22:24 mutante: wikitech-static restarted apache
  • 22:11 mutante: wikitech-static: turn /etc/apache2/sites-available/wikitech-static.wikimedia.org-ssl.conf and status.wikimedia.org-ssl.conf into symlinks to /wikitech-static/apache/ to match config for http vhosts (T225258)
  • 22:06 mutante: wikitech-static: move /etc/apache2/sites-available/000-default.conf and default-ssl.conf out of directory and reload apache to confirm they are not used and get us in sync with the repo contents again (T225258)
  • 21:17 bd808@deploy1001: Finished deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade (T221657, T227508) (duration: 01m 08s)
  • 21:15 bd808@deploy1001: Started deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade (T221657, T227508)
  • 20:55 SMalyshev: repooled wdqs2004 and wdqs2001 - reload done
  • 20:26 mutante: ganeti1001 - gnt-instance remove netmon1003.wikimedia.org (T220355)
  • 19:59 XioNoX: update ACLs on pfw3-eqiad/codfw - T228205
  • 19:52 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:51 fsero: republishing base images for wikimedia-(stretch,jessie and buster) T228196
  • 18:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:58 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:54 gehel: data copy from wdqs2004 to wdqs2001 - T228122
  • 18:47 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: retry - Produce revision-create stream to eventgate-main - T211248 (duration: 00m 54s)
  • 18:23 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce revision-create stream to eventgate-main - T211248 (duration: 00m 54s)
  • 18:08 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Update ExtensionDistributor config to point to REL1_33 as the released version (duration: 00m 54s)
  • 18:05 fsero: republishing base images for nodejs-slim due to registry T228196
  • 18:02 andrewbogott: rebooting cloudcontrol2003-dev, cloudweb2001-dev, cloudcontrol1004 for T225713
  • 17:39 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce centralnotice.campaign-* streams to eventgate-main - T211248 (duration: 00m 55s)
  • 17:23 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@cb6e7bc]: Update mobileapps to 334a4c4 (T227907) (duration: 04m 51s)
  • 17:19 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@cb6e7bc]: Update mobileapps to 334a4c4 (T227907)
  • 16:55 mutante: netmon1003: shutdown -h now | ganeti1001: gnt-instance shutdown netmon1003.wikmedia.org - removed from icinga T198939 T220355
  • 16:36 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@5d8128e]: Migrating videoscaling jobs to PHP7 - T219150 (duration: 00m 50s)
  • 16:35 jiji@deploy1001: Started deploy [cpjobqueue/deploy@5d8128e]: Migrating videoscaling jobs to PHP7 - T219150
  • 16:28 dcausse: reindexing wikidata (elastic@eqiad) T227136
  • 15:57 tarrow@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 15:37 elukey: reboot analytics1072 as an attempt to force the raid controller to set a drive to failed - T226467
  • 15:12 elukey: start mariadb on db1107 and re-enable mysql consumers on eventlog1002 and replication on db1108
  • 14:53 elukey: stop mariadb on db1107 to allow maintenance
  • 14:53 elukey: stop eventlogging mysql consumers on eventlog1002 and eventlogging_sync on db1108 to allow db1107 maintenance
  • 14:52 jbond42: will restart redis on oresdb at 16:00 UTC - T228045
  • 14:51 jbond42: enable puppet across the fleet
  • 14:50 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
  • 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:43 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
  • 14:40 jbond42: disable puppet across the fleet to make a change to hiera
  • 14:30 jijiki: Enable puppet and rolling restart thumbor* in codfw - T224572
  • 14:16 jijiki: Depool thumbor2001 and pool back - T224572
  • 14:13 jijiki: Disabling puppet on thumbor*codfw.wmnet - T224572
  • 14:08 liw: group0 to 1.34.0-wmf.14
  • 14:06 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to php-1.34.0-wmf.14
  • 13:41 liw@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.14 and rebuild l10n cache (duration: 26m 45s)
  • 13:24 vgutierrez: restarting pybal on lvs2001 and lvs1013
  • 13:20 vgutierrez: restarting pybal on lvs2004 and lvs1016
  • 13:14 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.14 and rebuild l10n cache
  • 12:59 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.8 (duration: 01m 46s)
  • 12:57 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.7 (duration: 02m 01s)
  • 12:54 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.6 (duration: 02m 04s)
  • 12:52 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 (duration: 02m 11s)
  • 12:49 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.5 (duration: 07m 42s)
  • 12:42 dcausse: deleting stale wikidata indices (elastic@eqiad) T227136
  • 12:11 jijiki: Depool mw1293 and pool back
  • 11:57 moritzm: synched docker-ce, docker-ce-cli, containerd.io to thirdparty/ci for stretch-wikimedia (T226236)
  • 11:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:12 moritzm: rebooting remaining swift frontends in eqiad to pick up a kernel with SACK fixed (T228086)
  • 10:29 moritzm: rebooting ms-fe1005 to pick up kernel with SACK fixed (T228086)
  • 10:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:17 vgutierrez: restart pybal on lvs1013
  • 10:15 vgutierrez: restart pybal on lvs2001
  • 10:11 vgutierrez: restarting pybal on lvs1016
  • 10:08 vgutierrez: restarting pybal on lvs2004
  • 10:04 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=ncredir,service=nginx
  • 09:24 elukey: apply mcrouter async replication settings to mw1276 - T225642
  • 09:23 elukey: pool mw1261 back with mcrouter async replication settings - T225642
  • 08:50 fsero: upload coredns docker image into registry T226516
  • 08:44 jynus: dropping servermon accounts from m1 dbs T198939
  • 08:12 fsero: uploading coredns_1.5.2 for buster and stretch - T226516
  • 08:11 fsero: uploading coredns_1.5.2 for buster and stretch
  • 07:45 elukey: depool mw1261 to test mcrouter changes
  • 00:24 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/cache/LinkCache.php: 4a5f4ca2fd788 (duration: 00m 51s)
  • 00:05 catrope@deploy1001: Synchronized php-1.34.0-wmf.13/skins/MinervaNeue/: Restrict AMC scripts and styles to AMC mode (T227929) (duration: 00m 52s)
  • 00:03 shdubsh: restart logstash to revert mitigations - T228089

2019-07-15

  • 23:55 XioNoX: rotate network-root password
  • 23:31 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 23:31 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 23:07 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Remove reference to non-existent feature flag (duration: 00m 51s)
  • 22:33 XenoRyet: updated civicrm from 8a4451f390 to 3be1a8c77c
  • 22:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgNonincludableNamespaces, default, never varied (duration: 00m 52s)
  • 22:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Drop wmgEnableTabularData and wmgEnableMapData, unused (duration: 00m 55s)
  • 21:58 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Use wmgEnableJsonConfigDataMode instead of wmgEnableTabularData and wmgEnableMapData (duration: 00m 56s)
  • 21:56 jijiki: Depool mw1239 for maintenance - T227867
  • 21:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wmgEnableJsonConfigDataMode to IS (duration: 00m 55s)
  • 21:46 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Add more severe rate limits for eswikiquote (T227416) (duration: 00m 50s)
  • 21:16 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:06 XioNoX: rollback `as-path HE ".* 6939 .*"` to AVOID-PATH in eqsin - T228015
  • 20:59 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Title.php: T227700 / T227700: getSubpage should not lose the interwiki prefix (duration: 00m 52s)
  • 20:54 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to 7fd39da (T227907) (duration: 02m 24s)
  • 20:52 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to 7fd39da (T227907)
  • 20:52 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to 7fd39da (T227907) (duration: 07m 53s)
  • 20:50 Krinkle: deploy1001: Unable to fetch git commits from Gerrit for php-1.34.0-wmf.13 due to "error: cannot update the ref 'refs/remotes/origin/fundraising/REL1_31': unable to append to '.git/logs/refs/remotes/origin/fundraising/REL1_31': Permission denied"
  • 20:47 XioNoX: add `as-path HE ".* 6939 .*"` to AVOID-PATH in eqsin - T228015
  • 20:44 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to 7fd39da (T227907)
  • 20:30 XioNoX: deactivate HE peering in eqsin - T228015
  • 20:02 jynus: reducing consistency of db2045 to avoid lag at T227862
  • 19:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:31 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@fd0a41a]: Change the name of the error log field for deduplicatio (duration: 01m 13s)
  • 19:30 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@fd0a41a]: Change the name of the error log field for deduplicatio
  • 19:27 ppchelko@deploy1001: Finished deploy [changeprop/deploy@df6322a]: Rename error field in deduplication logs (duration: 01m 28s)
  • 19:26 ppchelko@deploy1001: Started deploy [changeprop/deploy@df6322a]: Rename error field in deduplication logs
  • 19:25 XenoRyet: update payments-wiki from 59ace50d66 to 224c6b2d7b
  • 19:10 thcipriani: gerrit back
  • 19:09 thcipriani: gerrit restart for v2.15.14
  • 19:09 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (cobalt - restart incoming) (duration: 00m 10s)
  • 19:08 thcipriani@deploy1001: Started deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (cobalt - restart incoming)
  • 19:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (gerrit2001) (duration: 00m 12s)
  • 19:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (gerrit2001)
  • 19:05 shdubsh: restarting logstash on logstash1008
  • 18:27 Urbanecm: Morning SWAT done
  • 18:13 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Remove spam mitigations (T200104) (duration: 00m 50s)
  • 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable WelcomeSurvey A/B test for arwiki (T226221) (duration: 01m 02s)
  • 18:07 jbond42: syncing puppetmaster1001 facts to compiler1001/1002
  • 17:34 cdanis: downtime mr1-eqsin.oob IPv6 for 20h T227967
  • 16:58 jynus: setting labsdb1009/10/11 to performance scaling_governor T225713
  • 16:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce revision-visibility-change stream to eventgate-main - T211248 (duration: 00m 49s)
  • 14:08 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 06s)
  • 14:08 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
  • 14:08 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 01s)
  • 14:07 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
  • 14:07 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 01s)
  • 14:07 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
  • 14:06 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 07s)
  • 14:06 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
  • 14:04 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 05s)
  • 14:04 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
  • 13:55 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 06s)
  • 13:55 elukey: enable profile::base::firewall on notebook100[3,4]
  • 13:55 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
  • 13:55 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 15s)
  • 13:54 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
  • 13:23 Urbanecm: Running mwscript importImages.php --wiki=commonswiki --user=Meisam /home/urbanecm/T223052
  • 13:16 gehel: repooling maps eqiad - T218097
  • 13:02 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 50s)
  • 13:01 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 50s)
  • 12:59 gehel: depooling kartotherian eqiad - T225713
  • 12:59 gehel: re-enabling kartotherian codfw - T225713
  • 12:55 gehel: shutting down tilerator on maps eqiad to free some CPU - T225713
  • 12:54 gehel: shutting down tilerator on maps eqiad to free some CPU -
  • 12:52 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Delete Image-reviewer group from commonswiki for good (T216406) (duration: 00m 51s)
  • 12:50 gehel: restarting kartotherian on maps1002
  • 12:35 gehel: reimporting OSM data for maps eqiad cluster - T218097
  • 12:25 moritzm: installing openjpeg2 security updates
  • 12:20 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=testwikidatawiki --force --bureaucrat Ladsgroup
  • 12:16 jbond42: update redis on mwlog, pybal-test, maps and rdb*
  • 12:10 moritzm: installing ldap-replica200[12] (T227778)
  • 12:07 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Specify $wgWBRepoSettings['conceptBaseUri'] again (T225212) (duration: 00m 50s)
  • 12:06 moritzm: removing myself from cn=tools.admin (currently not used, was mostly historical for debugging some Toollabs issue in the past)
  • 12:00 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Specify $wmgWBRepoConceptBaseUri again (T225212) (duration: 00m 51s)
  • 12:00 Urbanecm: Running mwscript initSiteStats.php --wiki=commonswiki --update to update Special:Statistics after a big change (T216406)
  • 11:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Regrant image reviewers on commonswiki the ability to mass upload (T216406) (duration: 00m 50s)
  • 11:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Rename `Image-reviewer` to `image-reviewer` for Commons (2/2, T216406) (duration: 00m 48s)
  • 11:48 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Rename `Image-reviewer` to `image-reviewer` for Commons (1/2, T216406) (duration: 00m 50s)
  • 11:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable partial blocks on the Finnish Wikipedia (T228008) (duration: 00m 51s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Move private and fishbowl overrides from groupOverrides to groupOverrides2 (T227980) (duration: 00m 51s)
  • 11:24 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/http/MultiHttpClient.php: SWAT: Raise default reqTimeout in MultiHttpClient (T226979) (duration: 00m 51s)
  • 11:23 moritzm: installing python-django security updates on jessie
  • 11:22 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Title.php: SWAT: When title contains only slashes, Title::getRootText() shouldnt return false (T227816) (duration: 00m 51s)
  • 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable WikiLove and SandboxLink on sqwiki (T227970) (duration: 00m 51s)
  • 11:15 Urbanecm: Running mwscript extensions/WikimediaMaintenance/createExtensionTables.php sqwiki wikilove for T227970
  • 11:13 Urbanecm: Running mwscript migrateUserGroup.php --wiki=commonswiki Image-reviewer image-reviewer for T216406
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disallow admins to grant or revoke image reviewer due to migration (T216406) (duration: 00m 50s)
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Create image-reviewer for commonswiki with same rights as Image-reviewer (T216406) (duration: 00m 52s)
  • 10:52 moritzm: installing ldap-replica200[12] (T227778)
  • 10:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 50s)
  • 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 09:56 ema: cp-eqsin: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades T227672
  • 09:39 fsero: repooling ms-fe2005 T227570
  • 08:50 fsero: creating docker_registry_codfw on eqiad T227570
  • 08:49 gehel: correction: set oemhp_powerreg=os + reboot for elastic1052 (NOT elastic1054) - T225713
  • 08:49 fsero: T227570 changing container_synchronization on docker_registry_codfw to //docker_registry/eqiad/AUTH_docker/docker_registry_codfw
  • 08:48 gehel: set oemhp_powerreg=os + reboot for elastic1054 - T225713
  • 08:22 godog: set oemhp_powerreg=os on ms-be10[16-39] - T225713
  • 08:01 vgutierrez: upgrading acme-chief to version 0.19 in acme-chief production instances - T225945

2019-07-14

  • 13:18 godog: silence mr1-eqsin.oob IPv6 until tomorrow 8 UTC - T227967
  • 12:01 Urbanecm: Running mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sporti /home/urbanecm/T227968 for server side upload

2019-07-13

  • 01:51 MaxSem: Disabled 2FA for my staff account

2019-07-12

  • 23:35 mutante: netmon1003 - shutdown -h now, now that it's gone from Icinga
  • 23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 23:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 23:28 mutante: netmon1003 - stopping apache2 service (decom of servermon.wikimedia.org)
  • 19:41 James_F: Disabled 2FA for MSchottlender-WMF for device reset.
  • 19:17 shdubsh: add prometheus-varnishkafka-exporter 0.1 to apt repo T196066
  • 19:15 urandom: bootstrapping restbase1017-c -- T222960
  • 19:08 jeh: rebooting cloudvirt1018.eqiad.wmnet T216040
  • 18:53 mutante: cp1072 - enabling notifications for service checks in icinga; they were disabled but all green, with no SAL entry or ticket. Looked like a leftover from the past
  • 18:49 gehel: setting CPU governor to performance for wdqs1010 - T225713
  • 18:16 Krinkle: Remove bogus Graphite data at frontend.navtiming2.requet (typo from Nov 2018), graphite1004/2003
  • 18:02 urandom: bootstrapping restbase1017-b -- T222960
  • 16:32 urandom: bootstrapping restbase1017-a -- T222960
  • 16:25 jijiki: Rolling restart swift proxy on ms-fe*
  • 15:25 jeh: rebooting cloudvirt1018.eqiad.wmnet T216040
  • 14:05 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 12:45 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 12:39 fsero: recreating ci staging namespaces T227775
  • 12:39 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 12:38 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 12:36 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 12:33 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 12:33 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 12:22 fsero: recreating eventgate-* and blubberoid staging namespaces T227775
  • 12:22 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
  • 12:22 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
  • 12:18 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 12:18 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 12:18 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 12:15 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 12:11 fsero: recreating sessionstore,cxserver and mathoid staging namespaces T227775
  • 12:10 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
  • 12:06 fsero: recreating citoid staging namespace T227775
  • 12:05 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 12:01 fsero: recreating termbox staging namespace T227775
  • 11:09 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Switchover db2045 x1 codfw master to db2069 (duration: 00m 51s)
  • 10:24 jynus: switchover x1 codfw master from db2045 to db2069 T227862
  • 10:23 jynus: switchover x1 codfw master from db2045 to db2069
  • 09:43 moritzm: shut down ldap-codfw-replica01/ldap-codfw-replica02 (pending reimage)
  • 08:18 jijiki: enable puppet on mw1222
  • 06:35 vgutierrez: upgrading acme-chief to version 0.19 in acme-chief test instances - T225945
  • 06:28 vgutierrez: uploaded acme-chief 0.19 to apt.wikimedia.org (buster) - T225945
  • 05:45 elukey: sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to clear opcache
  • 01:01 Krinkle: mw1342 generated ~11,500 additional PHP errors over a 4-hour period (18:00-22:30 UTC), ref T224491
  • 00:59 Krinkle: mw1342 is generating strange PHP errors (php7 only), ref T224491
  • 00:58 urandom: bootstrapping restbase1017-a -- T222960
  • 00:50 mutante: restbase1018 - restart ferm service
  • 00:15 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e4bd91f71b (duration: 00m 50s)
  • 00:13 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: f309856f0912 (duration: 00m 50s)
  • 00:03 eevans@deploy1001: Finished deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 (T222960) (duration: 00m 03s)
  • 00:03 eevans@deploy1001: Started deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 (T222960)
  • 00:01 eevans@deploy1001: Finished deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 (T222960) (duration: 00m 25s)
  • 00:01 eevans@deploy1001: Started deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 (T222960)

2019-07-11

  • 23:58 thcipriani@deploy1001: Synchronized php-1.34.0-wmf.13/includes/watcheditem/WatchedItemStore.php: SWAT: WatchedItemStore: Fix fatal when revision is deleted T226741 (duration: 00m 51s)
  • 23:49 eevans@deploy1001: Finished deploy [cassandra/logstash-logback-encoder@d085ffa]: deploy logback to restbase1017 (T222960) (duration: 00m 47s)
  • 23:48 eevans@deploy1001: Started deploy [cassandra/logstash-logback-encoder@d085ffa]: deploy logback to restbase1017 (T222960)
  • 23:47 eevans@deploy1001: Finished deploy [cassandra/logstash-logback-encoder@d085ffa]: (no justification provided) (duration: 01m 56s)
  • 23:45 eevans@deploy1001: Started deploy [cassandra/logstash-logback-encoder@d085ffa]: (no justification provided)
  • 23:38 eevans@deploy1001: deploy aborted: (no justification provided) (duration: 02m 00s)
  • 23:36 eevans@deploy1001: Started deploy [cassandra/logstash-logback-encoder@d085ffa]: (no justification provided)
  • 23:15 thcipriani@deploy1001: Synchronized wmf-config: SWAT: Oversample all EditAttemptStep events on VE-as-mobile-default wikis T227317 (duration: 00m 50s)
  • 22:59 mutante: netmon1003 - removing servermon - servermon.wikimedia.org is being decom'ed (T198939)
  • 22:37 RoanKattouw: Deployed fix for T224240, accidentally rode along with Tyler's no-op scap
  • 22:34 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: wikidatawiki back to 1.34.0-wmf.13
  • 22:26 thcipriani@deploy1001: Finished scap: no op scap sync to rebuild l10n-cache (T227814) (duration: 19m 34s)
  • 22:07 thcipriani@deploy1001: Started scap: no op scap sync to rebuild l10n-cache (T227814)
  • 21:23 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 02m 02s)
  • 21:21 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
  • 20:22 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 02s)
  • 20:22 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
  • 20:20 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 03s)
  • 20:20 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
  • 20:19 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 02s)
  • 20:19 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
  • 20:18 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 02s)
  • 20:18 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
  • 20:11 milimetric@deploy1001: deploy aborted: Fix to reimport cu_changes (duration: 27m 34s)
  • 20:03 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Revert wikidata to 1.34.0-wmf.11
  • 19:44 milimetric@deploy1001: Started deploy [analytics/refinery@3296aab]: Fix to reimport cu_changes
  • 19:29 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.13 refs T220738
  • 18:09 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.13 refs T220738 (duration: 00m 57s)
  • 18:08 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.13 refs T220738
  • 18:02 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s eqiad
  • 17:37 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s codfw
  • 17:02 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s esams
  • 16:48 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s eqsin
  • 16:19 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s ulsfo
  • 16:12 XioNoX: revert deactivate ping-offload in eqiad for server reboot
  • 16:03 moritzm: rebooting ping1001 to pick up MDS-enabled qemu
  • 16:02 cdanis: repool cp4022 after testing conftool change
  • 15:59 XioNoX: deactivate ping-offload in eqiad for server reboot
  • 15:58 cdanis: depool cp4022 for testing conftool change
  • 15:58 XioNoX: revert deactivate ping-offload in codfw for server reboot
  • 15:56 moritzm: installing dnspython update from stretch point release
  • 15:53 moritzm: rebooting ping2001 to pick up MDS-enabled qemu
  • 15:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:50 XioNoX: deactivate ping-offload in codfw for server reboot
  • 15:45 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (cobalt) (duration: 00m 11s)
  • 15:45 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (cobalt)
  • 15:44 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (gerrit2001 only) (duration: 00m 11s)
  • 15:44 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (gerrit2001 only)
  • 15:28 gehel: setting CPU governor to performance for wdqs1004 - T225713
  • 15:28 cdanis: upgrade to python3-conftool 1.1.0-1 on cp4022
  • 15:05 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s cp-canary
  • 15:00 hashar_: restarted Jenkins for plugins upgrades
  • 14:57 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s mw-canary
  • 14:55 gehel: setting CPU governor to performance for elastic1052 - T225713
  • 14:51 cdanis: upgrade to python3-conftool 1.1.0-1 on mwdebug2001
  • 14:45 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/rdbms/database/Database.php: 903f3f94f5d2e3 / T227708 (duration: 00m 59s)
  • 14:26 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include stretch-wikimedia /home/volans/conftool/stretch/conftool_1.1.0-1_amd64.changes
  • 14:26 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include jessie-wikimedia /home/volans/conftool/jessie/conftool_1.1.0-1+deb8u1_amd64.changes
  • 14:26 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include buster-wikimedia /home/volans/conftool/buster/conftool_1.1.0-1+deb10u1_amd64.changes
  • 14:17 ema: restart wikibugs
  • 13:40 godog: roll restart ms-be2016 ms-be2017 ms-be2018 ms-be2019 ms-be2020 ms-be2021 ms-be2028 ms-be2029 ms-be2030 ms-be2031 ms-be2032 ms-be2033 ms-be2034 ms-be2035 ms-be2036 - T225713
  • 13:00 ema: cp-ulsfo: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades T227672
  • 12:48 ema: fleet-wide: remove obsolete file /etc/debdeploy-autorestarts.conf
  • 12:44 ema: cp-ulsfo: upgrade mtail to 3.0.0~rc5-1~bpo9+1wmf1
  • 12:44 Urbanecm: Running purgePage.php on pages in Page: NS on pawikisource (T226959)
  • 12:39 jijiki: Disable puppet on mw1222, server will be depooled and pooled a few times for tests - T224538
  • 12:07 godog: ms-be2031 raid controller firmware upgrade 4.52 -> 6.88 - T141756
  • 12:03 godog: power reset ms-be2031, stuck and nothing on console
  • 11:56 Urbanecm: EU SWAT done
  • 11:54 urbanecm@deploy1001: Finished scap: Namespace translation for Punjabi (T226959) (duration: 30m 13s)
  • 11:24 urbanecm@deploy1001: Started scap: Namespace translation for Punjabi (T226959)
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove usergroup communityapps from officewiki (T227680) (duration: 01m 02s)
  • 11:20 urbanecm@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: Remove commonswiki from mobilemainpagelegacy (T227719) (duration: 00m 58s)
  • 11:14 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] Enable UTR30 as a lookup method for ns prefixes on group2 (duration: 01m 02s)
  • 10:45 moritzm: installing ldap-codfw-replica*
  • 10:28 fsero: depooling ms-fe2005 for docker_registry_backups T227570
  • 10:08 fsero: creating swift docker_registry_container_backup T227570
  • 09:56 moritzm: re-enabling puppet (puppetdb reboots completed)
  • 09:47 moritzm: rebooting puppetdb1001 to pick up MDS-enabled qemu
  • 09:35 moritzm: rebooting puppetdb2001 to pick up MDS-enabled qemu
  • 09:31 moritzm: disabling puppet temporarily (for puppetdb reboots)
  • 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:51 godog: upload mtail 3.0.0~rc5-1~bpo9+1wmf1 to stretch-wikimedia - T225604
  • 08:14 ema: cp-ulsfo: downgrade mtail to 3.0.0~rc5-1~bpo9+1 to fix varnishmtail-backend T225604
  • 07:43 moritzm: installing ldap-codfw-replica* T227669
  • 07:31 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 07:21 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 07:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 07:11 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 07:10 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 07:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 02:27 ejegg: updated payments-wiki from 4c1261fe5d to 59ace50d66

2019-07-10

  • 23:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/CirrusSearch/includes: T227691 RedirectsAndIncomingLinks: succeed or fail, but not both (duration: 01m 02s)
  • 23:02 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/OAuth/includes/backend/MWOAuthUtils.php: T227688 OAuth: Do not rely on array autocreation for custom User properties; re-try (duration: 00m 58s)
  • 22:59 jforrester@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 22:57 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/includes/user/User.php: T227688 User: support setting custom fields + array autocreation in non-existent field (duration: 00m 58s)
  • 22:46 shdubsh: downgrading cp4031 to mtail_3.0.0~rc5-1~bpo9+1wmf1 to fix varnishmtail T225604
  • 22:46 jforrester@deploy1001: Synchronized w: T156319 Remove /w/skin-1.5 symlink (duration: 00m 58s)
  • 22:16 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T212865 Stop configuring ZeroBanner and ZeroPortal, unused (duration: 00m 58s)
  • 22:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T212865 Drop the ability to use ZeroBanner and ZeroPortal from production (duration: 00m 57s)
  • 22:03 jforrester@deploy1001: Synchronized wmf-config/mobile.php: T212865 Drop the ability to use ZeroBanner and ZeroPortal from production, mobile code (duration: 00m 57s)
  • 21:59 jforrester@deploy1001: Synchronized w/robots.php: T212865 Drop the special treatment for Wikipedia Zero (duration: 00m 58s)
  • 21:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T212865 Drop the Wikipedia Zero debug log channel (duration: 00m 58s)
  • 21:51 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T187716 Drop all zerowiki configuration (duration: 00m 58s)
  • 21:50 mutante: mwdebug1002 - php7adm /opcache-free because icinga showed a warning for opcache free space below 100MB
  • 21:49 jforrester@deploy1001: Synchronized dblists/: T187716 Mark zerowiki as deleted in dblists (duration: 01m 00s)
  • 21:41 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T212865 Disable ZeroBanner on all wikis (duration: 00m 59s)
  • 21:36 mutante: mw1235 - restarting hhvm (socket timeout alert in icinga since about 1.5h)
  • 21:35 mutante: mw1290 - restarting hhvm (socket timeout alert in icinga since about 5h)
  • 19:45 hoo: Updated the Wikidata property suggester with data from the 2019-07-01 JSON dump and applied the T132839 workarounds
  • 19:32 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce recentchange stream to eventgate-main - T211248 (duration: 00m 57s)
  • 19:26 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: Use wgEventServiceStreamConfig to configure wgRCFeeds eventbus. No-op in prod. - T211248 (duration: 00m 58s)
  • 19:05 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@8761480]: Migrating rest of hightraffic jobs to PHP7 - T219150 (duration: 01m 00s)
  • 19:04 jiji@deploy1001: Started deploy [cpjobqueue/deploy@8761480]: Migrating rest of hightraffic jobs to PHP7 - T219150
  • 18:15 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Linker.php: T227656 Fix visibility of IPs that aren't suppressed (duration: 00m 59s)
  • 17:54 twentyafterfour: phabricator: hotfixing fatal error by pulling upstream fix ( see https://secure.phabricator.com/D20644 )
  • 16:09 Urbanecm: Morning SWAT done
  • 16:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change bawikibooks logo to correct one according to community wish (2/2, T227418) (duration: 00m 58s)
  • 16:07 Urbanecm: Purged two urls for T227418
  • 16:06 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: Change bawikibooks logo to correct one according to community (1/2, T227418) (duration: 01m 16s)
  • 16:04 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: Disable local uploads on wuuwiki (T226764) (duration: 00m 58s)
  • 15:23 ema: cp-ulsfo: upgrade varnish to 5.1.3-1wm11 T227672
  • 15:08 ema: restart wb2-phab wikibugs job
  • 14:51 ema: upload varnish 5.1.3-1wm11 to stretch-wikimedia T227672
  • 14:42 godog: reimage ms-be2022 - T227667
  • 14:03 jbond42: copy puppetdb-termini 4.4.0-1~wmf2 from stretch-wikimedia to jessie-wikimedia
  • 13:47 ema: cp hosts: cleanup WP zero leftovers T213769
  • 13:22 godog: reset ilo on ms-be2022 - bios can't talk to it on boot
  • 12:49 godog: reboot ms-be2022 - T225713
  • 11:53 Urbanecm: Purged 14 urls for T211413
  • 11:51 Urbanecm: Purged 24 urls for T227635
  • 11:11 Urbanecm: EU SWAT done
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove autopromote to patroller on testwiki (T168718) (duration: 00m 58s)
  • 11:10 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: Several logo changes (T227635 T211413) (duration: 01m 00s)
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove fawikiquote HD logo (T211413) (duration: 00m 57s)
  • 11:07 urbanecm@deploy1001: sync-file aborted: SWAT: Several logo changes (T227635 T211413) (duration: 00m 20s)
  • 11:06 urbanecm@deploy1001: Synchronized docroot/noc/conf/highlight.php: SWAT: Fix non-working "raw text" links on noc.wikimedia.org web pages (T227606) (duration: 01m 02s)
  • 09:57 moritzm: re-enabled puppet on hosts using acme_chief::cert for reboots of acmechief hosts (actually did that 20 minutes ago, but forgot to log it earlier)
  • 09:54 jynus: disabling puppet on prometheus* hosts for upcoming deploy
  • 09:38 fsero: doing the same on ms-be1030
  • 09:37 fsero: docker-registry: manually running swift-container-sync once on ms-be2019
  • 09:36 moritzm: rearmed keyholder on acmechief1001
  • 09:29 moritzm: rebooting acmechief1001 to pick up MDS-enabled qemu
  • 09:25 moritzm: rearmed keyholder on acmechief2001
  • 09:22 moritzm: rebooting acmechief2001 to pick up MDS-enabled qemu
  • 09:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:19 moritzm: disabled puppet on hosts using acme_chief::cert for reboots of acmechief hosts
  • 08:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 08:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 08:06 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:06 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
  • 05:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1079 after upgrade (duration: 00m 57s)
  • 05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1079 after upgrade (duration: 00m 57s)
  • 05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1079 after upgrade (duration: 00m 58s)
  • 05:05 marostegui: Upgrade db1079
  • 05:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1079 for upgrade (duration: 00m 59s)

2019-07-09

  • 23:07 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.13 refs T220738
  • 23:06 robh: updating power ports on T209101 and disabling ports not in use (only turning off one side and awaiting any icinga alerts for 15 minutes before touching the other side of power)
  • 22:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/AbuseFilter/includes/AbuseFilter.php: 0096dff3022 / T227613 (duration: 00m 57s)
  • 22:52 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/SecurePoll/includes/pages/: c7d7a55 / T227620 (duration: 00m 57s)
  • 22:09 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/Collection/includes/CollectionProposals.php: T227407 / 69a30966c (duration: 00m 57s)
  • 21:53 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/rdbms/: T226770 / 4c2a58589f2db (duration: 00m 59s)
  • 20:58 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.34.0-wmf.11"
  • 20:37 mutante: scb1001 - re-activate puppet, run puppet, stop pdfrender service, run puppet again (T226675)
  • 20:36 mutante: scb2001 - sudo systemctl stop pdfrender (T226675)
  • 20:25 mutante: temp disabling puppet on scb1001 - removing pdfrender classes from scb2001
  • 20:23 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.13 refs T220738
  • 20:12 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.13 (duration: 36m 39s)
  • 19:36 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.13
  • 19:17 XioNoX: enable sampling on cr2-eqiad:border-in4
  • 19:14 XioNoX: replace netflow target on cr2-eqiad with netflow1001
  • 18:19 longma: cutting the branch for 1.34.0-wmf.13 T220738
  • 17:32 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint, take 2 (duration: 02m 04s)
  • 17:30 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint, take 2
  • 17:30 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint (T227481) (duration: 03m 49s)
  • 17:26 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint (T227481)
  • 16:59 godog: reboot ms-be2039 with oemhp_powerreg=os - T225713
  • 16:54 godog: reboot ms-be2027 with oemhp_powerreg=os - T225713
  • 16:42 godog: reboot ms-be2026 with oemhp_powerreg=os - T225713
  • 16:29 godog: reboot ms-be2025 with oemhp_powerreg=os - T225713
  • 15:44 XioNoX: reject RPKI invalids on Ashburn peering links - T220669
  • 15:38 akosiaris: restart pybal on lvs2003, lvs1015. Removal of pdfrender service T226675
  • 15:38 XioNoX: reject RPKI invalids on Amsterdam peering link - T220669
  • 15:33 akosiaris: restart pybal on lvs2006, lvs1016. Removal of pdfrender service T226675
  • 15:28 XioNoX: reject RPKI invalids on Chicago peering link - T220669
  • 15:27 godog: reboot ms-be2024 with oemhp_powerreg=os - T225713
  • 15:22 godog: reboot ms-be2023 with oemhp_powerreg=os - T225713
  • 15:20 XioNoX: reject RPKI invalids on Singapore peering link - T220669
  • 15:13 XioNoX: reject RPKI invalids on Dallas peering link - T220669
  • 15:03 jeh: rebooting cloudnet1003.eqiad T224228
  • 14:53 gehel: repooled elastic2054 - T227298
  • 14:50 moritzm: installing orespoolcounter100[34] T227567
  • 14:42 XioNoX: reject RPKI invalids on ulsfo peering link - T220669
  • 14:29 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@8517fec]: Migrating cirrus* jobs to PHP7 - T219150 (duration: 01m 02s)
  • 14:28 jiji@deploy1001: Started deploy [cpjobqueue/deploy@8517fec]: Migrating cirrus* jobs to PHP7 - T219150
  • 14:28 jeh: rebooting cloudnet1004.eqiad T224228
  • 14:21 tarrow@deploy1001: scap-helm termbox finished
  • 14:21 tarrow@deploy1001: scap-helm termbox cluster staging completed
  • 14:21 tarrow@deploy1001: scap-helm termbox upgrade staging stable/termbox -f termbox-staging-values.yaml [namespace: termbox, clusters: staging]
  • 13:59 moritzm: installing orespoolcounter200[34] T227567
  • 13:26 elukey: enable base::firewall on stat1007
  • 13:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:27 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 12:21 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 12:18 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 12:13 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 12:11 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 12:11 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 12:09 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 12:04 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:57 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 11:47 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:47 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:30 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:13 Urbanecm: EU SWAT done
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: Disable flaggedrevs for hewikisource main page (T227000) (duration: 00m 48s)
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Clean up `wgNamespacesWithSubpages` to remove unneeded entries (T227546) (duration: 00m 49s)
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Configuration migration for Translate (T87985) (duration: 00m 49s)
  • 11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure help urls for MediaInfo (T227226) (duration: 00m 50s)
  • 10:39 elukey: update wikimedia-buster thirdparty/amd-rocm component with upstream packages - T224723
  • 10:14 jbond42: upgrade openssl on canary systems
  • 09:30 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=ats-be
  • 09:26 ema: cp1076: restart trafficserver with storage.config set to /dev/nvme0n1
  • 09:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=ats-be
  • 09:13 elukey: enable per-server metrics on all prometheus-mcrouter-exporter(s) via puppet - T225059
  • 09:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 after upgrade (duration: 00m 49s)
  • 08:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 after upgrade (duration: 00m 47s)
  • 08:49 elukey: upgrade prometheus-mcrouter-exporter to 0.0.0+git20190709-1 on mw-eqiad (cumin alias) via debdeploy - T225059
  • 08:41 marostegui: Upgrade db1086
  • 08:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 for upgrade (duration: 00m 51s)
  • 08:36 elukey: upgrade prometheus-mcrouter-exporter to 0.0.0+git20190709-1 on mw-codfw (cumin alias) via debdeploy - T225059
  • 08:08 moritzm: installing zeromq3 security updates
  • 08:00 marostegui: Upgrade db1065 to 10.1.39
  • 07:39 moritzm: pruning unused libzmq3/python-zmq packages from swift/parsoid hosts
  • 07:26 elukey: upload prometheus-mcrouter-exporter 0.0.0+git20190709-1 to stretch-wikimedia - T225059
  • 06:00 marostegui: Failover m2 from db1065 to db1132 - T226952
  • 05:19 marostegui: Start switchover steps T226952
  • 05:13 marostegui: Rebooting pc2010 for a second time as per papaul's suggestion T227552
  • 04:53 marostegui: Reboot pc2010 to debug a memory issue
  • 01:47 XioNoX: restart PHP FPM on mwdebug2001
  • 01:35 XioNoX: restart PHP FPM on mwdebug1002

2019-07-08

  • 23:03 tzatziki: changing password for user "Naomi.piquette"
  • 20:57 bd808: Upgraded prometheus-pdns-exporter to 0.4.1 on cloudservices1004.wikimedia.org (T227411)
  • 20:53 bd808: Upgraded prometheus-pdns-exporter to 0.4.1 on cloudservices1003.wikimedia.org (T227411)
  • 19:38 reedy@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/OATHAuth/src/Key/TOTPKey.php: T227502 (duration: 00m 50s)
  • 19:23 moritzm: uploaded prometheus-pdns-exporter 0.4.1 to stretch-wikimedia T227411
  • 18:43 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce page-* streams to eventgate-main - T211248 (duration: 00m 50s)
  • 18:33 moritzm: installing zeromq3 security updates
  • 18:15 Urbanecm: Morning SWAT done
  • 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change liwikinews logo to correct one per community wish (2/2, T227418) (duration: 00m 49s)
  • 18:13 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: Change liwikinews logo to correct one per community wish (1/2, T227418) (duration: 00m 49s)
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add templateeditor user group and protection level on commons (T227420) (duration: 00m 49s)
  • 18:06 urbanecm@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: SWAT: [cirrus] Increase elastic master timeout to 5m (T227136) (duration: 00m 49s)
  • 18:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable RDF output for MediaInfo (T221916) (duration: 00m 49s)
  • 17:20 gehel@deploy1001: Finished deploy [wdqs/wdqs@4b7cdf5]: new blazegraph and updater version (duration: 12m 47s)
  • 17:08 gehel@deploy1001: Started deploy [wdqs/wdqs@4b7cdf5]: new blazegraph and updater version
  • 16:40 eevans@deploy1001: scap-helm sessionstore finished
  • 16:40 eevans@deploy1001: scap-helm sessionstore cluster staging completed
  • 16:40 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
  • 16:39 eevans@deploy1001: scap-helm sessionstore finished
  • 16:38 eevans@deploy1001: scap-helm sessionstore cluster staging completed
  • 16:38 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
  • 16:38 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
  • 16:36 eevans@deploy1001: scap-helm sessionstore finished
  • 16:36 eevans@deploy1001: scap-helm sessionstore cluster staging completed
  • 16:36 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
  • 16:05 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive - part III (duration: 00m 50s)
  • 15:59 godog: bounce prometheus@k8s on prometheus200[34] - T227478
  • 15:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2045 instead of db2069 as x1 codfw master (duration: 00m 49s)
  • 15:45 marostegui: Failover db2069 to db2045 on x1 codfw
  • 15:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2069 as x1 codfw master (duration: 00m 50s)
  • 15:15 jynus: shutting down db2097 T225378 T216240
  • 15:13 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@7379e91]: Migrating refreshLinks to PHP7 - T219150 (duration: 01m 26s)
  • 15:12 jiji@deploy1001: Started deploy [cpjobqueue/deploy@7379e91]: Migrating refreshLinks to PHP7 - T219150
  • 15:07 eevans@deploy1001: scap-helm sessionstore finished
  • 15:07 eevans@deploy1001: scap-helm sessionstore cluster staging completed
  • 15:07 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
  • 15:04 eevans@deploy1001: scap-helm sessionstore finished
  • 15:04 eevans@deploy1001: scap-helm sessionstore cluster staging completed
  • 15:04 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
  • 14:57 marostegui: Failover x1 codfw from db2045 to db2069
  • 14:48 ppchelko@deploy1001: Finished deploy [restbase/deploy@9a99b17]: Loosen etag regex for talk endpoint and fix alert (duration: 16m 07s)
  • 14:45 marostegui: Restart MySQL on db1132 to enable performance_schema - T226952
  • 14:43 urandom: decommissioning restbase1017-c -- T222960
  • 14:32 ppchelko@deploy1001: Started deploy [restbase/deploy@9a99b17]: Loosen etag regex for talk endpoint and fix alert
  • 14:21 papaul: shutting down elastic2054 for troubleshooting
  • 14:05 ppchelko@deploy1001: Finished deploy [restbase/deploy@8e81e98]: Release 1.0, expose talk endpoints T225733, suggestions endpoints T224754, fix summary purging T226983 (duration: 16m 11s)
  • 14:03 eevans@deploy1001: scap-helm sessionstore finished
  • 14:03 eevans@deploy1001: scap-helm sessionstore cluster staging completed
  • 14:03 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
  • 13:53 godog: reprepro --delete clearvanished on install1002 to cleanup trusty
  • 13:52 elukey: import AMD ROCm's Debian repo key (9386B48A1A693C5C) manually on install1002 - T224723
  • 13:51 moritzm: running "apt-get --allow-releaseinfo-update" on all buster hosts which were installed prior to the final buster release
  • 13:48 ppchelko@deploy1001: Started deploy [restbase/deploy@8e81e98]: Release 1.0, expose talk endpoints T225733, suggestions endpoints T224754, fix summary purging T226983
  • 13:30 godog: bounce prometheus@k8s on prometheus1003
  • 12:52 godog: copy mtail to buster-wikimedia - T225604
  • 12:42 kartik@deploy1001: scap-helm cxserver finished
  • 12:42 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 12:42 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 12:39 kartik@deploy1001: scap-helm cxserver finished
  • 12:39 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 12:39 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 12:36 kartik@deploy1001: scap-helm cxserver finished
  • 12:36 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 12:36 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 11:47 Urbanecm: EU SWAT done
  • 11:44 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/includes/Title.php: SWAT: Title: ensure getBaseTitle and getRootTitle return valid Titles (T225585) (duration: 00m 50s)
  • 11:39 Urbanecm: Purged 14 logo urls for T227418
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: SWAT: Fix array shape for $wgCirrusSearchExtraIndexes (T227379) (duration: 00m 51s)
  • 11:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove HD logos for projects with no entry in wgLogo or add a wgLogo entry (2/2, T227418) (duration: 00m 49s)
  • 11:30 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: Remove HD logos for projects with no entry in wgLogo or add a wgLogo entry (1/2, T227418) (duration: 00m 49s)
  • 11:26 moritzm: installing poolcounter1004/1005
  • 11:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/AbuseFilter/: SWAT: Fix query in normalizeThrottleParameters (T209565) (duration: 00m 51s)
  • 11:22 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: Disable Wikidata for ProofreadPage namespaces (T227201) (duration: 00m 50s)
  • 11:16 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable jsonld output format for wikibase entities everywhere (T207168) (duration: 00m 49s)
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: Remove "עמוד" namespace from wgFlaggedRevsNamespaces for hewikisource (T227000) (duration: 00m 49s)
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add several Ukrainian government websites to wgCopyUploadsDomains (T227366) (duration: 00m 49s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create "autopatrolled" user group on az.wiktionary (T227208) (duration: 00m 49s)
  • 11:04 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: Create "autopatrolled" user group on az.wiktionary (T227208) (duration: 00m 50s)
  • 10:56 moritzm: installing poolcounter2003/2004
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 51s)
  • 09:51 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 09:51 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:49 ema: removed /srv/prometheus/ops/targets/varnish-upload-ats_mtail_$DC.yaml from prometheus hosts
  • 08:27 moritzm: updated buster installer images to final release
  • 07:43 moritzm: rebooting hassium to pick up MDS-enabled qemu
  • 07:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:43 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:40 moritzm: rebooting weblog1001 for kernel security update
  • 07:38 jynus: deploying sys schema to missing db production hosts
  • 07:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:00 elukey: add base::firewall to stat1004 - T170826
  • 06:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1109 after changing its binlog format (duration: 00m 49s)
  • 06:36 marostegui: Run compare for s5 main tables on db2038 vs db2059 - T221533
  • 06:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1109 after changing its binlog format (duration: 00m 49s)
  • 05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1094 after upgrade, slowly repool db1109 after changing its binlog format (duration: 00m 49s)
  • 05:45 marostegui: Restart MySQL on db1109 to pick up STATEMENT as binlog format - T227062
  • 05:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 for binlog format change (duration: 00m 49s)
  • 05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More weight to db1094 after upgrade (duration: 00m 51s)
  • 05:31 marostegui: Compress medium wikis on labsdb1009 - T222978
  • 05:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 after upgrade (duration: 00m 49s)
  • 05:22 marostegui: Drop empty table edit_page_tracking from some s3 wikis - T57385
  • 05:11 marostegui: Drop empty table edit_page_tracking from s7 - T57385
  • 05:08 marostegui: Stop MySQL on db1094 for upgrade
  • 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 for upgrade (duration: 00m 50s)
  • 03:19 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive (duration: 00m 53s)
  • 01:16 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive (duration: 00m 50s)

2019-07-07

  • 20:13 urandom: decommissioning restbase1017-b -- T222960
  • 17:25 urandom: decommissioning restbase1017-a -- T222960
  • 15:14 godog: power reset restbase2009

2019-07-06

  • 07:56 thcipriani: restarting gerrit out of heap space

2019-07-05

  • 17:18 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload (duration: 00m 39s)
  • 17:17 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload
  • 17:17 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload (duration: 00m 01s)
  • 17:17 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload
  • 15:32 fsero: uploaded debian buster base docker image
  • 15:30 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 15:23 fsero: restarting swift-container-sync on swift backends
  • 15:20 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 15:15 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 15:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:51 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:15 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 13:44 elukey: roll restart of aqs on aqs100* to pick up new druid settings
  • 13:33 fsero: disabling puppet on swift backends
  • 13:26 fsero: restarting swift-container-sync on swift backends
  • 13:05 ema: pool cp1090 w/ ATS backend T226638
  • 12:12 ema: depool cp1090 and reimage as upload_ats T226638
  • 11:46 ema: pool cp1088 w/ ATS backend T226638
  • 11:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:38 jijiki: Reboot ms-be1021 - T141756 - T227076
  • 11:32 jijiki: Upgrading smartarray firmware on ms-be1021 - T141756 - T227076
  • 11:31 moritzm: installing postgresql-9.4 updates on jessie
  • 11:10 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:05 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 11:05 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:04 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 11:00 ema: depool cp1088 and reimage as upload_ats T226638
  • 10:55 ema: pool cp1086 w/ ATS backend T226638
  • 10:29 moritzm: rebooting debug proxies to pick up MDS-enabled qemu
  • 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:23 moritzm: rebooting seaborgium to pick up correct Stretch kernel
  • 10:15 moritzm: rebooting serpens to pick up correct Stretch kernel
  • 10:14 moritzm: fixed up kernel packages on serpens/seaborgium; these were dist-upgraded from jessie, but the correct kernel packages for Stretch were not set up, so they were still stuck with an old jessie kernel
  • 10:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 jijiki: Rolling reboot rdb* hosts - T227304
  • 10:00 moritzm: rebooting seaborgium to pick up MDS-enabled qemu
  • 09:51 moritzm: rebooting serpens to pick up MDS-enabled qemu
  • 09:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:39 ema: depool cp1086 and reimage as upload_ats T226638
  • 09:31 moritzm: rebooting LDAP replicas in eqiad
  • 09:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:15 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=elastic2054.codfw.wmnet
  • 09:01 moritzm: rebooting kraz (irc.wikimedia.org) to pick up MDS-enabled qemu
  • 08:54 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:54 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:57 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 (duration: 00m 48s)
  • 07:35 moritzm: installing imagemagick security updates on jessie
  • 07:23 moritzm: installing wireshark security updates on jessie
  • 07:17 marostegui: Compress small wikis on labsdb1009 T222978
  • 07:13 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1087 (duration: 00m 52s)
  • 06:46 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 with full weight (duration: 00m 49s)
  • 06:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove old comments (duration: 00m 50s)
  • 05:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1104 after upgrade (duration: 00m 49s)
  • 05:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 after upgrade (duration: 00m 49s)
  • 05:23 marostegui: Upgrade db1104 T227062
  • 05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 for upgrade (duration: 00m 51s)
  • 05:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 05:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 05:09 marostegui: Stop MySQL on db1069 for decommission T227166
  • 05:08 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
  • 05:08 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
  • 05:02 marostegui: Remove db1069 from tendril and zarcillo - T227166

2019-07-04

  • 21:50 volans@deploy1001: Finished deploy [debmonitor/deploy@0ee26a3]: Deploy Debmonitor v0.1.10 (duration: 00m 48s)
  • 21:50 volans@deploy1001: Started deploy [debmonitor/deploy@0ee26a3]: Deploy Debmonitor v0.1.10
  • 21:35 volans: forcing reboot of elastic2054 from console, host unresponsive - T227298
  • 17:03 AndyRussG: re-enabled banner impressions loader job
  • 16:36 ema: pool cp1084 w/ ATS backend T226638
  • 16:02 AndyRussG: DjangoBannerStats revision changed from 02be6cbb74 to 8965666e17
  • 15:56 AndyRussG: temporarily disabled banner impressions loader job
  • 15:34 ema: depool cp1084 and reimage as upload_ats T226638
  • 15:22 ema: pool cp1082 w/ ATS backend T226638
  • 14:51 twentyafterfour: phabricator: lowered phd.taskmasters config to 1 from 10
  • 14:28 ema: depool cp1080 and reimage as upload_ats T226638
  • 13:51 volans: removing python-conftool (old py2 version) from all hosts - T226965
  • 13:40 ema: pool cp1080 w/ ATS backend T226638
  • 13:23 volans: upgraded scap to 3.11.0-1 on A:eqiad - T227225
  • 13:15 godog: reboot ms-be2037 after setting "os control" for power regulator mode - T225713
  • 13:05 volans: upgraded scap to 3.11.0-1 on A:codfw - T227225
  • 12:43 marostegui: Restore default replication consistency options on db2065 - T227251
  • 12:40 volans: upgraded scap to 3.11.0-1 on deploy[12]001 - T227225
  • 12:39 ema: depool cp1080 and reimage as upload_ats T226638
  • 12:24 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 with low weight (duration: 00m 49s)
  • 12:21 hoo: Started a Wikidata JSON dump run (sudo -b -u dumpsgen /usr/local/bin/dumpwikidatajson.sh) on snapshot1008 (T227207)
  • 12:01 moritzm: upgrading buster installations to final frozen package state
  • 11:59 jynus: stop and upgrade db1109 T227062
  • 11:53 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 for upgrade (duration: 00m 50s)
  • 11:47 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 for upgrade (duration: 00m 45s)
  • 11:38 volans: upgraded scap to 3.11.0-1 on A:mw-canary - T227225
  • 10:47 marostegui: Ease replication consistency option on db2065 to allow it to catch up a bit - T227251
  • 10:01 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 09:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:55 moritzm: rolling reboot of kubestagetcd* to pick up MDS-enabled qemu
  • 09:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:41 moritzm: rearmed keyholder on netmon1002
  • 09:36 moritzm: rebooting netmon1002 for kernel security update
  • 09:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:25 volans: uploaded scap_3.11.0-1 to {jessie,stretch,buster}-wikimedia APT - T227225
  • 09:07 moritzm: partly rearmed keyholder on deploy1001 (missing for apache2modsec)
  • 09:00 moritzm: rebooting deploy1001 for kernel security update
  • 08:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:59 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:41 marostegui: Repool labsdb1011 - T222978
  • 08:29 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 08:29 vgutierrez: upgrading acme-chief to version 0.18 in acme-chief test instances - T225945
  • 08:25 moritzm: rearmed keyholder on cumin1001
  • 08:22 vgutierrez: uploaded acme-chief 0.18 to apt.wikimedia.org (buster) - T225945
  • 08:22 ema: pool cp1078 w/ ATS backend T226638
  • 08:21 moritzm: rebooting cumin1001 for kernel security update
  • 08:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:20 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:08 marostegui: Upgrade db2044 - T226952
  • 08:00 moritzm: rearmed keyholder on cumin2001
  • 07:57 moritzm: rebooting cumin2001 for kernel security update
  • 07:55 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:55 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:51 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1069 from config as it will be decommissioned T227166 (duration: 00m 48s)
  • 07:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1069 from config as it will be decommissioned T227166 (duration: 00m 49s)
  • 07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1101 after upgrade (duration: 00m 49s)
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 after upgrade (duration: 00m 49s)
  • 07:17 ema: depool cp1078 and reimage as upload_ats T226638
  • 07:09 moritzm: rebooting restbase-dev* for kernel security updates
  • 07:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 after upgrade (duration: 00m 48s)
  • 06:45 moritzm: restarting archiva on archiva.wikimedia.org to pick up Java security update
  • 06:42 elukey: update puppet compiler's facts
  • 05:57 twentyafterfour: disabled phd on phab1003 while I clean things up. Registered the downtime in icinga
  • 05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1101 after upgrade (duration: 00m 49s)
  • 05:16 marostegui: Upgrade db1101 - T227062
  • 05:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101 for upgrade (duration: 00m 50s)
  • 00:41 twentyafterfour: phabricator upgrade complete
  • 00:27 twentyafterfour: Deploying Phabricator release/2019-07-03/1 from wmf/stable
  • 00:21 cscott@deploy1001: Finished deploy [parsoid/deploy@af5fd0e]: Updating Parsoid to d355bc90 (deploy-20170703 branch, T227216) (duration: 06m 48s)
  • 00:15 cscott@deploy1001: Started deploy [parsoid/deploy@af5fd0e]: Updating Parsoid to d355bc90 (deploy-20170703 branch, T227216)
  • 00:03 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy PB to wikisource, wikivoyage and wiktionary projects; T218626 (duration: 00m 50s)

2019-07-03

  • 23:26 foks: reset email for "Uwe Martens"
  • 23:00 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/MobileFrontend/resources/dist/: T221197 schemaEditAttemptStep: only set bucket and anonymous-user-token on defaults if non-null (duration: 00m 51s)
  • 22:59 mutante: stat1007 - jbd2/md0-8 invoked oom-killer
  • 22:57 mutante: stat1007 - systemctl restart nagios-nrpe-server after OOM from some python process
  • 20:58 XioNoX: add static backup routes for anycast recdns on cr1/2-codfw/eqiad - T186550
  • 20:45 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@350e74b]: Update mobileapps to 94d0233 (T205550) (duration: 05m 11s)
  • 20:40 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@350e74b]: Update mobileapps to 94d0233 (T205550)
  • 20:28 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@cf64319]: Update mobileapps to fdb0108 (T205550) (duration: 01m 10s)
  • 20:27 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@cf64319]: Update mobileapps to fdb0108 (T205550)
  • 20:25 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@cf64319]: Update mobileapps to fdb0108 (T205550) (duration: 01m 25s)
  • 20:24 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@cf64319]: Update mobileapps to fdb0108 (T205550)
  • 20:12 jeh: rebooting labmon1001 T224228
  • 19:58 jeh: rebooting labmon1002 T224228
  • 19:44 jeh: rebooting labpuppetmaster1001 T224228
  • 19:22 jeh: rebooting labpuppetmaster1002 T224228
  • 19:10 jeh: rebooting cloudelastic1004 T224228
  • 19:02 jeh: rebooting cloudelastic1003 T224228
  • 18:58 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Wikibase/data-access/src/GenericServices.php: T227207 Fix missing qualifier hashes in JSON output (duration: 00m 50s)
  • 18:54 jeh: rebooting cloudelastic1002 T224228
  • 18:46 jeh: rebooting cloudelastic1001 T224228
  • 16:43 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
  • 16:36 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
  • 16:35 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
  • 16:35 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 16:24 Urbanecm: Morning SWAT done
  • 16:23 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/ReadingLists/: SWAT: Fix API continuation (T226640) (duration: 00m 49s)
  • 16:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Enable DataBridge on Beta (T226816) (production no-op) (duration: 00m 54s)
  • 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 16:18 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 16:18 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 16:17 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:16 fsero: deleting zotero namespace and recreating it with helmfile on staging cluster
  • 16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
  • 16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
  • 16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
  • 16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 16:10 moritzm: rearmed keyholder on netmon2001 (was rebooted earlier)
  • 16:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Undeploy reader demographics surveys (T226273) (duration: 00m 49s)
  • 16:07 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Clean expired throttle rules (duration: 00m 49s)
  • 15:55 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2250.codfw.wmnet
  • 15:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:46 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:46 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:33 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:28 moritzm: rolling reboot of Kubernetes etcd nodes in eqiad
  • 15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:26 jeh: rebooting cloudweb2001-dev.codfw T224228
  • 15:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:18 jeh: rebooting clouddb2001-dev.codfw T224228
  • 15:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:05 moritzm: rolling reboot of Kubernetes etcd nodes in codfw
  • 15:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:04 jeh: rebooting cloudservices2002-dev.codfw T224228
  • 15:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:55 jeh: rebooting cloudnet2003-dev.codfw T224228
  • 14:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:47 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:47 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:39 jeh: rebooting cloudnet2002-dev.codfw T224228
  • 14:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:33 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:22 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:13 ema: pool cp1076 w/ ATS backend T226638
  • 14:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:06 XioNoX: power off msw1-codfw - T224250
  • 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:00 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 XioNoX: remove all mentions of sampling (currently disabled) on cr2-esams to try to reduce memory usage
  • 13:51 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:33 moritzm: rebooting doc1001 to pick up MDS-enabled qemu
  • 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:24 jynus: upgrade and restart db2097 T225378
  • 13:08 ema: depool cp1076 and reimage as upload_ats T226638
  • 13:07 ema: depool cp1076 and reimage as upload_ats T226637
  • 12:55 marostegui: Drop secret and scratch_tokens columns from centralauth (s7) T226826
  • 12:53 ema: pool cp2026 w/ ATS backend T226637
  • 12:50 Urbanecm: foreachwiki refreshImageMetadata.php --mediatype=AUDIO --mime=audio/mid --force completed (T226784)
  • 12:40 Urbanecm: Started foreachwiki refreshImageMetadata.php --mediatype=AUDIO --mime=audio/mid --force for T226784 on mwmaint1002 in a tmux
  • 12:40 moritzm: rebooting mendelevium (ticket.wikimedia.org) to pick up MDS-enabled qemu
  • 12:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:35 moritzm: rebooting dubnium/pollux (corp LDAP replicas) to pick up MDS-enabled qemu
  • 12:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:31 moritzm: rebooting neon (kubernetes staging master) to pick up MDS-enabled qemu
  • 12:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:24 moritzm: rebooting bromine to pick up MDS-enabled qemu
  • 12:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:21 moritzm: rebooting pybal-test hosts to pick up MDS-enabled qemu
  • 12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:19 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:14 ema: reimage cp2026 as upload_ats T226637
  • 12:13 kart_: Updated cxserver to b447674 (T226611)
  • 12:10 kartik@deploy1001: scap-helm cxserver finished
  • 12:10 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 12:10 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 12:09 kartik@deploy1001: scap-helm cxserver finished
  • 12:09 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 12:09 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 12:07 kartik@deploy1001: scap-helm cxserver finished
  • 12:07 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 12:07 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 11:55 reedy@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/TimedMediaHandler/: T226840 (duration: 00m 50s)
  • 11:29 moritzm: ran puppet clean/deactivate and debdeploy removal for cp3037 (host has been broken for a long time and is triggering failing Cumin/debdeploy runs) T227077
  • 11:14 Urbanecm: EU SWAT done
  • 11:14 Urbanecm: Ran mwscript namespaceDupes.php --wiki=pawikisource --fix for T226959
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for enwiki event (T227059) (duration: 00m 48s)
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/throttle-analyze.php: SWAT: [throttle-analyze] Grant autoconfirmed permission to user when throttle rule is applied (T204583) (duration: 00m 49s)
  • 11:11 moritzm: rebooting people1001 (people.wikimedia.org) to pick up MDS-enabled qemu
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configuring Namespaces at pawikisource (T226959) (duration: 00m 52s)
  • 11:05 moritzm: rebooting krypton nodes to pick up MDS-enabled qemu
  • 11:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:36 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wiktionary extensions/Cognate/maintenance/populateCognatePages.php (T226358)
  • 10:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:11 moritzm: rolling reboot of eventschema service hosts to pick up MDS-enabled qemu
  • 10:00 marostegui: Drop secret and scratch_tokens columns from the private wiki list T226826
  • 09:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:54 moritzm: rebooting netmon2001 for kernel security update
  • 09:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:47 moritzm: rebooting debmonitor nodes to pick up MDS-enabled qemu
  • 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:46 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:27 moritzm: rebooting failoid nodes to pick up MDS-enabled qemu
  • 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:01 moritzm: rolling reboot of kubernetes masters in eqiad to pick up MDS-enabled qemu
  • 08:44 moritzm: rolling reboot of kubernetes masters in codfw to pick up MDS-enabled qemu
  • 08:44 moritzm: rolling reboot of kubernetes masters in codfw
  • 08:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:43 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:34 godog: reenable puppet fleetwide
  • 07:33 marostegui: Upgrade db2079 (s8 codfw master)
  • 07:25 marostegui: Upgrade db2100 (snapshots on that host are finished)
  • 07:24 godog: temporarily disable puppet to test/apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/520012
  • 07:23 moritzm: updated buster installer d-i image to RC3
  • 07:10 marostegui: Drop secret and scratch_tokens from labswiki (wikitech) and labstestwiki - T226826
  • 07:06 marostegui: Drop secret and scratch_tokens from fishbowl wiki list T226826
  • 07:05 godog: add 150G to graphite hosts lv, was at 94% utilization
  • 06:55 godog: depool and roll-restart swift proxy - T209182
  • 06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1069 status (duration: 00m 28s)
  • 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover x1 master eqiad from db1069 to db1120 T226358 (duration: 00m 27s)
  • 06:00 marostegui: Starting x1 failover from db1069 to db1120 - T226358
  • 06:00 elukey: move the zookeeper puppet submodule into operations/puppet - T226466
  • 05:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:03 vgutierrez: restarting pybal on lvs4006
  • 05:02 marostegui: Start pre-failover steps for x1 - T226358
  • 04:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:23 vgutierrez: rebooting primary lvs servers for MDS security updates
  • 00:14 eileen: process-control config revision is 8e215d07f2 (re-enable jobs)
  • 00:08 eileen: civicrm revision is 8a4451f390, config revision is ec8c43ee86 Redis
  • 00:05 eileen: process-control config revision is ec8c43ee86 (Redis turned on)

2019-07-02

  • 23:42 eileen: civicrm revision is 8a4451f390, config revision is c02a038331 (mysql locks enabled)
  • 23:36 catrope@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Echo/: T226594 (duration: 00m 51s)
  • 23:34 catrope@deploy1001: Synchronized php-1.34.0-wmf.11/skins/MonoBook/: T226594 (duration: 00m 50s)
  • 22:35 eileen: civicrm revision changed from 96985fcc4b to 8a4451f390, config revision is af9e657134
  • 20:35 mutante: contint1001 - created new partitions on /dev/sdc and /dev/sdd; created new RAID 1 over /dev/sdc1 and /dev/sdd1
  • 20:28 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@cc60181]: Weekly WDQS deploy (duration: 14m 43s)
  • 20:20 mutante: contint1001 - temp installing parted for labeling new disks sdc and sdd for raid for docker images (T207707)
  • 20:13 smalyshev@deploy1001: Started deploy [wdqs/wdqs@cc60181]: Weekly WDQS deploy
  • 19:37 krinkle@deploy1001: Finished scap: l10n sync did not work as expected, try full scap to fix missing i18n message for 9963d843622 (duration: 18m 24s)
  • 19:18 krinkle@deploy1001: Started scap: l10n sync did not work as expected, try full scap to fix missing i18n message for 9963d843622
  • 19:07 krinkle@deploy1001: scap sync-l10n completed (1.34.0-wmf.11) (duration: 00m 47s)
  • 19:05 krinkle@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/AbuseFilter/: 9963d843622b / T227095 (duration: 00m 51s)
  • 19:03 krinkle@deploy1001: scap sync-l10n completed (1.34.0-wmf.11) (duration: 00m 48s)
  • 19:00 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@a29da76]: Update recommendation-api to 4f50c71 (duration: 02m 50s)
  • 18:57 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@a29da76]: Update recommendation-api to 4f50c71
  • 18:07 XioNoX: setup tunnel between eqord and eqiad - T226158
  • 17:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@9ca9b0f]: Update mobileapps to 941e14f (T219998 T217352 T219909) (duration: 05m 49s)
  • 17:43 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@9ca9b0f]: Update mobileapps to 941e14f (T219998 T217352 T219909)
  • 16:59 hashar: CI is back, I had to restart Zuul :-\ T227111
  • 16:55 hashar: Starting Jenkins and Zuul T227111
  • 16:53 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
  • 16:52 hashar: Stopping Jenkins and Zuul T227111
  • 16:32 bblack: testing failure scenarios on dns2002, possible false-alarm alerts (depooled from LVS recdns)
  • 16:31 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
  • 16:31 bblack: depool dns2002 from recdns server for testing
  • 16:30 hashar: CI code-review +2 changes are not quite processed for some unknown reason T227111
  • 16:19 XioNoX: add term allow-anycast-dns in filter labs-in4
  • 15:55 ema: depool cp2026 and reimage as upload_ats T226637
  • 15:47 ema: pool cp2025 w/ ATS backend T226637
  • 15:43 XioNoX: "Equinix will be expanding the DA IX subnet from a /24 to a /23." (cf. email)
  • 15:34 XioNoX: Add BGP to AS15830 in AMS-IX
  • 15:26 XioNoX: add centrallog1001 to routers ACLs - T226813
  • 15:20 Krinkle: Set repo back from active to read-only https://gerrit.wikimedia.org/r/#/admin/projects/operations/puppet/cdh (T226474)
  • 14:58 jijiki: Run restart-php-fpm in all-mw-codfw - T223391
  • 14:49 ema: depool cp2025 and reimage as upload_ats T226637
  • 14:47 XioNoX: add anycast BGP statement to eqsin
  • 14:25 jbond42: restart apache2 on phab1003
  • 14:22 XioNoX: add DNS anycast BGP statement to cr3-ulsfo
  • 14:18 ema: pool cp2024 w/ ATS backend T226637
  • 14:13 otto@deploy1001: Finished deploy [eventstreams/deploy@de1d356]: Limit concurrent number of connections per X-Client-IP - T226808 (duration: 06m 17s)
  • 14:07 otto@deploy1001: Started deploy [eventstreams/deploy@de1d356]: Limit concurrent number of connections per X-Client-IP - T226808
  • 14:02 bblack: deploying anycast_healthchecker changes to the recdnses (puppet disabled on all, testing dns4002 first) - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/397723/
  • 13:31 marostegui: Upgrade db2086
  • 13:27 ema: depool cp2024 and reimage as upload_ats T226637
  • 13:26 XioNoX: test fix policy ASXXX_in (missing `then next policy`)
  • 13:23 marostegui: Upgrade db2085
  • 13:13 XioNoX: push RPKI classification to eqiad - T220669
  • 13:09 XioNoX: push RPKI classification to eqsin - T220669
  • 13:06 ema: pool cp2022 w/ ATS backend T226637
  • 12:51 XioNoX: push RPKI classification to AMS - T220669
  • 12:47 marostegui: Upgrade db2082 - T227062
  • 12:30 jijiki: Power cycle ms-be1021 - T227076
  • 11:51 ema: depool cp2022 and reimage as upload_ats T226637
  • 11:40 Urbanecm: EU SWAT really done
  • 11:37 Urbanecm: Ran mwscript resetAuthenticationThrottle.php --wiki=metawiki --signup --ip 86.49.134.37 for T225555
  • 11:37 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for cswiki workshop (T225555) (duration: 00m 49s)
  • 11:33 Urbanecm: Reopen EU SWAT for last-time throttle rule
  • 11:33 moritzm: re-enabled meta monitoring for icinga2001
  • 11:26 moritzm: rebooting icinga2001 for kernel security update
  • 11:26 jijiki: Run restart-php-fpm in all-mw-eqiad - T223391
  • 11:25 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:23 moritzm: temporarily disabled meta monitoring for icinga2001
  • 11:16 dcausse: EU Swat done
  • 11:15 dcausse@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/CirrusSearch/includes/Updater.php: T226592: Ignore broken redirects when updating incoming link counts (duration: 00m 49s)
  • 11:06 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] Enable UTR30 as a lookup method for ns prefixes on group1 (duration: 00m 50s)
  • 10:47 jijiki: Rollout Wikidiff 1.8.2 to eqiad - T223391
  • 10:45 jijiki: Rollout Wikidiff 1.8.2 to codfw - T223391
  • 10:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:21 moritzm: draining restbase1027 for eventual reboot for MDS security updates / OpenJDK security update
  • 10:15 moritzm: draining restbase1026 for eventual reboot for MDS security updates / OpenJDK security update
  • 10:15 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:05 elukey: powercycle analytics1056 (soft lockups logged in the serial console, no ssh, no net connectivity)
  • 10:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 ema: pool cp2020 w/ ATS backend T226637
  • 10:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:58 godog: restart rsyslog on wezen - T199406
  • 09:55 moritzm: draining restbase1025 for eventual reboot for MDS security updates / OpenJDK security update
  • 09:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 vgutierrez: rebooting secondary lvs servers for MDS security updates
  • 09:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:46 moritzm: draining restbase1024 for eventual reboot for MDS security updates / OpenJDK security update
  • 09:39 marostegui: Upgrade db2094 (codfw sanitarium) T227062
  • 09:39 moritzm: draining restbase1023 for eventual reboot for MDS security updates / OpenJDK security update
  • 09:34 marostegui: Upgrade mysql on db2080 db2081 db2083 - T227062
  • 09:29 moritzm: draining restbase1022 for eventual reboot for MDS security updates / OpenJDK security update
  • 09:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:22 moritzm: draining restbase1021 for eventual reboot for MDS security updates / OpenJDK security update
  • 09:11 moritzm: draining restbase1020 for eventual reboot for MDS security updates / OpenJDK security update
  • 09:00 moritzm: draining restbase1019 for eventual reboot for MDS security updates / OpenJDK security update
  • 08:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:55 ema: depool cp2020 and reimage as upload_ats T226637
  • 08:52 moritzm: draining restbase1018 for eventual reboot for MDS security updates / OpenJDK security update
  • 08:50 ema: pool cp2018 w/ ATS backend T226637
  • 08:36 moritzm: draining restbase1017 for eventual reboot for MDS security updates / OpenJDK security update
  • 08:20 moritzm: draining restbase1016 for eventual reboot for MDS security updates / OpenJDK security update
  • 08:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:19 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:10 godog: restbase spare hosts, mask and stop restbase - T227054
  • 07:58 moritzm: draining restbase2020 for eventual reboot for MDS security updates / OpenJDK security update
  • 07:55 ema: depool cp2018 and reimage as upload_ats T226637
  • 07:48 moritzm: draining restbase2019 for eventual reboot for MDS security updates / OpenJDK security update
  • 07:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1092 (duration: 00m 49s)
  • 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1092 (duration: 00m 49s)
  • 05:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1092 into API (duration: 00m 49s)
  • 05:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1092 (duration: 00m 48s)
  • 05:23 marostegui: Upgrade MySQL and kernel on db1092
  • 05:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1092 (duration: 00m 54s)
  • 01:39 milimetric@deploy1001: Finished deploy [analytics/refinery@b8a496b]: fix private sqoop (duration: 17m 36s)
  • 01:21 milimetric@deploy1001: Started deploy [analytics/refinery@b8a496b]: fix private sqoop

2019-07-01

  • 20:33 milimetric@deploy1001: Finished deploy [analytics/refinery@4e9894c]: minor, just removing hiwikisource from sqoop list (duration: 01m 33s)
  • 20:32 milimetric@deploy1001: Started deploy [analytics/refinery@4e9894c]: minor, just removing hiwikisource from sqoop list
  • 20:32 milimetric@deploy1001: Finished deploy [analytics/refinery@4e9894c]: minor, just removing hiwikisource from sqoop list (duration: 16m 59s)
  • 20:15 milimetric@deploy1001: Started deploy [analytics/refinery@4e9894c]: minor, just removing hiwikisource from sqoop list
  • 19:31 tzatziki: removing nine files for legal compliance
  • 19:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Special:Homepage for 50% of new users on viwiki (duration: 00m 49s)
  • 18:53 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/ContentTranslation: SWAT: Require only one user group to allow publishing to main namespace (T225398) (duration: 00m 49s)
  • 18:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Don't show cannot publish error to sysop users (T225398) (duration: 00m 49s)
  • 18:46 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/CentralAuth: SWAT: Require only one user group to allow publishing to main namespace (T225398) (duration: 00m 51s)
  • 18:35 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:29 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Special:Homepage on viwiki (T218237) (duration: 00m 49s)
  • 18:23 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EditorJourney on arwiki (T225737) (duration: 00m 49s)
  • 17:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:01 ema: pool cp2017 w/ ATS backend T226637
  • 15:42 moritzm: draining restbase2018 for eventual reboot for MDS kernel updates
  • 15:37 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:36 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:31 moritzm: draining restbase2017 for eventual reboot for MDS kernel updates
  • 15:27 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:19 moritzm: draining restbase2016 for eventual reboot for MDS kernel updates
  • 15:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:11 moritzm: draining restbase2015 for eventual reboot for MDS kernel updates
  • 14:54 moritzm: draining restbase2014 for eventual reboot for MDS kernel updates
  • 14:50 moritzm: installing openjdk-8 security updates on stretch-based restbase hosts
  • 14:45 ejegg: updated payments-wiki from 86381aeeff to 5f974d2386
  • 14:44 moritzm: draining restbase2013 for eventual reboot for MDS kernel updates
  • 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:25 ema: depool cp2017 and reimage as upload_ats T226637
  • 14:24 moritzm: draining restbase2012 for eventual reboot for MDS kernel updates
  • 14:10 moritzm: rolling reboot of docker registry nodes to pick up MDS-enabled qemu
  • 14:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:02 moritzm: draining restbase2011 for eventual reboot for MDS kernel updates
  • 13:54 moritzm: draining restbase2010 for eventual reboot for MDS kernel updates
  • 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:43 ottomata: modified dt format of webrequest logs to use 'Z' suffix for timezone offset - T217040
  • 13:42 jbond42: rolling update of expat
  • 13:41 fsero: uploading helmfile to jessie as well
  • 13:38 moritzm: draining restbase2009 for eventual reboot for MDS kernel updates
  • 13:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:20 akosiaris: repool eqiad after kubernetes upgrades. T226256
  • 13:20 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=eqiad
  • 12:51 akosiaris: depool eqiad for kubernetes upgrades. T226256
  • 12:51 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=eqiad
  • 12:49 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=codfw
  • 12:49 akosiaris: repool codfw after kubernetes upgrades. T226256
  • 12:01 akosiaris: depool codfw for kubernetes upgrades. T226256
  • 12:01 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=codfw
  • 11:36 Urbanecm: EU SWAT done
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Clean up wgNamespaceAliases (T226765) (duration: 00m 49s)
  • 11:27 apergos: urbanecm@deploy1001 Synchronized php-1.34.0-wmf.11/includes/: SWAT: Join slot and content tables when dumping XML (T220493) (duration: 01m 14s)
  • 11:12 jbond42: rolling upgrade of facter3
  • 11:12 jbond42: upload facter_3.11.0-2~debu9u2+wmf1 to stretch-wikimedia component/facter3
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: Add abusefilter-view-private to checkusers on arwiki (T226899) (duration: 00m 49s)
  • 11:06 urbanecm@deploy1001: Synchronized dblists/: Close wikimania2018.wikimedia.org (T201188) (duration: 00m 49s)
  • 10:04 elukey: remove burrow-analytics.service from kafkamon1001 (the analytics cluster has been decommed)
  • 09:55 elukey: reboot kafkamon1001 with 4g of dedicated ram (was 8g) - T224988
  • 09:54 elukey: reboot kafkamon2001 with 4g of dedicated ram (was 8g) - T224988
  • 09:54 godog: swift eqiad-prod eqiad-prod: put back ms-be1033 - T223518
  • 09:33 _joe_: removing python-conftool from all hosts where it's still installed
  • 09:16 _joe_: update python3-etcd, python3-conftool to their latest versions T226965
  • 09:13 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Noop: Do not load InitialiseSettings-labs.php multiple times (T224899) (duration: 00m 51s)
  • 08:39 elukey: restart hadoop-yarn-nodemanager on all hadoop workers to pick up new jvm settings - T225296
  • 07:04 ema: pool cp2014 w/ ATS backend T226637
  • 06:16 ema: depool cp2014 and reimage as upload_ats T226637
  • 04:53 marostegui: Keep compressing tables on labsdb1011 - T222978
  • 04:50 marostegui: Reload haproxy on dbproxy1010 and dbproxy1011 to depool labsdb1011 - T222978
  • 04:49 marostegui: Change pt-kill value on labsdb1009 temporarily, from 300 to 14400 T222978

2019-06-30

  • 23:27 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 45s)
  • 07:05 Urbanecm: Remove 2FA from User:SQL (T226918)

2019-06-28

  • 21:19 otto@deploy1001: Finished deploy [eventstreams/deploy@2af2719]: Manually blacklisting IP - T226808 (duration: 03m 07s)
  • 21:16 otto@deploy1001: Started deploy [eventstreams/deploy@2af2719]: Manually blacklisting IP - T226808
  • 20:09 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Wikibase/repo/RepoHooks.php: Make it possible for File pages to be moved on Commons again T224303 T226672 (duration: 00m 50s)
  • 19:49 jforrester@deploy1001: Synchronized wmf-config/mobile.php: T221196 VE mobile A/B test part 2 (duration: 00m 49s)
  • 19:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221196 VE mobile A/B test part 1 (duration: 00m 50s)
  • 19:05 joal@deploy1001: Finished deploy [analytics/refinery@de8eb99]: Missing bit of regular analytics deploy (duration: 02m 08s)
  • 19:03 joal@deploy1001: Started deploy [analytics/refinery@de8eb99]: Missing bit of regular analytics deploy
  • 18:51 joal@deploy1001: Finished deploy [analytics/refinery@de8eb99]: Missing bit of regular analytics deploy (duration: 17m 47s)
  • 18:33 joal@deploy1001: Started deploy [analytics/refinery@de8eb99]: Missing bit of regular analytics deploy
  • 18:14 joal@deploy1001: Finished deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1004 only (duration: 01m 03s)
  • 18:13 joal@deploy1001: Started deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1004 only
  • 18:12 elukey: systemctl reset-failed kafka* units on kafka2001 (in decom phase)
  • 18:12 joal@deploy1001: Finished deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only again (duration: 00m 26s)
  • 18:11 joal@deploy1001: Started deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only again
  • 18:09 joal@deploy1001: Finished deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only (duration: 00m 05s)
  • 18:09 joal@deploy1001: Started deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only
  • 18:08 joal@deploy1001: Finished deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only (duration: 00m 04s)
  • 18:08 joal@deploy1001: Started deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only
  • 18:06 joal@deploy1001: Finished deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy (duration: 53m 35s)
  • 17:53 cdanis: increasing nginx proxy_buffer_size / proxy_buffers 02d7bcaa
  • 17:36 ottomata: restarting eventstreams on scb1001 with trace logging of X-Client-IP for T226808
  • 17:13 joal@deploy1001: Started deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy
  • 16:35 bblack: Raising varnish max_http_hdr (max allowed applayer response header count) from 64->128 in systemd config and live tuning - https://gerrit.wikimedia.org/r/519661 - T226840
  • 15:04 eevans@deploy1001: scap-helm sessionstore finished
  • 15:04 eevans@deploy1001: scap-helm sessionstore cluster codfw completed
  • 15:04 eevans@deploy1001: scap-helm sessionstore upgrade production -f sessionstore-codfw-values.yaml stable/kask [namespace: sessionstore, clusters: codfw]
  • 15:02 eevans@deploy1001: scap-helm sessionstore finished
  • 15:02 eevans@deploy1001: scap-helm sessionstore cluster eqiad completed
  • 15:02 eevans@deploy1001: scap-helm sessionstore upgrade production -f sessionstore-eqiad-values.yaml stable/kask [namespace: sessionstore, clusters: eqiad]
  • 14:48 ema: pool cp2011 w/ ATS backend T226637
  • 14:47 XioNoX: upload kafkatee to buster-wikimedia
  • 14:11 eevans@deploy1001: scap-helm sessionstore finished
  • 14:11 eevans@deploy1001: scap-helm sessionstore cluster staging completed
  • 14:11 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
  • 14:07 eevans@deploy1001: scap-helm sessionstore upgrade production -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
  • 14:06 ema: depool cp2011 and reimage as upload_ats T226637
  • 11:36 elukey: roll restart eventstreams on all scb1* nodes
  • 11:33 elukey: restart eventstreams on scb1001
  • 11:18 fsero: draining kubernetes1006 for applying updates
  • 11:14 fsero: draining kubernetes1005 for applying updates
  • 11:13 fsero: draining kubernetes2006 for applying updates
  • 11:09 fsero: draining kubernetes2005 for applying updates
  • 11:04 _joe_: uploading php-wmerrors to thirdparty/php72 - T187147
  • 10:31 Reedy: running `foreachwiki extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --audio --mime=audio/midi --missing --throttle` on mwmaint1002 in screen T226713
  • 10:20 reedy@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/TimedMediaHandler/maintenance/requeueTranscodes.php: Extra filtering option (duration: 00m 51s)
  • 10:09 ema: pool cp2008 w/ ATS backend T226637
  • 09:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:17 ema: depool cp2008 and reimage as upload_ats T226637
  • 09:16 elukey: systemctl reset-failed kafka* units on kafka2002 (role spare, failed units, already masked)
  • 09:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:10 moritzm: rebooting releases* hosts for MDS-enabled qemu/kernel
  • 09:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:43 elukey: roll restart of eventstreams on all scb2* nodes, service now working (kafka transport failures logged)
  • 08:02 moritzm: updating openssl packages on mw1265
  • 07:57 ema: pool cp2005 w/ ATS backend T226637
  • 07:11 _joe_: upgrading php-wikidiff2 on the mw canaries, only on php7 - T223391
  • 07:05 ema: depool cp2005 and reimage as upload_ats T226637
  • 01:22 Krinkle: Killing arclamp-log on webperf1002, no flame graphs for three days, presumably mwlog/redis connection dropped again. T215740

2019-06-27

  • 23:28 catrope@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/TimedMediaHandler/: T226748 (duration: 00m 50s)
  • 23:26 catrope@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/GrowthExperiments/includes/HomepageHooks.php: Fix JS error on Special:Homepage (duration: 00m 50s)
  • 23:25 brion: roan is fixing deploy of T226748 which failed to include the patch (whoops)
  • 21:58 cdanis: cdanis@cp1075.eqiad.wmnet ~ % sudo -i varnish-backend-restart
  • 21:44 brion: deploying fix for TMH jobqueue bug T226748
  • 20:31 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/MobileFrontend/resources/dist: T221191: Log editor switches to visualeditorfeatureuse (duration: 00m 50s)
  • 20:18 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Wikibase: Avoid inserting a new addUsage job when the current usage stays untouched (gerrit:519492) (duration: 01m 14s)
  • 19:23 Urbanecm: run namespaceDupes.php for wikis in P8674 (T173070)
  • 19:23 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.11 refs T220736
  • 19:16 ppchelko@deploy1001: Finished deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855 (duration: 11m 21s)
  • 19:04 ppchelko@deploy1001: Started deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855
  • 18:52 Urbanecm: Morning SWAT done for real
  • 18:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Tidy up GroupOverrides (T173070) (duration: 00m 56s)
  • 18:50 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: gerrit:Tidy up GroupOverrides, part 1 (T173070) (duration: 00m 57s)
  • 18:48 Urbanecm: foreachwiki namespaceDupes.php --fix done (T173070)
  • 18:46 Urbanecm: Reopen Morning SWAT
  • 18:33 legoktm: gerrit set-account --active '"Dzahn"'
  • 18:33 Urbanecm: Morning SWAT done, namespaceDupes.php still running for T173070
  • 18:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Tidy up groupOverrides (T185898) (duration: 00m 56s)
  • 18:22 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: Remove several wikis from commonsuploads.dblist (T185898) (duration: 00m 57s)
  • 18:20 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: Restrict uploading on wikimaniawiki (T225505) (duration: 00m 56s)
  • 18:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Restrict uploading on wikimaniawiki, Add + in front of wikimaniawiki in GroupOverrides (T225505) (duration: 00m 57s)
  • 18:13 herron: kafka2001 -> kafka-main2001 migration complete. re-enabling alerting on kafka-main2001, and moving kafka2001 to role::spare::system T225005
  • 18:08 Urbanecm: running namespaceDupes.php across all wikis in tmux on mwmaint1002 (T173070)
  • 18:06 ppchelko@deploy1001: Finished deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, restbase1016 (duration: 01m 41s)
  • 18:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Revert "Set default aliases for Project_talk namespace"" (T173070) (duration: 00m 57s)
  • 18:05 ppchelko@deploy1001: Started deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, restbase1016
  • 18:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:03 ppchelko@deploy1001: Finished deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, restbase1016 (duration: 00m 08s)
  • 18:03 ppchelko@deploy1001: Started deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, restbase1016
  • 18:01 ppchelko@deploy1001: Finished deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, rb2009 only, fixed mathoid config (duration: 02m 19s)
  • 17:59 ppchelko@deploy1001: Started deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, rb2009 only, fixed mathoid config
  • 17:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: T204883 / 93643b44a52ea7 (duration: 01m 00s)
  • 17:26 ppchelko@deploy1001: Finished deploy [restbase/deploy@da50001]: Use new projects and new config layout T220855, rb2009 only (duration: 02m 38s)
  • 17:23 ppchelko@deploy1001: Started deploy [restbase/deploy@da50001]: Use new projects and new config layout T220855, rb2009 only
  • 17:21 arturo: imported gpg keys 9DC858229FC7DD38854AE2D88D81803C0EBFCD88 and 54A647F9048D5688D7DA2ABE6A030B21BA07F4FB into install1002 for T215975
  • 17:14 ejegg: updated fundraising tools from da82ed111d to 3089c0ec76
  • 16:42 jynus: repool labsdb1011 T222978
  • 16:39 ema: pool cp2002 w/ ATS backend T226637
  • 14:43 herron: beginning replacement of kafka2001 with kafka-main2001 T225005
  • 14:33 akosiaris: push newer calico outgoing policy rules. T225005
  • 14:28 XioNoX: push RPKI classification to Dallas - T220669
  • 14:23 Reedy: running `mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=commonswiki --audio --missing --throttle` in screen as me on mwmaint1002 T226713
  • 14:13 XioNoX: push RPKI classification test to eqord - T220669
  • 14:11 ema: depool cp2002 and reimage as upload_ats T226637
  • 13:43 XioNoX: push RPKI classification test to cr3-ulsfo - T220669
  • 13:26 XioNoX: push RPKI classification test to cr4-ulsfo - T220669
  • 13:15 elukey: start druid drop datasource test - might affect AQS - T226035
  • 13:11 godog: depool restbase10(0[7-9]|1[0-5]) before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/513262
  • 12:01 Amir1: start of mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=testwikidatawiki --batch-size=100 --sleep=3 (T225052)
  • 11:23 Amir1: EU SWAT is done
  • 11:21 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add Portal Namespace to VisualEditor option on kowiki (T224813) (duration: 00m 57s)
  • 10:48 jijiki: Rolling restart ms-fe* proxy services for T226373 and T211661
  • 10:48 moritzm: updated buster d-i image to release candidate 2
  • 10:40 _joe_: progressively restarting pybal in codfw, eqiad to pick up the change in monitoring for wdqs
  • 10:39 volans: restarted stashbot on toolforge; it had not been !log-ing since 01:11 UTC this morning
  • 01:11 bblack: depool eqiad front edge

2019-06-26

  • 23:39 catrope@deploy1001: Synchronized php-1.34.0-wmf.10/extensions/Echo/modules/nojs/mw.echo.badge.monobook.less: Fix horizontal scrollbars in Monobook (T226594) (duration: 00m 55s)
  • 23:38 catrope@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Echo/modules/nojs/mw.echo.badge.monobook.less: Fix horizontal scrollbars in Monobook (T226594) (duration: 00m 57s)
  • 21:36 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't set wgSentryEventGateUri in prod CS (duration: 00m 55s)
  • 21:35 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Explicitly set wgSentryEventGateUri to false in prod IS (duration: 00m 56s)
  • 21:22 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable other statements on test commons (duration: 00m 58s)
  • 20:58 marktraceur: added cparle to wmf-deployment group on Gerrit (already has deploy access)
  • 20:56 cscott@deploy1001: Finished deploy [parsoid/deploy@3d20703]: Updating Parsoid to 31d356a5 (ensure proper source texts when parsing) (duration: 20m 55s)
  • 20:52 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@85fc707]: Update mobileapps to 4f9b376 (duration: 02m 08s)
  • 20:50 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@85fc707]: Update mobileapps to 4f9b376
  • 20:48 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@41a86f8]: Merge "Update prod config template to pass thru accept-language to the MW API" (duration: 03m 17s)
  • 20:44 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@41a86f8]: Merge "Update prod config template to pass thru accept-language to the MW API"
  • 20:37 bsitzmann@deploy1001: deploy aborted: Merge "Update prod config template to pass thru accept-language to the MW API" (duration: 02m 15s)
  • 20:35 cscott@deploy1001: Started deploy [parsoid/deploy@3d20703]: Updating Parsoid to 31d356a5 (ensure proper source texts when parsing)
  • 20:35 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@41a86f8]: Merge "Update prod config template to pass thru accept-language to the MW API"
  • 19:42 shdubsh: file-read-backwards v2.0.0 deployed to apt repo
  • 19:08 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.11 refs T220736 (duration: 00m 56s)
  • 19:06 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.11 refs T220736
  • 17:52 herron: finished migration of kafka2002 to kafka-main2002 — enabling alert notifications for kafka-main2002, and leaving kafka2002 disabled T225005
  • 16:42 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix $wgSentryEventGateUri (T217142) (duration: 09m 52s)
  • 16:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Reverting change scap had problems with (duration: 00m 55s)
  • 16:25 urbanecm@deploy1001: scap failed: average error rate on 11/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 16:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change name of Serbian Wikinews in InitialiseSettings.php (part 2) (T226315) (duration: 00m 55s)
  • 16:20 Urbanecm: Purged srwikinews.png, srwikinews-1.5x.png, srwikinews-2x.png (T226315)
  • 16:19 urbanecm@deploy1001: Synchronized static/images/project-logos/: Change name of Serbian Wikinews (part 1) (T226315) (duration: 00m 56s)
  • 16:15 jijiki: Pooling restbase1007 back
  • 16:14 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable sending JS errors to EventGate (T217142) (duration: 00m 55s)
  • 16:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable mobile homepage for cswiki and kowiki (T225676) (duration: 00m 56s)
  • 16:08 ppchelko@deploy1001: Finished deploy [restbase/deploy@a915f69]: Really revert (duration: 01m 35s)
  • 16:06 ppchelko@deploy1001: Started deploy [restbase/deploy@a915f69]: Really revert
  • 16:04 ema: pool cp5006 w/ ATS backend T226477
  • 15:56 jijiki: Depooling restbase1007
  • 15:54 ppchelko@deploy1001: Finished deploy [restbase/deploy@574a678]: Revert (duration: 03m 47s)
  • 15:51 ppchelko@deploy1001: Started deploy [restbase/deploy@574a678]: Revert
  • 15:50 ppchelko@deploy1001: deploy aborted: Use new projects and new config layout T220855, canaries only (duration: 03m 31s)
  • 15:46 ppchelko@deploy1001: Started deploy [restbase/deploy@995bc9d]: Use new projects and new config layout T220855, canaries only
  • 15:04 ema: depool cp5006 and reimage as upload_ats T226477
  • 15:01 ema: pool cp3043 as cache_text
  • 14:16 herron: beginning replacement of kafka2002 with kafka-main2002 T225005
  • 14:12 ema: depool cp3043 and convert it from upload to text
  • 14:01 moritzm: rebooting graphite1004 for kernel security update
  • 13:55 moritzm: rebooting puppetboard* to pick up MDS-enabled qemu and new kernel
  • 13:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:48 XioNoX: push RPKI classification test to cr4-ulsfo - T220669
  • 13:32 ema: pool cp5005 w/ ATS backend T226477
  • 13:31 moritzm: rebooting graphite2003 for kernel security update
  • 13:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:17 Lucas_WMDE: end (success) lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikibaseQualityConstraints/maintenance/ImportConstraintStatements.php wikidatawiki # T223372
  • 13:16 Lucas_WMDE: begin lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikibaseQualityConstraints/maintenance/ImportConstraintStatements.php wikidatawiki # T223372
  • 12:27 Amir1: EU SWAT is done for real
  • 12:27 Amir1: end of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=100 --sleep=3 (T225052)
  • 12:25 ema: depool cp5005 and reimage as upload_ats T226477
  • 12:07 Amir1: start of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=100 --sleep=3
  • 12:06 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set EntityUsageTable addUsage batch size to 100 (T225500) (duration: 00m 56s)
  • 12:02 dcausse: Revert: EU swat done
  • 12:02 dcausse: EU swat done
  • 12:01 dcausse@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/CirrusSearch/includes/RequestLogger.php: T226568: Convert array params to string when logging requests (duration: 00m 56s)
  • 11:55 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] Enable UTR30 as a lookup method for ns prefixes on group0 (duration: 00m 56s)
  • 11:47 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] remove unused wgCirrusSearchRequestEventSampling (duration: 00m 54s)
  • 11:40 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T226273: Enable reader demographics surveys (duration: 00m 55s)
  • 11:17 urbanecm@deploy1001: sync-file aborted: Reverting gerrit:519167 (T226273) (duration: 00m 32s)
  • 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch property terms migration to WRITE_BOTH on wikidata production (T225051) (duration: 00m 56s)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Allow bureaucrats to remove sysop on nycwikimedia (T226591) (duration: 00m 57s)
  • 10:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:21 ema: pool cp5004 w/ ATS backend T226477
  • 09:49 _joe_: restarted php7.2-fpm on mwdebug1002, testing php-check-and-restart script
  • 09:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1068 from config T217396 (duration: 00m 55s)
  • 09:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1068 from config T217396 (duration: 01m 11s)
  • 09:18 ema: depool cp5004 and reimage as upload_ats T226477
  • 09:04 elukey: reboot druid100[4-6] for kernel and openjdk upgrades
  • 09:00 kart_: Updated cxserver to 9bad239 (T226482)
  • 08:58 kartik@deploy1001: scap-helm cxserver finished
  • 08:58 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 08:58 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 08:56 kartik@deploy1001: scap-helm cxserver finished
  • 08:56 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 08:56 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 08:52 kartik@deploy1001: scap-helm cxserver finished
  • 08:52 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 08:52 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 08:43 moritzm: rebooting deployment-mediawiki-07 for new kernel
  • 08:30 ema: pool cp5003 w/ ATS backend T226477
  • 07:50 godog: bounce rsyslog on lithium - T199406
  • 07:30 godog: powercycle ms-be2032 - T226600
  • 07:19 ema: depool cp5003 and reimage as upload_ats T226477
  • 07:09 elukey: reboot of druid100[1-3] hosts for kernel + openjdk upgrades
  • 05:59 elukey: systemctl mask + reset-failed kafka on kafka10[12-23] - T226517
  • 05:57 marostegui: wikimedia_editor_tasks_entity_description_exists from s8:testwikidatawiki T226326
  • 05:46 marostegui: wikimedia_editor_tasks_entity_description_exists from s3:testwikidatawiki T226326
  • 05:29 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db1133 into m5 depooled T222682 (duration: 00m 55s)
  • 05:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add db1133 into m5 depooled T222682 (duration: 00m 55s)

2019-06-25

  • 22:50 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Echo/modules/nojs/: T226503 Fix badge icons in Monobook (duration: 00m 56s)
  • 22:48 jforrester@deploy1001: Synchronized php-1.34.0-wmf.10/extensions/Echo/modules/nojs/: T226503 Fix badge icons in Monobook (duration: 00m 57s)
  • 21:30 jgleeson: updating civicrm from 5c02e62d6e to 98fd34417d
  • 21:30 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.11
  • 21:23 thcipriani: gerrit back on 2.15.13
  • 21:19 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@7b379a6]: revert Gerrit to 2.15.13 on cobalt (restart incoming) (duration: 00m 11s)
  • 21:19 thcipriani@deploy1001: Started deploy [gerrit/gerrit@7b379a6]: revert Gerrit to 2.15.13 on cobalt (restart incoming)
  • 21:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@7b379a6]: revert Gerrit to 2.15.13 on gerrit2001 (duration: 00m 10s)
  • 21:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@7b379a6]: revert Gerrit to 2.15.13 on gerrit2001
  • 21:03 hashar: contint1001: running puppet to clear a puppet alarm (due to Gerrit restart)
  • 20:44 moritzm: rebooting ununpentium for kernel security update
  • 20:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:35 thcipriani: gerrit back
  • 20:33 thcipriani: restarting gerrit due to T224448
  • 20:28 moritzm: rebooting vega for kernel security update
  • 20:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:24 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.11 (duration: 41m 35s)
  • 20:10 moritzm: rebooting webperf hosts for kernel security update
  • 20:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:00 moritzm: rebooting torrelay1001 for kernel security update
  • 20:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:43 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.11
  • 19:16 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.10 refs T220735
  • 19:08 twentyafterfour: deploying MediaWiki 1.34.0-wmf.10 to all wikis
  • 19:07 twentyafterfour: looks like we are unblocked for wmf.10, deploying that first
  • 18:45 longma: cutting the branch for 1.34.0-wmf.11 T220736
  • 17:51 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@eb8f692]: Migrating RecentChangesUpdate to PHP7 - T219148 (duration: 01m 37s)
  • 17:49 jiji@deploy1001: Started deploy [cpjobqueue/deploy@eb8f692]: Migrating RecentChangesUpdate to PHP7 - T219148
  • 17:24 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@ca96238]: undo: modify agents for T226471 (duration: 16m 14s)
  • 17:08 smalyshev@deploy1001: Started deploy [wdqs/wdqs@ca96238]: undo: modify agents for T226471
  • 17:01 herron: finished migration of kafka2003 to kafka-main2003 — enabling alert notifications for kafka-main2003, and leaving kafka2003 disabled T225005
  • 16:45 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Remove some dupe config (duration: 00m 55s)
  • 16:25 jynus: upgrade and restart db1114 (test-s1)
  • 15:59 krinkle@deploy1001: Synchronized php-1.34.0-wmf.10/maintenance/: T226448 / 40e725b6502cd6 (duration: 01m 15s)
  • 15:56 krinkle@deploy1001: Synchronized php-1.34.0-wmf.10/includes/: T226448 / 40e725b6502cd6 (duration: 01m 20s)
  • 15:28 jforrester@deploy1001: Synchronized php-1.34.0-wmf.10/skins/MonoBook/includes/SkinMonoBook.php: T226503 Fix Notifications RL module dependency (duration: 00m 57s)
  • 15:13 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce page-properties-change stream to eventgate-main - T211248 (duration: 00m 58s)
  • 14:43 herron: beginning replacement of kafka2003 with kafka-main2003 T225005
  • 14:26 ottomata: shutting down Kafka on old analytics brokers - T183303
  • 14:21 andrewbogott: rebooting cloudvirt1014, 1018, 1024
  • 14:02 ema: pool cp5002 w/ ATS backend T226477
  • 13:54 onimisionipe: changing replication factor of v4 keyspace for maps codfw cluster - T226161
  • 12:49 marostegui: Stop MySQL on db1117:m5 (checked dumps, they are done) to clone db1133 - T222682
  • 12:48 godog: swift eqiad-prod: put back ms-be1033 - T223518
  • 12:46 ema: depool cp5002 and reimage as upload_ats T226477
  • 12:34 jijiki: Upgrade scap to eqiad - T224915
  • 12:32 jijiki: Upgrade scap to codfw - T224915
  • 12:27 akosiaris: fully depool kubernetes2001 T226237
  • 12:26 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=kubernetes2001.*
  • 12:24 jijiki: Upgrade to scap 3.10.0-1 on mw-api-canary as well - T224915
  • 12:22 jijiki: Upgrade to scap 3.10.0-1 on mw* codfw
  • 10:22 ema: pool cp5001 w/ ATS backend T226477
  • 09:30 jijiki: enable puppet on dbproxy*
  • 09:24 _joe_: restarting gerrit on cobalt
  • 09:09 ema: depool cp5001 and reimage as upload_ats T226477
  • 09:08 jijiki: Rolling haproxy restarts on thumbor* - T225284
  • 09:02 jijiki: Disable puppet on dbproxy* - T225284
  • 08:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Change parsercache key everywhere after deploying it in small batches for a few hours T210725 (duration: 00m 57s)
  • 08:50 jijiki: Disable puppet on thumbor* - T225284
  • 08:30 marostegui: Change parsercachekey on 20 more hosts
  • 08:22 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@bd3df8c]: modify agents for T226471 (duration: 11m 02s)
  • 08:18 marostegui: Change parsercachekey on 10 more hosts
  • 08:11 smalyshev@deploy1001: Started deploy [wdqs/wdqs@bd3df8c]: modify agents for T226471
  • 08:08 marostegui: Change parsercachekey on 10 more hosts
  • 07:58 marostegui: Change parsercachekey on 20 more hosts
  • 07:49 marostegui: Change parsercachekey on 20 more hosts
  • 07:44 marostegui: Change parsercachekey on 20 more hosts
  • 07:35 marostegui: Change parsercachekey on 20 more hosts
  • 07:21 marostegui: Change parsercachekey on 10 more hosts
  • 07:09 SMalyshev: depooled wdqs1004 due to lag
  • 07:09 marostegui: Change parsercachekey on 10 more hosts
  • 06:51 marostegui: Change parsercachekey on 20 more hosts
  • 05:52 marostegui: Change parsercachekey on 10 more hosts
  • 05:43 marostegui: Change parsercachekey on 10 more hosts
  • 05:33 marostegui: Change parsercachekey on 20 more hosts
  • 05:24 marostegui: Change parsercachekey on 20 more hosts
  • 05:12 marostegui: Change parsercache key on 20 more hosts
  • 05:01 marostegui: Change parsercache key on the canaries T210725
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Change parsercache key T210725 (duration: 00m 58s)
  • 04:46 kart_: Updated cxserver to use nodejs10 (T226074)
  • 04:44 kartik@deploy1001: scap-helm cxserver finished
  • 04:44 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 04:44 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 04:37 kartik@deploy1001: scap-helm cxserver finished
  • 04:37 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 04:37 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 04:31 kartik@deploy1001: scap-helm cxserver finished
  • 04:31 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 04:30 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 00:10 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: poke at autopromote config (duration: 00m 54s)

2019-06-24

  • 23:51 twentyafterfour@deploy1001: Synchronized php-1.34.0-wmf.10/extensions/Wikibase/: Sync https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/518782/ refs T220735 (duration: 01m 21s)
  • 23:42 reedy@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/AdvancedSearch/: (no justification provided) (duration: 00m 56s)
  • 22:56 krinkle@deploy1001: Synchronized php-1.34.0-wmf.10/extensions/ProofreadPage/includes/Special/SpecialProofreadPages.php: ed556868f / T225813 (duration: 00m 53s)
  • 22:53 Krinkle: krinkle@deploy1001: There is an untracked "wmf-config/event-schemas/" directory in the /srv/mediawiki deployment source, ref T226436
  • 21:42 thcipriani: gerrit back
  • 21:40 thcipriani: restart gerrit for https://gerrit.wikimedia.org/r/518811/
  • 21:39 ppchelko@deploy1001: Finished deploy [changeprop/deploy@17e71b5]: Support .meta.stream as well as .meta.topic T226198 (duration: 01m 42s)
  • 21:37 ppchelko@deploy1001: Started deploy [changeprop/deploy@17e71b5]: Support .meta.stream as well as .meta.topic T226198
  • 21:32 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deployed r/518350 - Revert "Temporary make account creation limits more restrictive" (duration: 00m 56s)
  • 21:23 mobrovac@deploy1001: Finished deploy [restbase/deploy@a915f69]: Add /page/media-lint - T226105 - and various other cleanups (duration: 19m 08s)
  • 21:04 mobrovac@deploy1001: Started deploy [restbase/deploy@a915f69]: Add /page/media-lint - T226105 - and various other cleanups
  • 20:42 XenoRyet: updated payments-wiki from 79d1822644 to a19e5ae077
  • 20:13 andrewbogott: rebooting cloudvirt1024
  • 19:57 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Update more fr config (duration: 00m 55s)
  • 19:56 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: rm old comments move more FR config (duration: 00m 52s)
  • 19:50 thcipriani: gerrit back
  • 19:48 thcipriani: restarting gerrit for 2.15.14 update
  • 19:47 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3695fd]: Gerrit to 2.15.14 on cobalt (restart incoming) (duration: 00m 12s)
  • 19:47 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3695fd]: Gerrit to 2.15.14 on cobalt (restart incoming)
  • 19:46 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3695fd]: Gerrit to 2.15.14 (gerrit2001 only) (duration: 00m 11s)
  • 19:46 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3695fd]: Gerrit to 2.15.14 (gerrit2001 only)
  • 19:43 otto@deploy1001: Synchronized .gitmodules: Remove the event-schemas submodule - .gitmodules - T226436 (duration: 00m 55s)
  • 19:41 otto@deploy1001: Synchronized wmf-config: Remove the event-schemas submodule - wmf-config - T226436 (duration: 00m 55s)
  • 19:32 elukey: restart yarn/hdfs on analytics1072 to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518767/ (broken disk)
  • 19:32 otto@deploy1001: Synchronized wmf-config: Remove remaining monolog kafka and avro related configs - wmf-config - T226436 (duration: 00m 55s)
  • 19:30 otto@deploy1001: Synchronized tests/TestServices.php: Remove remaining monolog kafka and avro related configs - tests - T226436 (duration: 00m 56s)
  • 19:16 otto@deploy1001: Synchronized wmf-config: Remove usages of monolog kafka handler and avro formatter - wmf-config - T226436 (duration: 00m 56s)
  • 19:14 otto@deploy1001: Synchronized tests/loggingTest.php: Remove usages of monolog kafka handler and avro formatter - tests - T226436 (duration: 00m 55s)
  • 19:13 otto@deploy1001: sync-file aborted: Remove usages of monolog kafka handler and avro formatter - tests - T226436 (duration: 00m 06s)
  • 18:58 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgFlaggedRevsAutoReview to a boolean (duration: 00m 55s)
  • 18:50 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Remove some now redundant config (duration: 00m 55s)
  • 18:48 andrewbogott: rebooting cloudvirt1018
  • 18:46 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move some basic FR config into IS (duration: 00m 55s)
  • 18:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable CirrusSearchRequestSet avro monolog channel - T222268 (duration: 00m 55s)
  • 18:27 otto@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add hualab.nl to $wgCopyUploadsDomains (T225917) (duration: 00m 55s)
  • 18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add "mass-upload" to autopatrollers and patrollers on commons (T226217) (duration: 00m 55s)
  • 18:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix wgMetaNamespaceTalk for aswikisource (T226027) (duration: 00m 55s)
  • 18:02 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@157f40c]: weekly WDQS deploy (duration: 18m 11s)
  • 17:44 smalyshev@deploy1001: Started deploy [wdqs/wdqs@157f40c]: weekly WDQS deploy
  • 17:26 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Cleanup (duration: 00m 55s)
  • 17:10 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: T226410 (duration: 00m 54s)
  • 16:25 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Remove some duplicated config (duration: 00m 55s)
  • 16:13 mobrovac@deploy1001: Synchronized rpc/RunSingleJob.php: RunSingleJob: check that only the database param is set and leave the rest to JobExecutor - T226109 (duration: 00m 55s)
  • 16:08 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: comments (duration: 00m 56s)
  • 15:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:48 XioNoX: remove cwdent from all network devices - T226405
  • 15:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:28 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Simple config outside callback (duration: 00m 56s)
  • 15:17 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Remove some unnecessary copy pasted code (duration: 00m 55s)
  • 15:05 gehel: re-enabling wdqs updater on wdqs-public / eqiad
  • 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:01 ema: cp3032: upgrade varnish to 5.1.3-1wm11 T226375
  • 13:51 jbond42: rolling restart of the conf servers starting in 10 minutes; please let me know if you foresee any issues
  • 13:51 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: T225144 T225276 T225414 T225776 T225797 T226054 (duration: 00m 56s)
  • 13:26 moritzm: re-enabling TCP SACKs on cp4024-4029 (half of Varnish/text and Varnish/upload in ulsfo) T225998
  • 13:25 jbond42: update libvirt on cloudvirt* stretch servers
  • 13:19 moritzm: re-enabling TCP SACKs on cp3040-cp3047, cp3049 (half of Varnish/text and Varnish/upload in esams) T225998
  • 13:10 moritzm: re-enabling TCP SACKs on cp2001, 2002, 2004-2008, 2010, 2011, 2014, 2017 (half of Varnish/text and Varnish/upload in codfw) T225998
  • 13:04 moritzm: re-enabling TCP SACKs on cp1075-1082 (half of Varnish/text and Varnish/upload in eqiad) T225998
  • 13:00 gehel: shutdown wdqs updater on wdqs/public/eqiad
  • 12:49 gehel: restarting blazegraph on wdqs1004 (JVM thread out of control)
  • 11:31 Lucas_WMDE: EU SWAT done
  • 11:30 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Labs: enable QuickSurveys on hewiki (T225819) (duration: 00m 57s)
  • 10:36 moritzm: re-enabling TCP SACKs on cp5007-cp5009 (half of Varnish/text in eqsin) T225998
  • 10:28 moritzm: re-enabling TCP SACKs on cp5001-cp5003 (half of Varnish/upload in eqsin) T225998
  • 09:23 elukey: reboot of kafka-jumbo100[1-6] for kernel + openjdk upgrades
  • 08:56 elukey: re-enable eventlogging mysql consumers after maintenance on eventlog1002
  • 08:52 marostegui: Upgrade MySQL on db1140 (checked that all snapshots backups are done) - T226358
  • 08:42 elukey: reboot an-master100[1,2] for kernel + openjdk upgrades
  • 08:38 jynus: upgrade, stop and restart db1108
  • 08:34 jynus: reloading haproxy on dbproxy1004/9
  • 08:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 after upgrade T226358 (duration: 00m 56s)
  • 08:14 jynus: upgrade, stop and restart db1107
  • 08:09 marostegui: Stop MySQL on db1120 for upgrade - T226358
  • 08:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 for upgrade T226358 (duration: 00m 56s)
  • 07:51 elukey: stop mysql consumer on eventlog1002 (so traffic to db1107 will be stopped, to allow maintenance to happen)
  • 07:06 moritzm: installing vim update for stretch
  • 06:31 _joe_: publishing docker-registry.wikimedia.org/nodejs10-slim:0.0.2, T226346
  • 06:16 elukey: powercycle analytics1060 (stuck, no ssh, no console com2 available)
  • 06:01 marostegui: Stop MySQL on db1117:3321 to clone db1135 (haproxy alert will be triggered) - T222682
  • 05:57 _joe_: rebuilding base debian/alpine images to pick up security updates
  • 05:07 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1135 from config T222682 (duration: 00m 55s)
  • 05:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1135 from config T222682 (duration: 01m 07s)
  • 04:59 marostegui: Rename table wikimedia_editor_tasks_entity_description_exists in db1123 (testwikidatawiki) T226326
  • 04:54 marostegui: Rename table wikimedia_editor_tasks_entity_description_exists in db1092 T226326

2019-06-21

  • 14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:51 moritzm: rebooting planet1001 to pick up MDS mitigations/new kernel
  • 14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:37 moritzm: rebooting kerberos1001 to pick up MDS mitigations/new kernel
  • 14:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:23 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:23 moritzm: rebooting wezen
  • 14:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:17 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:16 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:10 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:10 Urbanecm: Attached Carmen0428@metawiki to Carmen0428 global account (T223036)
  • 14:09 Urbanecm: Renamed Carmen0429@metawiki to Carmen0428@metawiki as part of re-attaching to global account (T223036)
  • 13:55 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:48 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:43 akosiaris@deploy1001: scap-helm mathoid finished
  • 13:43 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
  • 13:43 akosiaris@deploy1001: scap-helm mathoid upgrade --recreate-pods -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 13:33 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:26 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:16 moritzm: rebooting kafkamon instances to pick up MDS mitigations/new kernel
  • 13:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:16 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:58 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:51 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:30 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:15 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:13 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:09 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:06 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:51 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:45 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:43 moritzm: rebooting cp1008
  • 09:42 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:35 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:23 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:15 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:09 jiji@deploy1001: Synchronized wmf-config/ProductionServices.php: Remove kafka1018 from ProductionServices - T224538 (duration: 00m 56s)
  • 09:08 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:08 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:01 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 08:48 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 08:46 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 08:42 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 08:40 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 07:39 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-analytics,name=kubernetes2001.codfw.wmnet
  • 07:38 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet
  • 07:24 moritzm: installing python-thumbor-wikimedia, python-opencv on stat1006
  • 06:54 moritzm: installed radeontop on stat1005 to diagnose GPU usage (T220811)
  • 06:44 moritzm: installed python-opencv on stat1005 (T220811)
  • 05:07 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2051 into s2 to replace db2035 as a master (duration: 01m 00s)
  • 00:45 RoanKattouw: Running FlowReserializeRevisionContent.php on testwiki

2019-06-20

  • 23:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable TimedMediaHandler's new video player Beta Feature T148103 (duration: 00m 57s)
  • 23:01 jforrester@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/TimedMediaHandler/resources/videojs/: Latest VideoJS for T222763 (duration: 00m 59s)
  • 23:01 onimisionipe: pool maps1003 - node is ready to receive requests - T224395
  • 22:31 jforrester@deploy1001: Finished scap: Full scap for new i18n in VisualEditor (duration: 31m 29s)
  • 22:31 James_F: Scap is stuck in scap-cdb-rebuild with one server left to sync.
  • 22:00 jforrester@deploy1001: Started scap: Full scap for new i18n in VisualEditor
  • 21:49 James_F: Manually purged https://bn.m.wikipedia.org/w/load.php?lang=bn&modules=startup&only=scripts&skin=minerva&target=mobile from Varnish
  • 21:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Ensure that wmgVisualEditorEnableNewMobileContext CS part is set on all servers (duration: 00m 59s)
  • 21:42 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Ensure that wmgVisualEditorEnableNewMobileContext IS part is set on all servers (duration: 00m 59s)
  • 21:34 jforrester@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.MobileArticleTarget.js: Revert 'MobileArticleTarget: Update loading interface for new design' (duration: 00m 57s)
  • 21:23 jforrester@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/VisualEditor/: Pull VisualEditor wmf.8 all the way to wmf.10 (duration: 01m 08s)
  • 20:23 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert Centralize enwiki's VisualEditor feedback page T224851 (duration: 00m 59s)
  • 18:54 hashar: upgrading and restarting jenkins
  • 18:52 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Deploy partial blocks on hewikivoyage on community request (Bug: T218626) (duration: 00m 58s)
  • 18:47 tgr@deploy1001: Synchronized php-1.34.0-wmf.10/extensions/GrowthExperiments/extension.json: SWAT: HomepageModule: Use newer schema with start module name (Bug: T222836) (duration: 00m 58s)
  • 18:29 tgr@deploy1001: Synchronized docroot/wwwportal/.well-known/: SWAT: Add .well-known/matrix for wikimedia.org (Bug: T223835) (duration: 00m 57s)
  • 18:16 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Ensure no lossy WTE→VE switching in public wikis (no-op) (duration: 00m 58s)
  • 18:15 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Centralize enwikis VisualEditor feedback page (T224851) (duration: 00m 57s)
  • 18:02 arlolra: Updated Parsoid to 4fa8d01 (T211251)
  • 17:43 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@fd98900]: Deploy media-list endpoint (T225443) and service template upgrade to v0.7.0 (duration: 05m 38s)
  • 17:37 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@fd98900]: Deploy media-list endpoint (T225443) and service template upgrade to v0.7.0
  • 17:34 arlolra@deploy1001: Finished deploy [parsoid/deploy@1084a7b]: Updating Parsoid to 4fa8d01 (duration: 06m 17s)
  • 17:27 arlolra@deploy1001: Started deploy [parsoid/deploy@1084a7b]: Updating Parsoid to 4fa8d01
  • 17:25 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@7dc63ab]: Deploy Suggested Edits endpoints (T209997, T224233) (duration: 02m 55s)
  • 17:22 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@7dc63ab]: Deploy Suggested Edits endpoints (T209997, T224233)
  • 16:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert page-properties-change back to eventbus, new schema does not work with change prop - deploy take 3 (duration: 00m 56s)
  • 16:37 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ACTUALLY Revert page-properties-change back to eventbus, new schema does not work with change prop (duration: 00m 57s)
  • 16:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert page-properties-change back to eventbus, new schema does not work with change prop (duration: 00m 55s)
  • 16:19 krinkle@deploy1001: Synchronized php-1.34.0-wmf.10/includes/specials/pagers/ImageListPager.php: T226102 / 294500d (duration: 00m 58s)
  • 16:16 Krinkle: scb1001 is producing 120,000 errors per minute as of 16:09 UTC (under 500/min before that)
  • 15:40 Krinkle: krinkle@deploy1001: pull down 98399b1032a0 to wmf.10 (test-only change)
  • 15:05 jijiki: Rolling restart php-fpm on jobrunners to pick up new opcache settings - 518023
  • 15:03 jijiki: Repool mw1311
  • 15:01 jeh: T101631 updating replica views on labsdb1009
  • 14:58 akosiaris: make sure all kubernetes hosts (except kubernetes2001 which is used to investigate some outgoing packet discards) are pooled and with the exact same weight
  • 14:57 jijiki: enable puppet on jobrunners
  • 14:57 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes1005.*
  • 14:57 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes1006.*
  • 14:56 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2006.*
  • 14:56 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2005.*
  • 14:54 jeh: T101631 updating replica views on labsdb1010
  • 14:47 jeh: T101631 updating replica views on labsdb1011
  • 14:41 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet
  • 14:36 jeh: T101631 updating replica views on labsdb1012
  • 14:28 Amir1: end of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=testwikidatawiki --batch-size=100 --sleep=3 (T225052)
  • 14:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set EntityUsageTable addUsage batch size to 150 (T225500) (duration: 00m 56s)
  • 14:18 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:18 Amir1: start of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=testwikidatawiki --batch-size=100 --sleep=3 (T225052)
  • 14:16 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=eventgate-analytics,name=kubernetes2001.codfw.wmnet
  • 14:16 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2001.*
  • 14:14 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Switch property terms migration to WRITE_BOTH on test wikidata (T225051) (duration: 00m 56s)
  • 14:14 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet
  • 14:14 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:13 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-analytics,name=kubernetes2001.codfw.wmnet
  • 14:12 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet
  • 14:11 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:10 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes2001.*
  • 14:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Switch property terms migration to WRITE_BOTH on test wikidata (T225051) (duration: 00m 56s)
  • 14:06 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:04 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
  • 13:58 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:56 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:50 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:38 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:35 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:31 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:28 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:23 marostegui: Stop replication on labsdb1011 to defragment tables T222978
  • 13:22 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
  • 13:21 jijiki: depool mw1311
  • 13:20 marostegui: Reload haproxy on dbproxy1010 and dbproxy1011 to depool labsdb1011 - T222978
  • 13:16 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:11 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:04 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:59 jijiki: Disable puppet on jobrunners to merge 518023 and 518018
  • 12:56 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:50 ema: powercycle cp2017, stuck rebooting
  • 12:44 hashar: Upgrading packages on contint1001
  • 12:44 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:40 hashar: Upgrading java/jenkins on releases* hosts # T226159
  • 12:37 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:36 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:36 moritzm: updated jenkins package on apt.wikimedia.org to 2.176.1 for jessie and stretch (T226159)
  • 11:54 Amir1: EU SWAT is done
  • 11:49 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Switch property terms migration to WRITE_BOTH on test wikidata (T225051) (duration: 00m 58s)
  • 11:42 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Introduce config variables for new terms store in mediawiki-config (T226086), Part II (duration: 00m 57s)
  • 11:39 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Introduce config variables for new terms store in mediawiki-config (T226086) (duration: 00m 57s)
  • 11:20 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Remove ExternalGuidanceEnableContentDetection (T219819) (duration: 01m 00s)
  • 11:14 moritzm: rebooting mw2235, mw2255, mw2271 for MDS kernel update
  • 11:12 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix import group name (duration: 00m 57s)
  • 11:09 mlitn@deploy1001: Finished scap: [SDC] Enable depicts qualifiers on Commons & increase rate limits (duration: 20m 34s)
  • 10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:58 moritzm: rebooting scb100[12], mw2139 for MDS kernel update (their CPUs were previously unsupported by Intel, but are now covered with the new release)
  • 10:48 mlitn@deploy1001: Started scap: [SDC] Enable depicts qualifiers on Commons & increase rate limits
  • 10:33 marostegui: Deploy schema change on the fishbowl wikis listed on T225643
  • 10:31 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:24 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:23 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:17 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:11 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:10 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:58 _joe_: upgraded service-checker T225707
  • 09:56 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
  • 09:51 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:50 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:44 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:25 marostegui: Remove dbprov1001:/srv/backups/tmp/db1112 - T225981
  • 09:24 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:21 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:17 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:17 ema: cache nodes: resume rolling reboots for kernel and varnish upgrades T224694 T225998 T226048
  • 08:39 marostegui: Stop MySQL on db1124 (s1, s3, s5 and s8) to upgrade MySQL; this will generate lag on labs
  • 07:59 marostegui: Stop MySQL and reboot db2084
  • 07:15 marostegui: Transfer dbprov1001:/srv/backups/tmp/db1112/sqldata to db1077 T225981
  • 07:00 moritzm: installing intel-microcode updates to June 2019 release (microcode is unmodified for most CPUs except for Sandybridge/Core-X models)
  • 06:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool and remove from config db1077 T225981 (duration: 00m 54s)
  • 06:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 56s)
  • 06:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:18 moritzm: rebooting sarin for some tests with updated intel-microcode for MDS (also covering Sandybridge server CPUs initially not supported by Intel)
  • 06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 55s)
  • 06:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 57s)
  • 05:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 56s)
  • 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 56s)
  • 05:37 marostegui: Deploy schema change on centralauth.oathauth_users T225643
  • 05:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly pool db1112 into s3 T225981 (duration: 00m 55s)
  • 05:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Slowly pool db1112 into s3 T225981 (duration: 00m 55s)
  • 05:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 T225981 (duration: 00m 55s)
  • 04:53 marostegui: Stop replication in sync on db1112 and db1077 to move db1124 under db1112 - T225981
  • 04:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 T225981 (duration: 00m 59s)
  • 04:00 onimisionipe: depooling maps1003 for reimage into new partition scheme - T224395

2019-06-19

  • 18:09 legoktm: added MatmaRex to extension-VisualEditor-staff Gerrit group
  • 16:50 moritzm: running racreset on multatuli
  • 16:50 XioNoX: rollback redirect ns0 to authdns2001
  • 16:45 moritzm: rebooting authdns1001 for kernel security update
  • 16:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:39 XioNoX: redirect ns0 to authdns2001
  • 16:37 XioNoX: rollback redirect ns1 to authdns1001
  • 16:34 moritzm: rebooting authdns2001 for kernel security update
  • 16:28 XioNoX: redirect ns1 to authdns1001
  • 16:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:23 onimisionipe: pooling elastic1029 - T214283
  • 16:01 ema: cache nodes: stop rolling reboots for today, 47/80 done T224694 T225998
  • 15:43 reedy@deploy1001: rebuilt and synchronized wikiversions files: group0 back to .8 T226109
  • 15:43 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 15:40 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 15:37 onimisionipe: pooled maps1002 - postgres init is complete and successfully joined to its cluster - T224395
  • 15:36 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 15:33 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 15:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:21 moritzm: rolling reboot of proton* for kernel security update
  • 15:18 moritzm: rebooting boron for kernel security update
  • 15:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:16 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 15:13 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 15:08 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 15:06 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:57 XioNoX: update syslog target on frack network devices (T224128)
  • 14:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:55 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:55 XioNoX: jnt push to knams, remove old protect-old-lvs-servers term + update syslog target (T224128) + replace /28 with /29 (T211254)
  • 14:54 moritzm: rolling reboot of URL downloaders for kernel security update
  • 14:48 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:48 XioNoX: jnt push to eqiad, remove old protect-old-lvs-servers term + update syslog target T224128
  • 14:48 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:46 reedy@deploy1001: rebuilt and synchronized wikiversions files: group1 back to .8 T226109
  • 14:40 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:40 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:20 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:20 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:13 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:11 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:06 moritzm: rolling reboot of mwdebug servers for kernel security update
  • 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disabling Avro ApiAction Monolog channel - T222267 (duration: 00m 57s)
  • 13:53 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:51 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:50 cdanis: rebooting wikitech-static
  • 13:48 cdanis: apt upgrade on wikitech-static
  • 13:47 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:44 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:27 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:24 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:20 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:17 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:00 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:57 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:53 marostegui: Deploy schema change on the private wikis listed at T225643
  • 12:51 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:51 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:31 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:31 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:25 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:21 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:20 ema: cache nodes: resume rolling reboots for kernel and varnish upgrades T224694 T225998
  • 11:07 ema: cache nodes: pause rolling reboots for kernel and varnish upgrades T224694 T225998
  • 10:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:54 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:52 moritzm: rebooting mx1001 for kernel security update
  • 10:50 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:47 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:38 ladsgroup@deploy1001: scap-helm termbox finished
  • 10:38 ladsgroup@deploy1001: scap-helm termbox cluster codfw completed
  • 10:38 ladsgroup@deploy1001: scap-helm termbox upgrade -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: codfw]
  • 10:36 moritzm: rebooting mx2001 for kernel security update
  • 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:33 akosiaris@deploy1001: scap-helm termbox finished
  • 10:33 akosiaris@deploy1001: scap-helm termbox cluster staging completed
  • 10:33 akosiaris@deploy1001: scap-helm termbox upgrade -f termbox-staging-values.yaml staging stable/termbox [namespace: termbox, clusters: staging]
  • 10:30 jbond42: update late-install so it installs the correct puppet version https://gerrit.wikimedia.org/r/c/operations/puppet/+/515087
  • 10:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:30 moritzm: installing glibc and ca-certificates-java updates from stretch point release
  • 10:29 akosiaris@deploy1001: scap-helm termbox finished
  • 10:29 akosiaris@deploy1001: scap-helm termbox cluster eqiad completed
  • 10:29 akosiaris@deploy1001: scap-helm termbox upgrade -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: eqiad]
  • 10:27 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
  • 10:23 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:21 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:05 ema: cp3030: increase varnish-be thread_pool_max from 12000 (250 * 48) to 14400 (300 * 48) to observe impact on fetcherrors
  • 10:03 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1077 (duration: 00m 55s)
  • 10:01 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:56 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:54 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 (duration: 00m 55s)
  • 09:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:34 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 (duration: 00m 55s)
  • 09:29 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:25 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 T225981 (duration: 01m 00s)
  • 09:20 XioNoX: jnt push to esams, remove old protect-old-lvs-servers term + update syslog target T224128
  • 09:14 marostegui: Start MySQL on db1077 - s3 labsdb lag should start catching up T225981
  • 09:13 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2001.*
  • 09:09 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:06 akosiaris: repool kubernetes2002, kubernetes2003. Point proven, chasing down lead
  • 09:06 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2002.*
  • 09:06 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2003.*
  • 09:05 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 09:03 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 08:57 akosiaris: depool kubernetes200{2,3} for the same out discards investigation
  • 08:56 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 08:56 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes2003.*
  • 08:56 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes2002.*
  • 08:54 akosiaris: uncordon kubernetes2001, reschedule some pods on it. Still investigating out discards
  • 08:51 XioNoX: jnt push to codfw, remove old protect-old-lvs-servers term + update syslog target T224128
  • 08:43 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 08:43 akosiaris: depool kubernetes2001 from all services to investigate some IP out discard statistics
  • 08:42 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes2001.*
  • 08:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 08:36 akosiaris: cordon kubernetes2001 to investigate some IP out discard statistics
  • 08:34 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 08:28 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 08:24 moritzm: installing new kernels with SACK fix on jessie servers
  • 08:21 akosiaris: upgrade citoid, mathoid, termbox to latest chart releases to address the GC metric naming issue T220709 T222795
  • 08:20 akosiaris@deploy1001: scap-helm termbox finished
  • 08:20 akosiaris@deploy1001: scap-helm termbox cluster staging completed
  • 08:20 akosiaris@deploy1001: scap-helm termbox upgrade -f termbox-staging-values.yaml staging stable/termbox [namespace: termbox, clusters: staging]
  • 08:20 akosiaris@deploy1001: scap-helm termbox finished
  • 08:20 akosiaris@deploy1001: scap-helm termbox cluster codfw completed
  • 08:20 akosiaris@deploy1001: scap-helm termbox cluster eqiad completed
  • 08:20 akosiaris@deploy1001: scap-helm termbox upgrade -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: eqiad,codfw]
  • 08:19 akosiaris@deploy1001: scap-helm mathoid finished
  • 08:18 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 08:18 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 08:18 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 08:14 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 08:13 akosiaris@deploy1001: scap-helm mathoid finished
  • 08:13 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
  • 08:13 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 08:13 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 08:13 akosiaris@deploy1001: scap-helm citoid finished
  • 08:13 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
  • 08:13 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
  • 08:08 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 08:07 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 08:02 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 08:01 ema: cache nodes: resume rolling reboots for kernel and varnish upgrades T224694
  • 08:00 akosiaris@deploy1001: scap-helm citoid finished
  • 08:00 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
  • 08:00 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
  • 08:00 akosiaris@deploy1001: scap-helm citoid finished
  • 07:59 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 07:59 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 07:56 moritzm: rearmed keyholder on acmechief-test2001
  • 07:51 moritzm: installing vim security updates on stretch
  • 07:46 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 07:35 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 07:34 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 07:18 XioNoX: jnt push to eqdfw, remove old protect-old-lvs-servers term + update syslog target T224128
  • 07:17 XioNoX: jnt push to eqord, remove old protect-old-lvs-servers term + update syslog target T224128
  • 07:13 XioNoX: jnt push to eqsin, remove old protect-old-lvs-servers term + update syslog target T224128
  • 07:12 marostegui: s3 will be lagging on labsdb hosts due to maintenance on db1077 - T225981
  • 07:02 XioNoX: jnt push to ulsfo, remove old protect-old-lvs-servers term + update syslog target T224128
  • 06:57 marostegui: Stop MySQL on db1077 to transfer its data to db1112 - T225981
  • 06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 T225981 (duration: 01m 06s)
  • 05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1135 T222682 (duration: 00m 56s)
  • 05:37 marostegui: Upgrade db1068 (old s4 master) to 10.1.39
  • 05:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1138 status (duration: 00m 55s)
  • 05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s4 read-only T224852 (duration: 00m 33s)
  • 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s4 master eqiad from db1068 to db1081 T224852 (duration: 00m 33s)
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s4 on read-only T224852 (duration: 00m 34s)
  • 05:00 marostegui: Starting s4 failover from db1068 to db1081 - T224852
  • 04:40 kartik@deploy1001: scap-helm cxserver finished
  • 04:40 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 04:40 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 04:40 kartik@deploy1001: scap-helm cxserver finished
  • 04:40 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 04:40 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 04:40 kartik@deploy1001: scap-helm cxserver finished
  • 04:40 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 04:39 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 04:28 marostegui: Starting pre-steps for the s4 failover that will happen at 05:00 UTC - T224852
  • 04:25 kartik@deploy1001: scap-helm cxserver finished
  • 04:25 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 04:25 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 04:24 kartik@deploy1001: scap-helm cxserver finished
  • 04:24 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 04:24 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 04:21 onimisionipe: depooling maps1002 for reimaging into new partition scheme - T224395
  • 04:20 kartik@deploy1001: scap-helm cxserver finished
  • 04:20 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 04:20 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 04:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 T224852 (duration: 00m 57s)

2019-06-18

  • 22:33 jijiki: pool thumbor1001
  • 22:20 krinkle@deploy1001: Synchronized php-1.34.0-wmf.8/includes/htmlform/fields/HTMLSelectAndOtherField.php: 90b513d96e36 / T222170 (duration: 00m 57s)
  • 21:48 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.10 refs T220735 (duration: 00m 54s)
  • 21:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.10 refs T220735
  • 21:33 twentyafterfour: Promoting Group 1 wikis to MediaWiki 1.34.0-wmf.10 ahead of schedule because tomorrow is a WMF holiday.
  • 20:49 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@0a1c946]: deploy new GUI for T226017 (duration: 30m 03s)
  • 20:31 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.10 refs T220735
  • 20:24 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.10 refs T220735 (duration: 37m 10s)
  • 20:19 smalyshev@deploy1001: Started deploy [wdqs/wdqs@0a1c946]: deploy new GUI for T226017
  • 19:47 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.10 refs T220735
  • 19:15 twentyafterfour: branching 1.34.0-wmf.10
  • 19:04 ebernhardson: deployed discovery.query_clicks_{hourly,daily} fill jobs, updated to use eventgate, to oozie
  • 18:57 jijiki: depool thumbor1001
  • 18:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@deb30dc]: Ship search analytics jobs updated to source from eventgate (duration: 00m 17s)
  • 18:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@deb30dc]: Ship search analytics jobs updated to source from eventgate
  • 18:20 jynus: running data compare on s4 (commons) databases T224852
  • 17:38 jynus: testing switchover automation on es2001/es2002 T224852
  • 17:34 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@dea8e94]: Update mobileapps to c6804c5 (duration: 04m 41s)
  • 17:29 mbsantos@deploy1001: Started deploy [mobileapps/deploy@dea8e94]: Update mobileapps to c6804c5
  • 16:38 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:38 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 16:38 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f /srv/scap-helm/eventgate/analytics/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: eqiad]
  • 16:35 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:35 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 16:34 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f /srv/scap-helm/eventgate/analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics finished
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 16:32 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f /srv/scap-helm/eventgate/analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
  • 16:27 otto@deploy1001: scap-helm eventgate-main finished
  • 16:27 otto@deploy1001: scap-helm eventgate-main cluster eqiad completed
  • 16:26 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: eqiad]
  • 16:25 otto@deploy1001: scap-helm eventgate-main finished
  • 16:25 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
  • 16:25 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: codfw]
  • 16:23 otto@deploy1001: scap-helm eventgate-main finished
  • 16:23 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 16:23 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 16:07 ema: cache nodes: stop rolling reboots for today, 17/80 done T224694
  • 16:06 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 16:01 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 16:01 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@09404fb]: Update the recommendation API service (duration: 03m 09s)
  • 15:59 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 15:58 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@09404fb]: Update the recommendation API service
  • 15:55 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 15:39 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 15:35 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 15:30 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 15:26 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 15:10 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 15:06 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 15:03 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:57 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:43 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:37 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:35 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:30 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:16 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=termbox
  • 14:16 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore
  • 14:15 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:12 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce page-restrictions-change to eventgate-main - T211248 (duration: 00m 47s)
  • 14:10 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 14:09 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 14:05 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce page-links-change to eventgate-main - T211248 (duration: 00m 48s)
  • 14:04 ottomata: deploying mediawiki-config to produce page-links-change stream to eventgate-main - T211248
  • 14:04 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:56 Amir1: ladsgroup@mwmaint1002:~$ mwscript sql.php --wiki=wikidatawiki /srv/mediawiki/php-1.34.0-wmf.8/extensions/Wikibase/repo/sql/AddNormalizedTermsTablesDDL.sql (T225039)
  • 13:56 Amir1: ladsgroup@mwmaint1002:~$ mwscript sql.php --wiki=testwikidatawiki /srv/mediawiki/php-1.34.0-wmf.8/extensions/Wikibase/repo/sql/AddNormalizedTermsTablesDDL.sql (T225039)
  • 13:49 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:48 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce page-properties-change to eventgate-main - T211248 (duration: 00m 48s)
  • 13:46 ottomata: deploying mediawiki-config to produce page-properties-change events to eventgate-main
  • 13:44 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:42 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:42 XioNoX: push new syslog target to msw* - T224128
  • 13:37 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:31 XioNoX: push new syslog target to mr* - T224128
  • 13:22 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:17 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:12 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:10 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:10 ema: cache nodes: begin rolling reboots for kernel and varnish upgrades T224694
  • 12:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:55 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:53 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:49 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:49 ema: cp3034 (ats-be upload) cp2002 (varnish-be upload): reboot for kernel and varnish upgrade T224694
  • 12:38 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:37 XioNoX: merge puppet change to make all router down alerts paging - T224535
  • 12:29 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:27 ema: cp5007 (varnish-be text): reboot for kernel and varnish upgrade T224694
  • 12:23 XioNoX: activate bgp to telia on cr1-codfw - T222967
  • 12:13 Urbanecm: Assigned an email address to Eritha@enwiki per user request (T223960)
  • 12:12 akosiaris: slow rolling restart of php7 on mw1299-mw1338 to avoid opcache exhaustion
  • 12:03 Urbanecm: EU SWAT done
  • 12:03 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: Allow sysops to manage flaggedrevs group membership only if the group exists (T225797) (duration: 00m 47s)
  • 12:00 Urbanecm: EU SWAT is going a few minutes beyond its slot
  • 11:55 Urbanecm: running namespaceDupes.php for eswikibooks (T216143)
  • 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set two new namespace aliases for es.wikibooks (T216143) (duration: 00m 47s)
  • 11:49 akosiaris: set all termbox backends with weight 10 (from 0) for consistency's sake
  • 11:49 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] drop most wmgCirrusSearch* ephemeral config vars [3/3] (3/3) (duration: 00m 46s)
  • 11:49 akosiaris@puppetmaster1001: conftool action : set/weight=10; selector: service=termbox
  • 11:47 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: [cirrus] drop most wmgCirrusSearch* ephemeral config vars [3/3] (2/3) (duration: 00m 47s)
  • 11:46 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: [cirrus] drop most wmgCirrusSearch* ephemeral config vars [3/3] (1/3) (duration: 00m 47s)
  • 11:39 akosiaris: restart pybal on lvs1016
  • 11:39 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] drop most wmgCirrusSearch* ephemeral config vars [2/3] (duration: 00m 47s)
  • 11:38 akosiaris: restart pybal on lvs2003
  • 11:34 jijiki: restarting php-fpm on mwdebug1001
  • 11:29 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] drop most wmgCirrusSearch* ephemeral config vars [1/3] (duration: 00m 46s)
  • 11:26 akosiaris: pool all hosts for termbox
  • 11:26 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: service=termbox
  • 11:25 dcausse@deploy1001: Synchronized wmf-config/extension-list: [cirrus] Load cirrus using wfLoadExtension 2/2 (duration: 00m 46s)
  • 11:24 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: [cirrus] Load cirrus using wfLoadExtension 1/2 (duration: 00m 47s)
  • 11:22 akosiaris: set elastic1029 as inactive in all conftool data. Command was sudo confctl select "name=elastic1029.eqiad.wmnet" set/pooled=inactive T214283
  • 11:21 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1029.eqiad.wmnet
  • 11:15 akosiaris: deploy lvs termbox configuration changes
  • 11:12 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add 'sms' and 'smn' langcodes to commons for use in captions (T222309) (duration: 00m 48s)
  • 10:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:57 jbond42: reboot bast1002
  • 10:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:53 moritzm: rebooting pybal-test2001 for some tests with the new 4.9 kernel for jessie
  • 10:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:46 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:46 jbond42: reboot bast2002
  • 10:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:28 jbond42: reboot bast3002
  • 10:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:24 jbond42: reboot iron.wikimedia.org
  • 10:19 jbond42: reboot bast4001
  • 10:08 jbond42: reboot bast5001
  • 10:01 moritzm: upgrading acmechief* to latest Buster
  • 09:53 ema: upgrade varnish packages to 5.1.3-1wm10 on all A:cp (no restarts yet)
  • 09:42 jbond42: I will start a rolling reboot of all bastion servers at 10:00 UTC
  • 09:36 jijiki: restarting php-fpm in mwdebug1002
  • 09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:19 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:08 elukey: reboot analytics-tool1004 a second time to pick up the new kernel upgrades
  • 08:54 akosiaris@deploy1001: scap-helm termbox finished
  • 08:54 akosiaris@deploy1001: scap-helm termbox cluster codfw completed
  • 08:54 akosiaris@deploy1001: scap-helm termbox cluster eqiad completed
  • 08:54 akosiaris@deploy1001: scap-helm termbox upgrade --install -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: eqiad,codfw]
  • 08:52 akosiaris: deploy termbox T220402
  • 08:52 akosiaris@deploy1001: scap-helm termbox finished
  • 08:52 akosiaris@deploy1001: scap-helm termbox cluster codfw completed
  • 08:52 akosiaris@deploy1001: scap-helm termbox cluster eqiad completed
  • 08:52 akosiaris@deploy1001: scap-helm termbox upgrade --install -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: eqiad,codfw]
  • 08:23 marostegui: Stop MySQL on db2039 - T225988
  • 08:16 marostegui: Remove db2039 from tendril and zarcillo - T225988
  • 07:45 elukey: roll restart of cassandra on aqs* to pick up new openjdk upgrades
  • 07:39 elukey: reboot matomo1001 for kernel upgrades
  • 07:36 elukey: reboot archiva1001 for kernel upgrades
  • 07:32 elukey: reboot analytics-tool100* and an-tool100* for kernel upgrades
  • 07:21 elukey: upload matomo_3.9.1-3 to stretch-wikimedia and upgrade matomo1001
  • 07:06 moritzm: disabling TCP selective acknowledgements on a number of internal test hosts
  • 07:04 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2039 from config T221533 (duration: 00m 46s)
  • 07:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2039 from config T221533 (duration: 00m 51s)
  • 06:56 onimisionipe: pooling maps1001 - reimage is complete - T224395
  • 06:19 marostegui: Stop slave and mysql on db1112 to copy its content to dbprov1001:/srv/backups/tmp/db1112 - T225981
  • 05:54 marostegui: Stop slave and mysql on db1112 to copy its content to dbstore1001:/srv/tmp/db1112 - T225981
  • 04:44 marostegui: Deploy schema change on db1073 (labtestwiki and labswiki) - T225643
  • 04:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 after optimizing its tables T210725 (duration: 00m 47s)
  • 03:46 kartik@deploy1001: scap-helm cxserver finished
  • 03:46 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 03:46 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 03:45 kartik@deploy1001: scap-helm cxserver finished
  • 03:45 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 03:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 03:42 kartik@deploy1001: scap-helm cxserver finished
  • 03:42 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 03:42 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 00:22 RoanKattouw: Running populateRevisionSha1.php on dewikivoyage for T219816
  • 00:15 RoanKattouw: Running populateRevisionSha1.php on testwiki for T219816

2019-06-17

  • 23:40 Krinkle: Repopulating lost "coal.*" data in Graphite from NavigationTiming for 2019-04-17, ref T221401
  • 23:27 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: No further use of ShortUrl (duration: 00m 47s)
  • 23:22 Krinkle: Prune debugging data "coal_tmp2.*" and "coal_tmp3.*" from graphite1004 and graphite2003 from last week, ref T221401
  • 23:21 Krinkle: Prune random spare "BetaMediaWiki.*" data points from graphite1004 and graphite2003 from pre Nov 2018.
  • 20:53 arlolra: Updated Parsoid to 2bf94f0 (T225217)
  • 20:45 arlolra@deploy1001: Finished deploy [parsoid/deploy@a8d9f6e]: Updating Parsoid to 2bf94f0 (duration: 10m 28s)
  • 20:34 arlolra@deploy1001: Started deploy [parsoid/deploy@a8d9f6e]: Updating Parsoid to 2bf94f0
  • 20:18 halfak@deploy1001: Finished deploy [ores/deploy@04fbd58]: T224484 (duration: 15m 17s)
  • 20:02 halfak@deploy1001: Started deploy [ores/deploy@04fbd58]: T224484
  • 18:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments (testwiki): Switch on mobile homepage feature (duration: 00m 47s)
  • 18:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce mediawiki.user-blocks-change stream to eventgate-main, again (duration: 00m 49s)
  • 18:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ExtensionDistributor log channel to help with T225243 (duration: 00m 47s)
  • 18:24 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: Enable REL1_33 (beta), drop pre-REL1_30 (duration: 00m 48s)
  • 18:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Deploy Partial blocks to English wikisource, wiktionary and wikivoyage T218626 (duration: 00m 47s)
  • 18:18 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Extend wgCopyUploadsDomains T213901 T224875 T225852 (duration: 00m 47s)
  • 18:14 jforrester@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/EventBus/includes/EventFactory.php: SWAT: Ensure user-blocks-change expiry_dt is in ISO-8601 (duration: 00m 48s)
  • 18:07 jforrester@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/FlaggedRevs/frontend/modules/ext.flaggedRevs.advanced.js: SWAT: FlaggedRevs: Bring back diff toggle T225351 (duration: 00m 48s)
  • 18:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Turn off mobile-ab test for VE section editing (duration: 00m 48s)
  • 18:03 moritzm: disabled TCP selective acknowledgements on caches/bastions
  • 18:00 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@dcf3338]: New Updater, GUI and Blazegraph build (duration: 17m 37s)
  • 17:56 otto@deploy1001: scap-helm eventgate-main finished
  • 17:56 otto@deploy1001: scap-helm eventgate-main cluster eqiad completed
  • 17:56 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: eqiad]
  • 17:56 otto@deploy1001: scap-helm eventgate-main finished
  • 17:56 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
  • 17:56 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: codfw]
  • 17:54 otto@deploy1001: scap-helm eventgate-main finished
  • 17:54 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 17:54 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 17:43 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@dcf3338]: New Updater, GUI and Blazegraph build
  • 17:29 onimisionipe: pooled wdqs1003 - after rolling back failed deployment.
  • 17:26 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@d6ed70b]: New Updater, GUI and Blazegraph build (duration: 10m 19s)
  • 17:23 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert - Produce user-blocks-change to eventgate-main. Depends on https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/514560 (duration: 00m 47s)
  • 17:16 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@d6ed70b]: New Updater, GUI and Blazegraph build
  • 17:14 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce user-blocks-change to eventgate-main - T211248 (duration: 00m 48s)
  • 17:10 ottomata: mw-config change to produce user-blocks-change event to eventgate-main - T211248
  • 16:27 jynus: starting data check on db2097+db2046, expect increase in read row rate T225378
  • 16:02 otto@deploy1001: scap-helm eventgate-main finished
  • 16:02 otto@deploy1001: scap-helm eventgate-main cluster eqiad completed
  • 16:02 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: eqiad]
  • 15:57 otto@deploy1001: scap-helm eventgate-main finished
  • 15:57 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
  • 15:57 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: codfw]
  • 15:55 otto@deploy1001: scap-helm eventgate-main finished
  • 15:55 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 15:55 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 15:42 ema: cp4026: ats-backend-restart to apply systemd unit hardening changes
  • 15:32 otto@deploy1001: scap-helm eventgate-main finished
  • 15:32 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 15:32 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 15:30 otto@deploy1001: scap-helm eventgate-main finished
  • 15:30 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 15:30 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 15:17 thcipriani: gerrit back
  • 15:16 thcipriani: gerrit restart to pick up new config changes.
  • 14:45 elukey: stop eventlogging on eventlog1002 and reboot for kernel upgrades
  • 14:32 otto@deploy1001: scap-helm eventgate-main finished
  • 14:32 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
  • 14:32 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: codfw]
  • 14:26 otto@deploy1001: scap-helm eventgate-main finished
  • 14:26 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 14:26 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 14:15 moritzm: installing poppler security updates on jessie
  • 14:03 cdanis: cdanis@cobalt.wikimedia.org ~ % sudo systemctl start gerrit.service
  • 13:53 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 13:49 moritzm: installing libav security updates
  • 13:45 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 13:45 ema: reboot cp4027 for dist and Varnish upgrade T224694
  • 13:34 elukey: reboot of an-worker* (Hadoop worker nodes) for kernel + openjdk upgrades
  • 13:25 ema: cp4027: upgrade Varnish packages to 5.1.3-1wm10 T224694
  • 12:37 jbond42: upgrade mtail on lithium - T225604
  • 12:35 jbond42: add mtail_3.0.0~rc24.1-1+wmf1_amd64.deb to jessie-wikimedia backports
  • 12:13 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: FileImporter configuration to fetch sitelinks from Wikidata (T225609 T224007) - finishing partial deployment (duration: 00m 47s)
  • 12:06 awight: EU SWAT complete
  • 12:05 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 517391 Enable AMC mode for Persian, Japanese, Thai and Italian wikis (T225123) (duration: 00m 47s)
  • 12:02 Urbanecm: EU SWAT is going a few minutes beyond its window
  • 11:55 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 516608 Enable feature flag for breaking Wikibase API change (T223303) (duration: 00m 47s)
  • 11:49 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 516478 Set EntityUsageTable addUsage batch size to 200 (T225500) (duration: 00m 47s)
  • 11:46 awight@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/ContentTranslation: SWAT: Fix undefined index notices (T225198) (duration: 00m 49s)
  • 11:33 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add autoreview protection level on ar.wikipedia (T225896) (duration: 00m 47s)
  • 11:28 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable VisualEditor in draft namespace on sr.wiki (T223024) (duration: 00m 47s)
  • 11:23 awight: ran mwscript namespaceDupes.php nds_nlwiki, no dupes found
  • 11:22 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set nds_nlwiki's sitename and metanamespace back to defaults (T224349) (duration: 00m 47s)
  • 11:12 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: wmf-config/CommonSettings-labs.php SWAT: FileImporter configuration to fetch sitelinks from Wikidata (T225609 T224007) (duration: 00m 47s)
  • 10:51 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 47s)
  • 10:51 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 09:39 moritzm: rebooting mw2184, mw1265 for some tests
  • 09:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:31 elukey: set cpu governor to performance (was powersave) on analytics1070 (hadoop worker node)
  • 09:17 moritzm: rebooting sulfur for some tests
  • 09:15 _joe_: The governor was set to "powersave", not "ondemand"
  • 09:13 _joe_: setting cpufreq governor to "ondemand" on mw1348, T225713
  • 08:52 onimisionipe: remove maps1001 from cassandra cluster - T224395
  • 07:25 XioNoX: restart snmp daemon on mr1-eqsin
  • 07:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2107 (duration: 00m 47s)
  • 06:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2084 (duration: 00m 47s)
  • 06:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2084 for a reboot (duration: 00m 48s)
  • 06:04 marostegui: Stop MySQL on db2084 to reboot the host T225884
  • 05:16 marostegui: Stop MySQL on db2107 to clone db2051 - T221533
  • 05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2107 to clone db2051 (duration: 00m 47s)
  • 05:03 marostegui: Optimize all pc1008's tables T210725
  • 05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1008 and pool pc1010 temporarily while pc1008 gets all its tables optimized T210725 (duration: 00m 59s)

2019-06-16

  • 14:20 Urbanecm: running mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='AKA MBG' /home/urbanecm/T225886
  • 08:21 elukey: roll restart of druid brokers on druid100[4-6], stuck after regular data drop maintenance

2019-06-15

  • 20:38 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots (duration: 21m 42s)
  • 20:17 smalyshev@deploy1001: Started deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots
  • 20:16 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots (duration: 00m 54s)
  • 20:15 smalyshev@deploy1001: Started deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots
  • 19:14 SMalyshev: repooled wdqs1004
  • 17:35 elukey: restart hadoop-yarn-resourcemanager on an-masters as an attempt to fix yarn.w.o
  • 07:44 SMalyshev: depooled wdqs1004 to catch it up

2019-06-14

  • 23:23 ejegg: updated payments-wiki from 75abd71cc1 to 79d1822644
  • 23:19 SMalyshev: repooled wdqs1003
  • 23:13 SMalyshev: repooled wdqs2003
  • 23:10 _joe_: set cpufreq governor for mw1348 to performance
  • 19:56 SMalyshev: depooled wdqs2003 to catch up
  • 19:17 SMalyshev: depooled wdqs1003 to catch up
  • 15:56 gehel: repooling wdqs1003, not catching up anyway (high edit load)
  • 15:24 godog: test setting 'performance' governor on ms-be2035 - T210723
  • 14:35 godog: powercycle mw1294, down and no console
  • 13:26 gehel: depooling wdqs1003 to allow it to catch up on lag
  • 13:22 joal@deploy1001: Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
  • 12:38 godog: test setting 'performance' governor on ms-be2032 - T210723
  • 11:36 godog: test setting 'performance' governor on ms-be2034 - T210723
  • 10:22 marostegui: Optimize tables on pc2008 - T210725
  • 10:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1077 after recovering from a crash (duration: 00m 49s)
  • 10:14 godog: test setting 'performance' governor on ms-be2031 - T210723
  • 09:44 godog: test setting 'performance' governor on ms-be2037 - T210723
  • 09:43 godog: test setting 'performance' governor on ms-be2033 - T210723
  • 09:28 godog: test setting 'performance' governor on ms-be2038 - T210723
  • 09:26 godog: test setting 'performance' governor on ms-be2016 - T210723
  • 03:57 SMalyshev: repooled wdqs1005
  • 00:11 SMalyshev: depooled wdqs1005 - let it catch up
  • 00:10 SMalyshev: repooled wdqs1006 - caught up

2019-06-13

  • 23:25 SMalyshev: depooled wdqs1006 to let it catch up quicker
  • 18:10 fdans@deploy1001: Finished deploy [analytics/refinery@67b34fe]: retrying deployment of analytics refinery (duration: 00m 19s)
  • 18:10 fdans@deploy1001: Started deploy [analytics/refinery@67b34fe]: retrying deployment of analytics refinery
  • 18:01 fdans@deploy1001: Finished deploy [analytics/refinery@67b34fe]: deploying refinery source 0.0.92 into refinery (duration: 16m 45s)
  • 17:44 fdans@deploy1001: Started deploy [analytics/refinery@67b34fe]: deploying refinery source 0.0.92 into refinery
  • 17:34 bstorm_: T203254 set cpu scaling governor to performance on labstore1004 and labstore1005
  • 16:02 gehel: restart blazegraph on wdqs public cluster completed
  • 15:58 gehel: restart blazegraph on wdqs public cluster
  • 15:36 gehel: restarting blazegraph on wdqs-internal / eqiad (just in case)
  • 08:09 jynus: reloading proxies for wikireplicas to rebalance load
  • 07:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 after recovering from a crash (duration: 00m 50s)
  • 00:45 paravoid: setting the CPU governor to performance for ms-be1036 (a while ago)

2019-06-12

  • 18:15 krinkle@deploy1001: Synchronized php-1.34.0-wmf.8/thumb.php: T225197 / 06b631fae5 (duration: 00m 47s)
  • 18:13 krinkle@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/ArticlePlaceholder/includes/: T207235 / a42aa15 (duration: 00m 49s)
  • 16:06 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 15:49 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 15:37 legoktm: re-enabled bawolff's gerrit account
  • 15:14 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-restart (exit_code=97)
  • 14:38 marostegui: Start replication on all threads on labsdb1010 - T222978
  • 14:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 after recovering from a crash (duration: 00m 47s)
  • 13:19 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 11:55 godog: swift eqiad-prod: put back ms-be1033 - T223518
  • 10:52 godog: force-upgrade mtail to 3.0.0~rc24.1-1 on wezen - T225604
  • 10:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 after recovering from a crash (duration: 00m 47s)
  • 10:18 akosiaris@deploy1001: scap-helm zotero finished
  • 10:18 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 10:17 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 10:17 akosiaris@deploy1001: scap-helm zotero upgrade --dry-run --debug production stable/zotero [namespace: zotero, clusters: eqiad,codfw]
  • 10:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 after a crash (duration: 00m 48s)
  • 09:51 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 08:59 hashar: Gracefully stopping Zuul (kill -SIGUSR1) to prepare for the restart of the CI Jenkins T225322
  • 08:41 onimisionipe: pool maps2003. reimage and setup are complete - T224395
  • 08:31 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-restart
  • 06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 after a crash (duration: 00m 49s)

2019-06-11

  • 19:24 tzatziki: Removing four (4) files for legal compliance
  • 15:41 gehel: shutting down elastic1029 for investigation - T214283
  • 12:54 godog: swift eqiad-prod: put back ms-be1033 - T223518
  • 11:52 gehel@cumin2001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 10:54 godog: wipe fs on ms-be1033 data partitions - T223518
  • 09:56 gehel@cumin2001: START - Cookbook sre.postgresql.postgres-init
  • 09:20 godog: free up space wrongly allocated onto / with sdc1 unmounted on ms-be2018
  • 08:26 gehel: repooling maps200[124]

2019-06-10

  • 19:39 thcipriani: restarting jenkins
  • 19:11 akosiaris: refresh all zotero pods in all clusters
  • 19:11 akosiaris@deploy1001: scap-helm zotero finished
  • 19:11 akosiaris@deploy1001: scap-helm zotero cluster staging completed
  • 19:11 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml staging stable/zotero [namespace: zotero, clusters: staging]
  • 19:11 akosiaris@deploy1001: scap-helm zotero finished
  • 19:10 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
  • 19:10 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
  • 19:10 akosiaris@deploy1001: scap-helm zotero finished
  • 19:10 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
  • 19:10 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
  • 17:55 ottomata: rolling restart of AQS service using scap deploy for new mediawiki_history snapshot
  • 17:55 otto@deploy1001: Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
  • 16:24 marostegui: Power reset db1077 from the idrac T225391
  • 13:18 mvolz@deploy1001: scap-helm citoid finished
  • 13:18 mvolz@deploy1001: scap-helm citoid cluster codfw completed
  • 13:18 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-codfw-values.yaml stable/citoid [namespace: citoid, clusters: codfw]
  • 13:13 mvolz@deploy1001: scap-helm citoid finished
  • 13:13 mvolz@deploy1001: scap-helm citoid cluster eqiad completed
  • 13:13 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-eqiad-values.yaml stable/citoid [namespace: citoid, clusters: eqiad]
  • 13:04 mvolz@deploy1001: scap-helm citoid finished
  • 13:04 mvolz@deploy1001: scap-helm citoid cluster staging completed
  • 13:04 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
  • 05:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 - host crashed (duration: 00m 52s)

2019-06-09

  • 08:30 vgutierrez: rebooting lvs4007 after NIC driver crash

2019-06-08

  • 11:58 godog: stop swift processes on ms-be1033 - T223518
  • 10:46 reedy@deploy1001: Synchronized wmf-config/throttle.php: T225344 (duration: 00m 51s)

2019-06-07

  • 18:56 herron: performing rolling reboots of logstash codfw frontends for security updates
  • 18:22 cstone: Update payments-wiki revision changed from c6c7bbf71e to 75abd71cc1
  • 15:34 godog: bounce rsyslog on wezen - T199406
  • 15:09 elukey: reboot thorium for kernel upgrades
  • 14:00 ema: pool cp3039 w/ ATS backend T222937
  • 13:15 ema: depool cp3039 and reimage as upload_ats T222937
  • 13:04 arturo: aborrero@cumin1001:~ $ sudo cumin "P{R:Systemd::Timer::Job}" "puppet agent --enable && run-puppet-agent" (patch already merged)
  • 13:03 arturo: aborrero@cumin1001:~$ sudo cumin "P{R:Systemd::Timer::Job}" "puppet agent --disable 'arturo merging systemd timer nrpe change'" (19 hosts affected) merging: https://gerrit.wikimedia.org/r/c/operations/puppet/+/514988
  • 11:45 ema: pool cp3043 w/ ATS backend T222937
  • 10:51 jbond42: upload libcpp-hocon0.1.6_0.1.6-1~bpo9+1_amd64.deb to wikimedia-stretch component/facter3
  • 10:45 jbond42: upload libleatherman-data_1.4.0+dfsg-1~bpo9+1_all.deb to wikimedia-stretch component/facter3
  • 10:43 ema: depool cp3043 and reimage as upload_ats T222937
  • 10:09 _joe_: restarting php-fpm on the codfw hosts to pick up the recent changes in opcache
  • 09:59 jbond42: upload libleatherman1.4.0_1.4.0+dfsg-1~bpo9+1_amd64.deb to wikimedia-stretch component/facter3
  • 09:49 jbond42: upload libleatherman1.4.0_1.4.0+dfsg-1~bpo8+1_amd64.deb to wikimedia-jessie component/facter3
  • 09:16 mobrovac@deploy1001: scap-helm mathoid finished
  • 09:16 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
  • 09:16 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
  • 09:16 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
  • 09:00 marostegui: Upgrade x1 codfw hosts in preparation for its failover T220170
  • 08:46 elukey: start the reboot of the Analytics Hadoop's worker nodes for kernel+openjdk upgrades
  • 08:24 marostegui: Upgrade s2 codfw to 10.1.39 in preparation for its codfw failover - T221533
  • 08:19 XioNoX: remove BGP session to AS55658 on cr1-eqsin (left the IXP)
  • 08:12 vgutierrez: upgrading certbot in wikitech-static
  • 07:29 marostegui: Drop unused temporary test tables on db1111 and db1112
  • 05:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2051 from s4 to s2 T221533 (duration: 00m 49s)
  • 00:00 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove unused preference T47877-buster (duration: 00m 47s)
  • 00:00 bstorm_: T224850 repooled labsdb1009 after completing view updates

2019-06-06

  • 23:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Specify the fluidsynth paths for TMH MIDI conversion T135597 (duration: 00m 47s)
  • 23:56 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Remove T225183 (duration: 00m 48s)
  • 23:03 jeh: T224850 depooled labsdb1009
  • 22:42 bstorm_: T224850 repooled labsdb1011
  • 21:01 bstorm_: T224850 depooled labsdb1011
  • 20:58 jforrester@deploy1001: Synchronized wmf-config/reverse-proxy.php: Stop setting wgSquidServersNoPurge, MW now uses wgCdnServersNoPurge (duration: 00m 47s)
  • 20:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgSquidMaxage, MW now uses wgCdnMaxAge (duration: 00m 46s)
  • 20:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgUseSquid or using wgSquidServersNoPurge, duplicate existing values (duration: 00m 48s)
  • 20:49 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Drop backwards-compatibility for dataSquidMaxage (duration: 00m 48s)
  • 19:47 herron: performing rolling reboot of eqiad logstash hw for MDS security updates
  • 18:58 jbond42: reimage sarin to stretch
  • 18:39 jbond42: mw1249 - sudo systemctl restart php7.2-fpm.service
  • 18:38 papaul: shutting down backup2001 for 10G nic troubleshooting
  • 18:24 bstorm_: T224850 repooled labsdb1010 after completing view run
  • 18:04 jijiki: Continuing rolling restarts of php-fpm in eqiad
  • 17:30 elukey: restart mcrouter on mw2271 (codfw proxy) to pick up new config changes
  • 15:56 bstorm_: T224850 depooled labsdb1010 for view updates
  • 15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:05 moritzm: rolling reboot of sessionstore hosts in eqiad for kernel security update
  • 15:02 _joe_: rolling restart of php-fpm on {appservers,api} in eqiad, in groups of 4, staggered by 10 minutes, to pick up the new opcache settings
  • 14:57 bstorm_: T224850 update views on labsdb1012
  • 14:43 moritzm: updating qemu packages on ganeti hosts to deploy support for md_clear/MDS for Ganeti instances
  • 14:43 elukey: restart mcrouter on mw2255 (codfw proxy) to pick up new config changes
  • 14:22 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: fix logspam (duration: 00m 48s)
  • 14:18 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
  • 13:54 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: fix logspam (duration: 00m 47s)
  • 13:44 moritzm: rolling reboot of sessionstore hosts in codfw for kernel security update
  • 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:36 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
  • 13:35 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.8
  • 13:35 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart-wdqs (exit_code=99)
  • 13:35 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
  • 13:34 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
  • 13:33 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
  • 13:32 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
  • 13:31 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
  • 12:44 jbond42: reimage neodymium
  • 12:23 _joe_: running puppet, restarting php-fpm on the canaries to pick up the new opcache size
  • 12:11 ema: cp1075: repool with varnish 5.1.3-1wm10 T224694
  • 12:10 elukey: restart mcrouter on mw2235
  • 12:05 Lucas_WMDE: EU SWAT done
  • 12:04 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Revert "Specify $wgWBRepoSettings['conceptBaseUri']" (gerrit:514700) (duration: 00m 56s)
  • 12:00 ema: cp1075: upgrade varnish to 5.1.3-1wm10 T224694
  • 11:55 lucaswerkmeister-wmde@deploy1001: scap failed: average error rate on 8/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 11:48 Urbanecm: running mwscript namespaceDupes.php --wiki=thwikisource --fix (T216322)
  • 11:47 Urbanecm: running mwscript namespaceDupes.php --wiki=thwikibooks --fix for T216322
  • 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add new namespaces for several Thai projects (gerrit:514678) (T216322) (duration: 00m 54s)
  • 11:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove unused config variable wgWikibaseEnableSenses (gerrit:514534) (duration: 00m 55s)
  • 11:23 gehel@cumin2001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 11:22 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/CirrusSearch/: SWAT: Fix event validation error for cirrussearch-request event (gerrit:514566) (duration: 01m 06s)
  • 10:55 elukey: restart mcrouter on mw2163 (codfw mcrouter proxy)
  • 10:43 mobrovac@deploy1001: scap-helm mathoid finished
  • 10:43 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
  • 10:43 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
  • 10:43 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
  • 10:30 ema: varnish 5.1.3-1wm10 uploaded to stretch-wikimedia T224694
  • 10:19 elukey: rolling restart of mcrouter on mw1* hosts to pick up config change (batch of 5 hosts, depool/run-puppet/pool)
  • 10:12 elukey: disable puppet on mw1* and mw[2163,2235,2255,2271] as prep step for mcrouter config deploy
  • 10:10 fsero: rolled back last deployment of mathoid to revision 16
  • 09:59 mobrovac@deploy1001: scap-helm mathoid finished
  • 09:59 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
  • 09:59 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
  • 09:59 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
  • 09:32 moritzm: rebooting mwdebug2002 for some tests
  • 09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:28 moritzm: updating qemu on ganeti2004 for some tests
  • 09:24 gehel@cumin2001: START - Cookbook sre.postgresql.postgres-init
  • 08:38 marostegui: Stop MySQL on db1117:3322 - this will trigger haproxy alerts - T222682
  • 07:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 after upgrade T224852 (duration: 00m 53s)
  • 07:20 marostegui: Stop MySQL on db1121 for upgrade, this will generate lag on labs hosts for s6 - T224852
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2046 to s6 master as db2039 will be decommissioned T221533 (duration: 00m 55s)
  • 06:31 marostegui: Start topology changes on s6 codfw to promote db2046 as master - T221533
  • 06:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 for upgrade T224852 (duration: 00m 55s)
  • 06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 after getting its BBU replaced (duration: 00m 54s)
  • 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 01m 01s)
  • 05:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 00m 55s)
  • 05:41 marostegui: Upgrade MySQL on s6 codfw hosts in preparation for s6 codfw master failover - T221533
  • 05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 00m 55s)
  • 05:18 marostegui: Remove db2042 from tendril and zarcillo T225090
  • 05:14 marostegui: Stop MySQL on db2042 to copy its content to dbprov2001 as a temporary backup - T225090
  • 05:11 marostegui: Disable notifications db2042 - T225090
  • 05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 after getting its BBU replaced T225060 (duration: 00m 56s)

2019-06-05

  • 22:15 chaomodus: restarting gerrit on cobalt due to it being down (seems like Java out of heap space)
  • 20:43 mforns@deploy1001: Finished deploy [analytics/refinery@0660e70]: deploying analytics/refinery up to 0660e70 (duration: 19m 30s)
  • 20:39 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Turn off some FR config T225138 (duration: 00m 54s)
  • 20:25 akosiaris@deploy1001: scap-helm blubberoid finished
  • 20:25 akosiaris@deploy1001: scap-helm blubberoid cluster codfw completed
  • 20:25 akosiaris@deploy1001: scap-helm blubberoid cluster eqiad completed
  • 20:25 akosiaris@deploy1001: scap-helm blubberoid upgrade -f blubberoid-values.yaml production stable/blubberoid [namespace: blubberoid, clusters: eqiad,codfw]
  • 20:23 mforns@deploy1001: Started deploy [analytics/refinery@0660e70]: deploying analytics/refinery up to 0660e70
  • 19:57 hashar: contint1001: docker container prune -f && docker image prune -f # reclaimed 166 MB and 3.4 GB
  • 19:48 marostegui: Check data consistency on db1091 against db1135 - T225060
  • 19:45 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: T225115 (duration: 00m 54s)
  • 17:36 marostegui: Start replication db1091 - T225060
  • 17:32 marostegui: Start MySQL with replication stopped on db1091 - T225060
  • 16:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert user-blocks-change to use eventbus and old schema - T211248 (duration: 00m 54s)
  • 16:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: use eventgate-main for 2 events on all wikis - T211248 (duration: 00m 55s)
  • 16:11 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceStreamConfig and switch 2 topics in group0 T222822 (duration: 00m 56s)
  • 16:11 XioNoX: remove BGP to AS38082 on cr4-ulsfo (left the IXP)
  • 15:46 reedy@deploy1001: Scap failed!: Call to mwscript eval.php returned: None
  • 15:44 reedy@deploy1001: Finished scap: Rebuild .8 i18n for FlaggedRevs (duration: 41m 14s)
  • 15:36 moritzm: installing exim4 security updates
  • 15:03 reedy@deploy1001: Started scap: Rebuild .8 i18n for FlaggedRevs
  • 14:24 marostegui: Poweroff db1091 for BBU replacement - T225060
  • 13:57 elukey: restart mcrouter on MediaWiki app/api canaries to pick up new config change (timeouts before marking a memcached shard as TKO from 3 to 10) - T203786
  • 13:56 jijiki: enabling puppet and pooling on mw* canaries
  • 13:17 jynus: start es2,es3 backup on codfw
  • 13:17 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.8
  • 13:03 hashar: restarting Jenkins
  • 12:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1135 (duration: 00m 54s)
  • 12:46 Lucas_WMDE: EU SWAT finished
  • 12:32 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/WikimediaMessages/: SWAT: Fix wikidata copyright message (gerrit:514460) (T224536) (duration: 00m 56s)
  • 11:43 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable the new history page in the advanced mobile contributions mode (gerrit:514449) (T219895) (duration: 00m 56s)
  • 11:27 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: Remove project namespace from flaggedrevs on ruwikisource (gerrit:514413) (T225037) (duration: 00m 54s)
  • 10:57 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/FlaggedRevs: Add ext.flaggedRevs.icons to modules registration (gerrit:514456) (duration: 00m 57s)
  • 10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1135 (duration: 00m 55s)
  • 10:09 godog: mount sdb3 on ms-be1022 - T225079
  • 09:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1135 with very low weight on s4 (duration: 00m 55s)
  • 09:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool without traffic db1135 into s4 T225060 (duration: 00m 55s)
  • 09:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool without traffic db1135 into s4 T225060 (duration: 00m 56s)
  • 08:42 onimisionipe: removing maps2001 from cassandra cluster. It is going to be reimaged - T224395
  • 08:40 _joe_: rolling restart of php7 on the api servers, to test a different strategy of restarting compared to the appservers.
  • 08:21 _joe_: performing a rolling restart of the php appservers via cumin to test speed and safety of the operations proposed in T224857
  • 08:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:12 moritzm: rebooting pybal-test2001 for tests with new qemu
  • 08:12 ema: pool cp3035 w/ ATS backend T222937
  • 08:12 marostegui: Reboot db1091 T225060
  • 08:05 moritzm: installing qemu security updates on Ganeti hosts
  • 07:45 marostegui: Transfer dbprov1001.eqiad.wmnet:snapshot.s4.2019-06-04--21-37-03.tar.gz to db1135 to provision it on s4 T225060
  • 07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1091 status (duration: 00m 56s)
  • 07:22 ema: depool cp3035 and reimage as upload_ats T222937
  • 07:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 - host went down (duration: 00m 55s)
  • 06:45 marostegui: Restart MySQL on db2110 to get the binlog format changed to STATEMENT - T220170
  • 06:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2090 to s4 codfw master T220170 (duration: 00m 54s)
  • 06:25 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Mimic s4 codfw weights to eqiad T220170 (duration: 00m 55s)
  • 06:17 marostegui: Start topology changes on s4 codfw to replace current master db2051 with db2090 - T220170
  • 06:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1084 into API (duration: 00m 54s)
  • 05:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 after upgrade T224852 (duration: 00m 55s)
  • 05:49 marostegui: Upgrade MySQL on db1084 T224852
  • 05:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 for upgrade T224852 (duration: 01m 06s)
  • 05:31 marostegui: Stop MySQL on db1125 (sanitarium) s2,s4,s6,s7 to upgrade mysql - T224852
  • 05:29 marostegui: Keep compressing tables on labsdb1012 - T222978
  • 05:22 marostegui: Change replication topology on m3 codfw to promote db2065 as codfw master instead of db2042 - T221533
  • 05:07 marostegui: Upgrade Mysql on labsdb1012 - T224852
  • 04:09 onimisionipe: starting postgres slave init on maps2001 - T224395

2019-06-04

  • 23:03 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change log level to debug for PageTriage (duration: 01m 03s)
  • 22:06 eileen: civicrm revision changed from 506ebe2f2a to 5c02e62d6e, config revision is 63438eea43
  • 21:08 jbond42: finished rolling reboots of mw1* servers
  • 20:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:48 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:48 XioNoX: replace logstash.svc.eqiad.wmnet syslog target with syslog.codfw.wmnet on cr4-ulsfo - T224128
  • 19:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:41 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:41 jbond42: reboot mwdebug1002
  • 19:36 jbond42: reboot mwdebug1001
  • 19:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:10 herron: correction — performing rolling reboots of codfw logstash hardware hosts for MDS security updates
  • 18:10 herron: performing rolling reboots of eqiad logstash hardware hosts for MDS security updates
  • 18:06 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:04 bblack: pool cp3045 - T222937
  • 17:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:33 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:58 legoktm: deleted some gerrit changes
  • 16:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:32 marostegui: Compress some more tables on labsdb1012 before upgrading the host tomorrow T222978
  • 16:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:14 bblack: repool cp3035 (still varnish-be, but freshly installed!)
  • 16:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:12 jbond42: starting rolling reboots of mw1*
  • 16:09 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3045.esams.wmnet
  • 16:08 bblack: depool cp3045 for reimage - T222937
  • 15:56 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: JADE - T212182 (duration: 00m 53s)
  • 15:55 reedy@deploy1001: Synchronized wmf-config/extension-list: JADE - T212182 (duration: 00m 53s)
  • 15:52 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/Jade: Consistency (duration: 01m 08s)
  • 15:50 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: Configure eventgate-main EventService. No-op in prod. T211248 (duration: 01m 19s)
  • 15:41 bblack: reboot cp3035 post-reimage
  • 15:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Use eventgate-main in beta. No-op in prod. T211248 (duration: 00m 49s)
  • 15:18 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.8
  • 15:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:13 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:13 moritzm: draining ganeti1003 for eventual reboot to MDS-enabled Linux kernel
  • 15:13 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.8 and rebuild l10n cache (duration: 29m 46s)
  • 15:04 moritzm: failover Ganeti master in eqiad to ganeti1001
  • 14:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:51 bblack: depool cp3035 for ATS reimage - T222937
  • 14:43 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.8 and rebuild l10n cache
  • 14:41 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.5 [keeping static files] (duration: 01m 38s)
  • 14:39 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 [keeping static files] (duration: 01m 34s)
  • 14:36 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.3 (duration: 11m 02s)
  • 13:53 jbond42: restart mtail on lithium
  • 13:46 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:46 fsero@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:30 jbond42: starting rolling reboots of mw1*
  • 13:12 moritzm: draining ganeti1008 for eventual reboot to MDS-enabled Linux kernel
  • 12:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:22 Urbanecm: ran mwscript deleteBatch.php --wiki=sawikisource -r 'T214553: deleting useless red
  • 12:13 akosiaris: restart pybal on lvs2003, lvs1015 for sessionstore LVS configuration. T220401
  • 12:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2046 (duration: 00m 46s)
  • 12:04 akosiaris: restart pybal on lvs2006 for sessionstore LVS configuration. T220401
  • 11:40 akosiaris: restart pybal on lvs1015 for sessionstore LVS configuration. T220401
  • 11:39 krinkle@deploy1001: Synchronized php-1.34.0-wmf.7/includes/: T221577 / 1286d131c01886 (duration: 01m 04s)
  • 11:39 jijiki: enabling puppet on mc1*
  • 11:38 Urbanecm: run mwscript namespaceDupes.php --wiki=kuwiktionary --fix (T224327)
  • 11:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Custom namespaces for ku.wiktionary (gerrit:514239) (T224327) (duration: 00m 46s)
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add localized project logo for sahwikiquote (gerrit:507931) (2/2, T222065) (duration: 00m 47s)
  • 11:34 urbanecm@deploy1001: Synchronized static/images/project-logos/: Add localized project logo for sahwikiquote (gerrit:507931) (1/2, T222065) (duration: 00m 47s)
  • 11:31 jijiki: enabling puppet on mc2*
  • 11:29 Urbanecm: running mwscript namespaceDupes.php --wiki=sawikisource --add-prefix=T214553 --fix (T214553)
  • 11:28 Urbanecm: run mwscript namespaceDupes.php --wiki=thwiki --fix (T216322)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add Author namespace in Sanskrit Wikisource (gerrit:486221) (T214553) (duration: 00m 46s)
  • 11:24 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Create new protection levels for dewiktionary (gerrit:495918) (2/2, T216885) (duration: 00m 47s)
  • 11:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create new protection levels for dewiktionary (gerrit:495918) (1/2, T216885) (duration: 00m 47s)
  • 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add editcontentmodel right to the templateeditor group on testwiki (gerrit:494016) (T217499) (duration: 00m 47s)
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add new namespaces for th.wiki (gerrit:491054) (T216322) (duration: 00m 47s)
  • 11:09 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/: T221577 / 1286d131c01886 (duration: 01m 07s)
  • 11:02 moritzm: draining ganeti1007 for eventual reboot to MDS-enabled Linux kernel
  • 11:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:44 jbond42: mw1* restarts will be delayed until 11:15
  • 10:42 jbond42: will start rolling reboots of mw1* servers at 10:50
  • 09:27 moritzm: draining ganeti1006 for eventual reboot to MDS-enabled Linux kernel
  • 09:25 jijiki: disable puppet on mc* hosts to merge 511963 and 511973
  • 09:01 moritzm: draining ganeti1005 for eventual reboot to MDS-enabled Linux kernel
  • 08:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:32 elukey: remove memcached nutcracker config from mw1* hosts (not used). Changes will be picked up when nutcracker is next restarted (after reboots, etc.) - T214275
  • 08:23 moritzm: draining ganeti1004 for eventual reboot to MDS-enabled Linux kernel
  • 08:04 marostegui: Stop MySQL on db2046 to clone db2058 - T221533
  • 08:04 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2046 (duration: 00m 47s)
  • 08:03 elukey: restart hive-server2 on an-coord1001 to pick up new GC/Heap settings
  • 07:35 mobrovac@deploy1001: Finished deploy [restbase/deploy@abcb534]: Use only Proton for PDF rendering - T210651 (duration: 19m 16s)
  • 07:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:21 moritzm: draining ganeti1002 for eventual reboot to MDS-enabled Linux kernel
  • 07:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2058 from s4 to s6 (duration: 00m 47s)
  • 07:16 mobrovac@deploy1001: Started deploy [restbase/deploy@abcb534]: Use only Proton for PDF rendering - T210651
  • 06:57 elukey: restart hive metastore on an-coord1001 to apply new GC/heap settings
  • 06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 after upgrade (duration: 00m 48s)
  • 06:21 elukey: restart pdfrender on scb1002 (flapping)
  • 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after upgrade (duration: 00m 47s)
  • 05:54 marostegui: Stop MySQL on db2078:m3 - T221533
  • 05:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 after upgrade (duration: 00m 47s)
  • 05:40 marostegui: Stop MySQL on db1091 for MySQL upgrade T224852
  • 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 for upgrade (duration: 00m 48s)
  • 05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097 after upgrade (duration: 00m 46s)
  • 05:19 marostegui: Stop MySQL on db1097 for upgrade
  • 05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097 for upgrade (duration: 00m 47s)
  • 04:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1081 from API (duration: 00m 49s)
  • 01:10 bstorm_: T223406 depooled/repooled labsdb1009 for view updates
  • 00:09 bstorm_: T223406 repooled labsdb1011 after completing view updates

2019-06-03

  • 22:20 bstorm_: T223406 depooled labsdb1011
  • 22:09 bstorm_: T223406 repooled labsdb1010 after completing view updates
  • 21:29 XioNoX: drop all ICMP frag on all routers - T224186
  • 19:57 XioNoX: stop sampling from cr2-eqiad
  • 18:48 XioNoX: Add RPKI validators to all routers - T220669
  • 18:35 hashar: switch most Quibble jobs to node 10 - https://gerrit.wikimedia.org/r/#/c/integration/config/+/514034/ T222406
  • 18:35 XioNoX: drop all ICMP frag on cr1/2-eqiad - T224186
  • 18:17 XioNoX: add routinator 0.4.0 to APT repo - T220669
  • 17:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@9e3035c]: Blazegraph version wmf.4 (duration: 11m 29s)
  • 17:05 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@9e3035c]: Blazegraph version wmf.4
  • 16:40 onimisionipe: started osm-import on maps2004 - T224395
  • 16:30 bstorm_: T223406 depooled labsdb1010 for view updates
  • 15:39 bstorm_: T223406 labsdb1012 updated views for actor table changes
  • 14:46 akosiaris: deploy kask in sessionstore kubernetes namespace in eqiad, codfw T220401
  • 14:34 arturo: T221769 reimaging cloudservices1003 to stretch
  • 14:20 vgutierrez: upgrading acme-chief to version 0.17 in acme-chief production instances - T220518
  • 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:53 moritzm: draining ganeti1001 for eventual reboot to MDS-enabled Linux kernel
  • 13:44 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Drop caption edit counter unlock delay to 0 (duration: 00m 49s)
  • 13:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1138 into s4 API (duration: 00m 48s)
  • 13:19 marostegui: Move db2078:3321 under db2062 T220170
  • 13:03 arturo: add prometheus-pdns-rec-exporter v0.7 to stretch-wikimedia (T224877)
  • 12:56 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on remaining wikis (T188327) (duration: 00m 48s)
  • 12:24 arturo: add prometheus-pdns-exporter v0.4 to stretch-wikimedia (T224877)
  • 11:28 gehel: reboot relforge for microcode + jvm upgrade
  • 11:17 jijiki: Restarting php7.2-fpm in eqiad in batches of 2 for 513949
  • 11:15 Urbanecm: EU SWAT done
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add Wikiprojekti namespace to wgExtraSignatureNamespaces for fiwiki (gerrit:513740) (T224215) (duration: 00m 47s)
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add 5 active namespaces for VisualEditor on en.wikiversity (gerrit:503680) (T220881) (duration: 00m 48s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add "Zerrenda" (list) namespace to VisualEditor on euwiki (gerrit:513720) (T224801) (duration: 00m 48s)
  • 10:52 moritzm: upgrading maps servers to new Java security release
  • 10:47 moritzm: upgrading WDQS servers to new Java security release
  • 10:42 vgutierrez: upgrading prometheus-trafficserver-exporter in upload_ats ulsfo instances
  • 10:41 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (gerrit:513972) (T128546) (duration: 00m 47s)
  • 10:40 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (gerrit:513972) (T128546) (duration: 00m 49s)
  • 10:36 jijiki: Restarting php7.2-fpm in codfw in batches of 2 for 513949
  • 10:34 moritzm: upgrading Elastic servers to new Java security release
  • 10:26 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5046f3c]: Update the recommendation API service (duration: 03m 15s)
  • 10:23 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5046f3c]: Update the recommendation API service
  • 10:03 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=kartotherian
  • 10:02 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=kartotherian
  • 09:48 onimisionipe: depooled maps codfw due to lag and disk issues - T224395
  • 09:46 moritzm: upgrading Druid/Kafka-Jumbo servers to new Java security release (will be picked up by forthcoming MDS reboots)
  • 09:43 moritzm: upgrading AQS servers to new Java security release (will be picked up by forthcoming MDS reboots)
  • 09:33 moritzm: upgrading Hadoop servers to new Java security release (will be picked up by forthcoming MDS reboots)
  • 08:18 ema: cp1077: restart varnish-be
  • 08:17 elukey: manually removed phab_clean_tmp from www-data's crontab on phab1001 to reduce cronspam
  • 08:16 ema: cp1075: restart varnish-be
  • 08:03 marostegui: Stop MySQL on db1064 T223217
  • 08:01 marostegui: Remove db1064 from tendril and zarcillo T223217
  • 07:58 elukey: refresh field list for logstash (via Kibana Management -> Index patterns -> etc.)
  • 07:48 marostegui: Repool db1103 after upgrade T224852
  • 07:29 marostegui: Stop MySQL on db1103 (s2 and s4) for upgrade T224852
  • 07:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 for upgrade (duration: 00m 47s)
  • 07:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1081 into API after upgrade (duration: 00m 48s)
  • 06:50 elukey: roll restart varnishkafka (via puppet) for a config change - T224236
  • 06:46 kartik@deploy1001: scap-helm cxserver finished
  • 06:46 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 06:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 06:45 kartik@deploy1001: scap-helm cxserver finished
  • 06:45 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 06:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 06:44 kartik@deploy1001: scap-helm cxserver finished
  • 06:44 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 06:44 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 06:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 into API after upgrade (duration: 00m 49s)
  • 06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 after upgrade (duration: 00m 46s)
  • 06:04 marostegui: Stop MySQL on db1081 for upgrade - T224852
  • 06:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 for upgrade (duration: 00m 47s)
  • 05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1019 T213422 (duration: 00m 46s)
  • 05:45 marostegui: Upgrade mariadb on dbstore1004 - T224852
  • 05:17 marostegui: Upgrade MariaDB on codfw hosts in preparation for s4 master failover T217396
  • 05:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1019 T213422 (duration: 00m 46s)
  • 05:05 marostegui: Remove db2037 from tendril and zarcillo T224720
  • 05:04 marostegui: Stop MySQL on db2037 for decommission T224720
  • 04:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 T213422 (duration: 00m 51s)

2019-06-02

  • 20:28 onimisionipe: pooled wdqs1007. It caught up on lag
  • 15:24 onimisionipe: depooled wdqs1007 to catch up on lag
  • 15:22 onimisionipe: depool wdqs internal cluster hosts to allow them to catch up on lag; depooling one at a time
  • 03:09 andrewbogott: restarting pdns-recursor on cloudservices 1003 and 1004 (but not at the same time)

2019-06-01

  • 22:49 krinkle@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/3D/modules/mmv.3d.js: T224812 / bd4fbfddbe1a0 (duration: 01m 07s)

2019-05-31

  • 21:47 aaron@deploy1001: Synchronized wmf-config/db-eqiad.php: Set "secret" field in $wgLBFactoryConf for ChronologyProtector HMACs (duration: 00m 47s)
  • 21:46 aaron@deploy1001: Synchronized wmf-config/db-codfw.php: Set "secret" field in $wgLBFactoryConf for ChronologyProtector HMACs (duration: 00m 50s)
  • 21:10 bblack: cp3034: repool - T222937
  • 20:04 bblack: cp3034: depool for reimage - T222937
  • 18:44 marostegui: Start MySQL on es1019 - T213422
  • 18:34 jgleeson: payments-wiki updated from a76658f0a3 to c6c7bbf71e
  • 17:29 andrewbogott: added jeh to the 'ops' group in ldap
  • 16:20 ariel@deploy1001: Finished deploy [dumps/dumps@fd6100a]: remove orderrevs config option, unneeded now (duration: 00m 03s)
  • 16:20 ariel@deploy1001: Started deploy [dumps/dumps@fd6100a]: remove orderrevs config option, unneeded now
  • 15:05 bblack: cp3039: restart varnish-be for mbox lag (likely induced by 3049's depool for ATS conversion!)
  • 15:00 Krinkle: krinkle@deploy1001: pulling down 6f91b41 for php-1.34-wmf.7/extensions/ORES (without deploy), commit seems test-only
  • 14:59 Krinkle: krinkle@deploy1001: git status in php-1.34-wmf.7/ is dirty (extensions/ORES)
  • 14:52 bblack: pool cp3049 back into service - T222937
  • 14:32 onimisionipe: depool maps2004 (again) - T224395
  • 14:32 elukey: powercycle notebook1003 - host stuck due to user processes, no ssh available, OOM didn't trigger
  • 14:20 _joe_: rolling restart of php-fpm across production to pick up the shorter revalidate frequency for T224491
  • 14:10 bblack: reboot cp3049 - T222937
  • 13:16 bblack: depool cp3049 for reimage - T222937
  • 11:46 jynus: stop and upgrade db2084
  • 11:09 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 after maintenance (duration: 00m 48s)
  • 10:54 jynus: depool labsdb1010 for maintenance
  • 10:47 arturo: merging multiple commits to labs/private.git. We now require `puppet-merge --labsprivate` and people may not be yet aware of that
  • 09:28 jynus: stop and upgrade db2073
  • 09:11 jynus: stop and upgrade db2095 (s2, s4, s6, s7)
  • 08:33 jynus: upgrade and restart db2065
  • 08:16 jynus: depool labsdb1011 for maintenance
  • 07:54 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099 with low weight (duration: 00m 49s)
  • 07:43 _joe_: restarting php-fpm on canaries
  • 07:24 _joe_: repooling mw1348
  • 07:24 jynus: upgrade and restart labsdb1009
  • 07:15 _joe_: draining mw1348 from traffic
  • 07:14 jynus: depool labsdb1009 for maintenance
  • 06:55 jynus: upgrade and restart db2058
  • 06:33 _joe_: repooled mw1348
  • 06:21 jijiki: depool mw1348
  • 06:16 _joe_: restarting php-fpm on mw1348
  • 00:08 jgleeson: Updating civicrm from bb4acf3d8a to e028bfcd63

2019-05-30

  • 23:36 XioNoX: remove BGP sessions to starhub on cr4-ulsfo (left the IXP)
  • 22:59 marxarelli: deleted 95 docker images from contint1001, freeing ~ 8G on / cc: T219850
  • 22:59 XioNoX: add terms to drop specific icmp frag packets from cr1/2-eqiad - T224186
  • 22:53 marxarelli: deleting stale docker images from contint1001, cc: T207707 T219850
  • 22:25 mutante: phab2001 / phab1003 - why is 'git status' in /srv/phab/phabricator unclean, with lots of file deletions, but also not identical?
  • 22:24 mutante: phab2001 - scap pull fails with "directory /srv/mediawiki not found", which should not happen
  • 22:20 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/WikimediaEvents/: Avoid division by zero warnings T224686 (duration: 00m 49s)
  • 22:19 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage/: Fix broken feed - T224693 (duration: 00m 51s)
  • 21:27 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on test2wiki db, based on PageTriageTagsPatch-recreated.sql. T224693, T189929
  • 21:12 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on testwiki db, based on PageTriageTagsPatch-recreated.sql. T224693, T189929
  • 21:11 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on enwiki, based on PageTriageTagsPatch-recreated.sql. T224693, T189929
  • 21:10 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage: Bump wgPageTriageCacheVersion T224693 (duration: 00m 51s)
  • 21:07 XioNoX: add RPKI sessions on cr4-ulsfo - T220669
  • 20:39 twentyafterfour: phabricator: restart ssh-phab.service
  • 19:49 mutante: sodium (mirrors) - sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
  • 18:49 Urbanecm: Morning SWAT finished
  • 18:47 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/: QuestionPoster: Correctly set timestamp when question is posted (gerrit:513300) (T223338) (duration: 00m 51s)
  • 18:26 mutante: phab1003 - switch 'vcs' user to 'NP' to match phab1001 setup and then /srv/phab/phabricator# ./bin/config set diffusion.ssh-user vcs (T224677)
  • 18:24 XioNoX: bounce eqord-ulsfo interface to try to fix BFD sessions
  • 18:12 Krinkle: Running `php7adm /opcache-free` on mw1348 and mw1321, T224491
  • 18:11 Krinkle: mw1348 (recent api/php72 100% experiment) shows signs of corruption
  • 18:11 Krinkle: mw1321 php7.2 shows signs of corruption for over 2 hours – https://phabricator.wikimedia.org/T224491#5224464
  • 18:03 krinkle@deploy1001: Synchronized wmf-config/arclamp.php: (no justification provided) (duration: 00m 53s)
  • 16:24 bblack: re-pool cp3047 into service as ats-be - T222937
  • 16:04 mutante: phab1001 - removing 2620:0:861:103:10:64:32:186/128 from eth0
  • 16:03 mutante: phab1001 - removing 10.64.32.186/32 from eth0
  • 16:01 mutante: phab1001 - removing git-ssh.wm.org IP from interface - phab1003 - activating IPv6 listen address for git-ssh
  • 15:36 jynus: stop es1019 for maintenance T213422
  • 15:26 cmjohnson1: shutting down db1099 to swap DIMM T221502
  • 15:20 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with full weight; depool es1019 (duration: 00m 52s)
  • 15:19 herron: performing rolling reboots of eqiad kafka main cluster hosts for security updates
  • 15:06 onimisionipe: pooled maps2004 - osm import is complete - T224395
  • 14:44 andrewbogott: reimaging cloudvirtan1001 for T224566
  • 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:42 andrewbogott: reimaging cloudvirtan1001
  • 14:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:22 bblack: rebooting cp3047 (post-reimage/puppetization for T222937)
  • 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:00 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:57 jijiki: enable puppet on mw* in eqiad
  • 13:44 volans: rm /root/.ssh/known_hosts on cumin[12]001
  • 13:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:36 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.7
  • 13:28 jijiki: Enabling puppet on mw*.codfw.net
  • 13:22 zfilipin@deploy1001: Synchronized php-1.34.0-wmf.7/resources/src/jquery/jquery.suggestions.js: SWAT: jquery.suggestions: Do not show suggestions on prefilled values (gerrit:513237) (T224524) (duration: 00m 58s)
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1015.eqiad.wmnet
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1014.eqiad.wmnet
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1013.eqiad.wmnet
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1012.eqiad.wmnet
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1011.eqiad.wmnet
  • 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1010.eqiad.wmnet
  • 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1009.eqiad.wmnet
  • 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1008.eqiad.wmnet
  • 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1007.eqiad.wmnet
  • 13:08 bblack: cp3047 puppet-disable + depool for reimage to ATS - T222937
  • 13:03 marostegui: Stop MySQL on db1099 for onsite maintenance - T221502
  • 13:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 T221502 (duration: 00m 56s)
  • 13:00 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/tests/phpunit/includes/: T222628 (duration: 01m 06s)
  • 12:58 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/includes/Linker.php: T222628 (duration: 01m 04s)
  • 12:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:34 akosiaris: reboot ganeti2003 for kernel upgrades
  • 11:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:14 _joe_: freed opcache on mw1281
  • 11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:05 Urbanecm: EU SWAT finished
  • 11:04 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: gerrit:Enable abusefilter blocking ability in plwiki (T224617) (duration: 00m 58s)
  • 11:00 jijiki: Disable puppet on mw* servers to merge 507939 - T219150
  • 10:42 jynus: upgrade and restart db1117 (temporary proxy fail for passive host, reduced redundancy for m*)
  • 10:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:22 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:19 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:15 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:07 jynus: upgrade and restart test-s4 hosts (db1111, db1112)
  • 09:42 jynus: stop and upgrade db1102
  • 09:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:31 _joe_: depooling mw1261 for benchmarking for T224491
  • 09:26 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 00m 55s)
  • 08:54 jynus: stop and restart db1089 for upgrade
  • 08:50 onimisionipe: maps2001 postgres initialization - T224395
  • 08:44 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 for maintenance (duration: 00m 57s)
  • 08:32 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2087 for maintenance (duration: 01m 00s)
  • 08:10 mobrovac: drop old Parsoid tables from cassandra -- T223998
  • 07:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - T218218 T215956 (duration: 19m 28s)
  • 07:33 _joe_: upgraded service-checker on icinga1001,2
  • 07:21 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - T218218 T215956
  • 00:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2091 - T224393 (duration: 00m 56s)
  • 00:24 mutante: re-enabling puppet on phab1001 now that it does not have the phab role anymore (T221389)
  • 00:17 mutante: rsyncing /srv/repos again. pulling on phab2001 from phab1003 (T221389)

2019-05-29

  • 23:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wikibase sameAs A/B test config, part II (duration: 00m 56s)
  • 23:36 jforrester@deploy1001: sync-file aborted: Remove wikibase sameAs A/B test config, part I (duration: 00m 00s)
  • 23:35 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Remove wikibase sameAs A/B test config, part I (duration: 00m 56s)
  • 23:26 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/AbuseFilter/includes/parser/AbuseFilterTokenizer.php: SWAT AbuseFilter: Tokenizer caching back to APC I8c6a4a95e (duration: 00m 54s)
  • 23:19 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: Replace FR constants with numbers Ia52f644948 (duration: 00m 56s)
  • 23:17 jforrester@deploy1001: Synchronized multiversion/MWScript.php: Mark refreshMessageBlobs.php as a global script (duration: 00m 56s)
  • 23:15 mutante: repooled phab2001-vcs, fixes pybal / lvs alerts
  • 23:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
  • 23:10 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable wgSpecialSearchFormOptions on production Wikidata T55652 (duration: 00m 57s)
  • 23:01 mutante: phab2001 - same issue with tin.eqiad.wmnet still showing up when first trying to git clone
  • 22:52 mutante: miscweb2001 - a2dismod mpm_event ; systemctl restart apache2 to fix php7.0 dependency issue
  • 22:50 mutante: miscweb2001 - when first trying to git pull iegreview - still tries to resolve 'tin.eqiad.wmnet' which is long gone. fix is still to manually edit /srv/deployment/iegreview/iegreview-cache/cache/.git/config
  • 22:46 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Hot-deploy T224634 to fix CirrusSearch for extension registration (duration: 00m 57s)
  • 21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 21:47 mutante: installing OS on miscweb2001 VM failed at grub install step :( T224323
  • 21:47 mutante: sign puppet cert request for phab2001 after reinstall (for some reason it needed me to connect to console and hit enter, reimage script itself was stuck)
  • 20:54 mutante: creating new ganeti VM miscweb2001.codfw.wmnet with same specs as krypton.eqiad.wmnet (T224323)
  • 20:35 arlolra: Updated Parsoid to 8546c79 (T219927, T211125)
  • 20:35 ejegg: updated payments-wiki from 332aaa96e2 to 45b73e7749
  • 20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@6caac43]: Updating Parsoid to 8546c79 (duration: 07m 46s)
  • 20:20 arlolra@deploy1001: Started deploy [parsoid/deploy@6caac43]: Updating Parsoid to 8546c79
  • 20:10 bblack: pool cp3044 (esams cache_upload ats-be) - T222937
  • 19:46 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 00m 57s)
  • 19:45 XioNoX: enable cr1-codfw:et-0/2/1 - T224511
  • 19:45 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 01m 01s)
  • 19:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 19:32 mutante: phab2001 - reinstalling with stretch - upgrade from jessie (T190568)
  • 19:09 XioNoX: enable cr1-codfw:et-0/2/0 - T224511
  • 18:37 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet
  • 17:44 XioNoX: enable cr1-codfw:et-0/0/1 - T224511
  • 17:13 XioNoX: enable cr1-codfw:et-0/0/0 - T224511
  • 17:02 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Change arwiki default user preferences, part 3/3 (gerrit:501926) (T220186) (duration: 00m 56s)
  • 17:00 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: Change arwiki default user preferences, part 2/3 (gerrit:501926) (T220186) (duration: 00m 56s)
  • 16:59 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change arwiki default user preferences, part 1/3 (gerrit:501926) (T220186) (duration: 00m 56s)
  • 16:48 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:512942 Revert: Hardcode korean help desk config (duration: 00m 56s)
  • 16:45 sbisson@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: gerrit:512941 Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 00m 56s)
  • 16:42 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: gerrit:512940 Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 01m 00s)
  • 16:32 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel/QuestionRecord.php: SWAT: gerrit:512950 Revert: Fix phan job: ignore line using JsonSerializable (duration: 00m 57s)
  • 16:08 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 15:55 jynus: upgrade and restart db2087
  • 15:11 moritzm: draining ganeti2008 for eventual reboot to pick up MDS-enabled kernel
  • 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:06 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on group 1 (T188327) (duration: 00m 57s)
  • 14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:54 moritzm: draining ganeti2007 for eventual reboot to pick up MDS-enabled kernel
  • 14:51 XioNoX: `request chassis fpc online slot 0` on cr1-codfw - T224511
  • 14:48 XioNoX: `request chassis fpc offline slot 0` on cr1-codfw - T224511
  • 14:47 XioNoX: disable et- interfaces on cr1-codfw - T224511
  • 14:45 andrewbogott: reimaging cloudcontrol1003 T221770
  • 14:34 moritzm: draining ganeti2006 for eventual reboot to pick up MDS-enabled kernel
  • 14:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:32 andrewbogott: powering off cloudcontrol1003 as one last check to see what explodes before I reimage it
  • 14:30 _joe_: installing the new service checker on restbase in eqiad
  • 14:29 _joe_: installing new service checker version on restbase in codfw
  • 14:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:58 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 13:58 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 13:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 urandom: decommissioning restbase1015-c -- T223976
  • 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:19 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.7 (duration: 00m 58s)
  • 13:18 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.7
  • 13:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:12 Urbanecm: mwscript emptyUserGroup.php --wiki=fawiki 'uploader' finished (T221441)
  • 13:06 andrewbogott: stopping openstack services on cloudcontrol1003 in anticipation of a re-image
  • 13:03 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 13:02 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 13:02 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:01 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 13:00 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:42 Zppix: [12:27:02] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:41 Zppix: [12:27:02] jbond@cumin1001 START - Cookbook sre.hosts.downtime
  • 12:40 Zppix: [12:23:06] <jijiki> Rolling restart pdfrender on scb*
  • 12:39 Zppix: [12:20:49] jbond@cumin1001 START - Cookbook sre.hosts.downtime
  • 12:38 Zppix: [12:11:55] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:38 Zppix: [12:11:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
  • 12:37 Zppix: [12:01:54] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:36 Zppix: [12:01:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
  • 12:36 Zppix: [12:00:21] marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db2037 from config as it will be decommissioned T221533 (duration: 00m 56s)
  • 12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:34 Zppix: [11:59:19] marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db2037 from config as it will be decommissioned T221533
  • 12:33 Zppix: [11:58:16] <arturo> T221770 icinga downtime cloudcontrol1003.wikimedia.org for upcoming rebuild as stretch
  • 12:32 Zppix: [11:57:57] aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:32 Zppix: [11:57:55] aborrero@cumin1001 START - Cookbook sre.hosts.downtime
  • 12:31 Zppix: [11:55:54] <Urbanecm> EU SWAT finished, maintenance script emptyUserGroup.php still running in separate tmux session
  • 12:31 Zppix: [11:55:11] urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: gerrit:511849 Set wgLocaltimezone for euwiki to Europe/Berlin (T224091) (duration: 00m 57s)
  • 12:30 Zppix: [11:55:10] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:29 Zppix: [11:55:09] jbond@cumin1001 START - Cookbook sre.hosts.downtime
  • 11:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:471260 RSS: Update URLs to the old Wikimedia Foundation blog to point to the new site (T208458) (duration: 00m 57s)
  • 11:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:46 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:46 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 11:45 Urbanecm: Started mwscript emptyUserGroup.php --wiki=fawiki 'uploader' (T221441)
  • 11:44 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: gerrit:505228 Remove uploader user group from fawiki and merge it with autoconfirmed, part 2 (T221441) (duration: 00m 55s)
  • 11:43 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:505228 Remove uploader user group from fawiki and merge it with autoconfirmed, part 1 (T221441) (duration: 00m 55s)
  • 11:40 Urbanecm: Purged angwikibooks HD logos
  • 11:38 urbanecm@deploy1001: Synchronized static/images/project-logos/: gerrit:512433 Add HD logo for angwikibooks, logo files (T150618) (duration: 00m 56s)
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512478 Enable transwiki import between sqwiki and sqwikiquote (T221234) (duration: 00m 56s)
  • 11:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:30 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:509130 Enable Advanced Mobile Contributions Overflow menu (T223883) (duration: 00m 57s)
  • 11:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512488 Remove bureaucrat protection level for all Serbian projects (T217005) (duration: 00m 57s)
  • 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512487 Fix Serbian projects wgRestrictionLevels (T217005) (duration: 00m 57s)
  • 11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:506892 Add namespace aliases on zhwiktionary (T222024) (duration: 00m 57s)
  • 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:59 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 10:57 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2087 for maintenance (duration: 01m 11s)
  • 10:57 Urbanecm: deleteBatch.php for srwikinews finished (T212346)
  • 10:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:33 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3 (duration: 03m 36s)
  • 10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3
  • 09:51 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 09:45 _joe_: uploading a new service-checker version to jessie-wikimedia
  • 09:18 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 08:51 moritzm: draining ganeti2002 for eventual reboot to pick up MDS-enabled kernel
  • 08:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:31 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:31 moritzm: draining ganeti2001 for eventual reboot to pick up MDS-enabled kernel
  • 07:42 mobrovac: decommission restbase1015-b -- T223976
  • 07:40 godog: ms-be2043 start sdd rebuild - T222654
  • 07:03 jijiki: restarting pdfrender on scb1003

2019-05-28

  • 23:19 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/ApiTimedText.php: T224522 Fix fatal in ApiTimedText following redirect pages (duration: 00m 56s)
  • 23:17 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: T224367 Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 57s)
  • 23:17 bstorm_: T221339 completed view updates on labsdb1009 without depooling
  • 23:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: T224367 Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 56s)
  • 23:14 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/ApiTimedText.php: T224522 Fix fatal in ApiTimedText following redirect pages (duration: 00m 58s)
  • 23:11 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: FlaggedRevisions: Copy in rest of the config, for static registration I77d70519f Id0cd2e18c (duration: 00m 56s)
  • 23:10 bstorm_: T221339 repooled labsdb1011
  • 23:06 jforrester@deploy1001: Synchronized wmf-config/throttle.php: Remove expired throttle rules I4ba3d489 (duration: 00m 55s)
  • 23:06 bstorm_: T221339 depooled labsdb1011 and updated views
  • 23:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T55652 Enable wgSpecialSearchFormOptions on testwikidata (duration: 00m 56s)
  • 22:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Fix order of edit tabs for multi-tabs on SET wikis T223793 (duration: 00m 57s)
  • 22:28 cstone_: Re-enabled fundraising thank you mail job
  • 22:25 mutante: cp3034 - sudo -i varnish-backend-restart
  • 22:18 cstone_: Updated fundraising civicrm from 21afd001b6 to bb4acf3d8a
  • 22:14 mutante: cp3035 - varnish-backend-restart
  • 22:13 bstorm_: repooled labsdb1010
  • 22:09 mutante: cp3034 - restart varnish backend
  • 22:09 XioNoX: restart varnish backend on cp3039
  • 22:02 cstone_: Disabled fundraising thank you mail job
  • 21:46 bstorm_: depool labsdb1010 for view updates
  • 21:38 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update (duration: 14m 37s)
  • 21:35 urandom: decommissioning restbase1015-a -- T223976
  • 21:24 smalyshev@deploy1001: Started deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update
  • 21:23 ebernhardson: restart elasticsearch on cloudelastic1001 to test sanely sized readahead on /dev/dm-0
  • 21:11 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 20:58 mutante: phab1003 / phab2001 - removing 'apache restart' from root's crontab (gerrit:512977) (T187790)
  • 20:28 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Update caption edit target counts (duration: 00m 57s)
  • 19:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 19:15 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1064 from config as it will be decommissioned T223217 (duration: 00m 55s)
  • 19:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1064 from config as it will be decommissioned T223217 (duration: 00m 56s)
  • 19:02 marostegui: Reboot db2091 for full OS and MySQL upgrade - T224393
  • 18:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMediaInfoEnableFilePageDepicts, no longer read (duration: 00m 57s)
  • 18:51 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Add forwards-compatibility for dataCdnMaxAge (duration: 01m 00s)
  • 18:11 marostegui: Start mysql for s2 and s4 on db2091 T224393
  • 17:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:48 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:42 moritzm: rebooting yubiauth* servers for kernel update
  • 17:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:35 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@0735c45]: Update mobileapps to ab67b78 (duration: 05m 56s)
  • 17:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:29 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@0735c45]: Update mobileapps to ab67b78
  • 17:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:41 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:35 hoo: Ran scap pull on mw1240 (curl -H 'Host: www.wikidata.org' … mw1240.eqiad.wmnet/wiki/Special:SetEntitySchemaLabelDescriptionAliases/E10/en returned 404)
  • 16:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 Lucas_WMDE: lucaswerkmeister-wmde@mw1271:~$ scap pull
  • 16:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:15 moritzm: rearmed keyholder on deploy2001 following reboot
  • 16:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:09 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:04 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:04 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:56 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:54 papaul: shutting down db2091 for firmware upgrade
  • 15:53 godog: put back wrongly-replaced sdf on ms-be2043 - T222654
  • 15:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:42 Lucas_WMDE: Extension:EntitySchema deployment finished successfully
  • 15:38 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=wikidatawiki
  • 15:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512909 Enable extension EntitySchema in production (duration: 00m 56s)
  • 15:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:34 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: gerrit:512911 Steal maintenance script user (duration: 00m 58s)
  • 15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:17 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
  • 15:17 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: gerrit:512912 Steal maintenance script user – forgot `git submodule update` before previous sync (duration: 00m 57s)
  • 15:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: gerrit:512912 Steal maintenance script user (duration: 00m 59s)
  • 15:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:01 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 14:57 jbond42: reboot ms-be2016
  • 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:36 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 14:30 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.7
  • 14:10 herron: beginning rolling reboots of codfw kafka-main cluster for security updates
  • 14:10 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache (duration: 34m 22s)
  • 14:04 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 13:50 _joe_: hhvm restarted on mwdebug1001
  • 13:48 _joe_: stopping hhvm on mwdebug1001 for testing
  • 13:39 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 13:35 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
  • 13:32 gilles@deploy1001: Finished deploy [performance/asoranking@60369cc]: T224388 (duration: 00m 03s)
  • 13:31 gilles@deploy1001: Started deploy [performance/asoranking@60369cc]: T224388
  • 13:31 gilles@deploy1001: deploy aborted: T224388 (duration: 00m 01s)
  • 13:31 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: T224388
  • 13:24 urandom: decommissioning restbase1014-c -- T223976
  • 13:23 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
  • 12:55 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 12:51 gilles@deploy1001: Finished deploy [performance/asoranking@1c60db1]: T224388 (duration: 00m 04s)
  • 12:50 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: T224388
  • 12:40 gilles@deploy1001: Finished deploy [performance/asoranking@157c25f]: T224388 (duration: 00m 06s)
  • 12:40 gilles@deploy1001: Started deploy [performance/asoranking@157c25f]: T224388
  • 12:13 raynor: EU SWAT done
  • 12:11 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:512743 Disable the rdf2latex Collection portlet format (T224433) (duration: 00m 55s)
  • 12:00 raynor: EU SWAT re-opened
  • 11:58 Lucas_WMDE: EU SWAT done
  • 11:54 Lucas_WMDE: ^ error, no change to wiki
  • 11:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
  • 11:52 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: SWAT: gerrit:512689 Add maintenance script to create preexisting Schemas + gerrit:512717 Small maintenance script adjustments (duration: 00m 54s)
  • 11:48 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema: SWAT: gerrit:512677 Skip configured IDs (duration: 00m 57s)
  • 11:43 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:511753 Add a list of IDs to skip in production (duration: 00m 54s)
  • 11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config: SWAT: gerrit:510204 Add feature flag config for breaking Wikibase API change (T223300) (duration: 00m 54s)
  • 11:31 Urbanecm: Ran namespaceDupes.php for urwikibooks, urwikiquote, urwiktionary and aswikisource
  • 11:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:512426 Use underscores instead of spaces in wgMetaNamespace(Talk) for several projects (T223039) (duration: 00m 54s)
  • 11:25 arturo: merging change to the puppet sudo module https://gerrit.wikimedia.org/r/c/operations/puppet/+/508311
  • 11:18 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: gerrit:512422 Add abusefilter-modify-restricted to abusefilter group on plwiki (T224308) (duration: 02m 36s)
  • 10:54 zfilipin@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_4182265560" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 03m 00s)
  • 10:51 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
  • 10:48 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.3 [keeping static files] (duration: 01m 32s)
  • 10:45 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 [keeping static files] (duration: 06m 06s)
  • 09:32 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Allow MW to honour the X-Request-Id header if set - T201409 (duration: 01m 12s)
  • 09:28 moritzm: installing php5 security updates
  • 09:00 moritzm: installing ffmpeg security updates
  • 08:58 gehel: rebooting wdqs nodes for kernel upgrade
  • 08:54 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148 (duration: 01m 21s)
  • 08:52 jiji@deploy1001: Started deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148
  • 08:52 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf3 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
  • 08:47 vgutierrez: uploaded acme-chief 0.17 to apt.wikimedia.org (buster) - T220518 T213820
  • 08:40 volans: T224448 sudo cumin -b 15 -p 95 'R:git::clone' 'run-puppet-agent -q --failed-only'
  • 08:29 volans: restarting gerrit due to stuck threads - T224448
  • 07:17 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf1 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
  • 07:02 mobrovac: decommission restbase1014-b -- T223976
  • 06:40 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 20% of anonymous users to PHP7.2 - T219150 (duration: 00m 51s)
  • 00:38 urandom: decommissioning restbase1014-a -- T223976

2019-05-27

  • 23:19 thcipriani: gerrit back after restarting due to T224448
  • 23:10 thcipriani: restarting gerrit due to active threads being stuck behind a sendemail thread.
  • 22:52 gilles@deploy1001: Finished deploy [performance/asoranking@bacfc37]: T224388 (duration: 00m 05s)
  • 22:52 gilles@deploy1001: Started deploy [performance/asoranking@bacfc37]: T224388
  • 22:19 gilles@deploy1001: Finished deploy [performance/asoranking@d0c156e]: T224388 (duration: 00m 05s)
  • 22:19 gilles@deploy1001: Started deploy [performance/asoranking@d0c156e]: T224388
  • 20:19 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 06s)
  • 20:19 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
  • 18:41 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/libs/rdbms: 66556bf37e8 / T223310, T223978 (duration: 00m 50s)
  • 18:06 krinkle@deploy1001: Synchronized errorpages/: 4ffcbfc2ba3 (duration: 00m 48s)
  • 17:56 andrewbogott: re-imaging cloudservices1004 in order to make sure our apt magic is working properly
  • 17:37 andrewbogott: refreshing puppet-compiler facts
  • 16:40 volans: removed unreferenced files in /etc/dhcp/ on install[12]002
  • 16:34 mobrovac: decommission restbase1013-c - T223976
  • 15:40 akosiaris: initialize termbox namespace on eqiad/codfw/staging kubernetes clusters T220402
  • 15:36 akosiaris: initialize sessionstore namespace on eqiad/codfw/staging kubernetes clusters T220401
  • 13:03 godog: swift eqiad-prod: ms-be1033 weight to 0 - T223518
  • 11:33 onimisionipe: starting osm initial import on maps2004 - T224395
  • 10:35 mobrovac: decommission restbase1013-b - T223976
  • 10:31 onimisionipe: rebooting maps2004 - cassandra unit failed and got stuck
  • 09:59 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148 (duration: 01m 09s)
  • 09:58 jiji@deploy1001: Started deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148
  • 09:52 _joe_: disabling puppet on mw1261, running some tests for T223180
  • 08:52 arturo: 1 day downtime systemd check for cloudcontrol1003
  • 08:27 jiji@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2091 - T224393 (duration: 00m 49s)
  • 08:03 gehel: depool maps2004 - T224395
  • 07:05 gehel: running nodetool repair on maps2004 - T224395
  • 04:23 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 28s)
  • 04:23 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
  • 02:59 urandom: decommissioning restbase1013-a -- T223976

2019-05-26

  • 20:39 urandom: decommissioning restbase1012-c -- T223976
  • 14:09 urandom: decommissioning restbase1012-b -- T223976
  • 13:37 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/debug: T187147 / 2be7aa4bc4af36 (duration: 00m 51s)
  • 08:01 mobrovac: decommission restbase1012-a - T223976

2019-05-25

  • 22:41 urandom: decommissioning restbase1011-c -- T223976
  • 22:00 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/Linker.php: T222628 / c735a545df3a (duration: 00m 51s)
  • 19:12 andrewbogott: reimaging cloudservices1004 with Stretch
  • 13:46 urandom: decommissioning restbase1011-b -- T223976
  • 12:28 godog: bounce thumbor on thumbor1002
  • 12:21 godog: bounce thumbor on thumbor1002
  • 11:48 _joe_: restarted thumbor-instances on thumbor1001
  • 09:20 mobrovac: decommission restbase1011-b - T223976
  • 04:56 ariel@deploy1001: Finished deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants (duration: 00m 07s)
  • 04:56 ariel@deploy1001: Started deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants
  • 00:30 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: Hot-deploy T224319 for VisualEditor switching and auto-restore (duration: 00m 50s)

2019-05-24

  • 21:56 urandom: decommissioning restbase1011-a -- T223976
  • 16:34 XioNoX: add routinator package to reprepro/APT - T220669
  • 15:44 urandom: decommissioning restbase1010-c -- T223976
  • 15:30 XioNoX: disable bgp to telia on cr1-codfw for X-connect investigation - T222967
  • 15:01 jbond42: upload python{,3}-statsd.3.2.1-2 to jessie-wikimedia
  • 14:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/libs/objectcache/: d262078b1 / T220470 (duration: 01m 06s)
  • 11:45 hoo: Updated the Wikidata property suggester with data from the 2019-05-13 JSON dump and applied the T132839 workarounds
  • 11:32 jbond42: [actually] rebooting prometheus1004 now
  • 11:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 jbond42: rebooting prometheus1004
  • 10:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 jbond42: rebooting prometheus2003
  • 10:25 jbond42: rebooting prometheus2004
  • 10:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:09 mobrovac: decommission restbase1010-b - T223976
  • 07:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:32 moritzm: rebooting labweb* for kernel security update
  • 07:05 mobrovac: restbase-dev1006 force-stop the cassandra instances, fsync exception during decomm - T224260
  • 06:47 moritzm: bounced ferm on mw2286, wasn't correctly started after reboot
  • 06:45 mobrovac: restbase-dev1006 decommission cass-b - T224260
  • 06:43 _joe_: disable notifications in icinga for restbase-dev1006 T224260
  • 06:40 mobrovac: restbase-dev1006 decommission cass-a - T224260
  • 06:39 mobrovac: restbase-dev1006 stop restbase - T224260
  • 06:38 mobrovac: restbase-dev1006 puppet disabled - T224260
  • 06:26 mobrovac@deploy1001: Finished deploy [restbase/deploy@b153f5d] (dev-cluster): Remove Parsoid fallback and rate-limit stashing (duration: 05m 41s)
  • 06:20 mobrovac@deploy1001: Started deploy [restbase/deploy@b153f5d] (dev-cluster): Remove Parsoid fallback and rate-limit stashing
  • 06:20 mobrovac@deploy1001: Finished deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - T215956 T224055 (duration: 21m 30s)
  • 06:17 marostegui: Stop MySQL on db2078:m1 to clone db2062 - T220170
  • 06:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to new hosts T220170 (duration: 00m 48s)
  • 05:58 mobrovac@deploy1001: Started deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - T215956 T224055
  • 05:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2062 from config T220170 (duration: 00m 48s)
  • 05:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2062 from config T220170 (duration: 00m 49s)
  • 05:30 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
  • 00:32 XioNoX: remove lvs1001-5 bgp sessions from cr1/2-eqiad - T224223
  • 00:27 XioNoX: remove term protect-old-lvs-servers from cr1/2-eqiad - T224223
  • 00:20 urandom: decommissioning restbase1010-a -- T223976
  • 00:04 ebernhardson@deploy1001: Finished scap: php-1.34.0-wmf.6/extensions/CirrusSearch/includes/ T223738 Consider searching out of limits an error (duration: 21m 32s)

2019-05-23

  • 23:43 ebernhardson@deploy1001: Started scap: php-1.34.0-wmf.6/extensions/CirrusSearch/includes/ T223738 Consider searching out of limits an error
  • 23:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup VII–X, InitialiseSettings (duration: 00m 48s)
  • 23:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup VII–X, CommonSettings (duration: 00m 47s)
  • 23:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup VI, InitialiseSettings (duration: 00m 47s)
  • 22:59 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup VI, CommonSettings (duration: 00m 48s)
  • 22:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup V, InitialiseSettings (duration: 00m 47s)
  • 22:56 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup V, CommonSettings (duration: 00m 47s)
  • 22:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup IV, InitialiseSettings (duration: 00m 47s)
  • 22:51 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup IV, CommonSettings (duration: 00m 48s)
  • 22:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup III, InitialiseSettings (duration: 00m 47s)
  • 22:47 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup III, CommonSettings (duration: 00m 48s)
  • 22:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup II, InitialiseSettings (duration: 00m 48s)
  • 22:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup II, CommonSettings (duration: 00m 48s)
  • 22:39 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup I, InitialiseSettings (duration: 00m 47s)
  • 22:37 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup I, CommonSettings (duration: 00m 48s)
  • 22:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wmgUseClusterSquid, never varied, no longer used (duration: 00m 48s)
  • 22:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop reading wmgUseClusterSquid, never varied (duration: 00m 47s)
  • 22:25 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T104148 Duplicate …Squid variables into …Cdn ahead of MW renaming, part 3 (duration: 00m 47s)
  • 22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T104148 Duplicate …Squid variables into …Cdn ahead of MW renaming, part 2 (duration: 00m 48s)
  • 22:23 jforrester@deploy1001: Synchronized wmf-config/reverse-proxy.php: T104148 Duplicate …Squid variables into …Cdn ahead of MW renaming, part 1 (duration: 00m 48s)
  • 22:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223793 Drop wmgVisualEditorSingleEditTabSecondaryEditor and wmgVisualEditorSecondaryTabs from InitialiseSettings (duration: 00m 48s)
  • 22:17 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T223793 Read wmgVisualEditorIsSecondaryEditor in CommonSettings (duration: 00m 48s)
  • 22:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223793 Add wmgVisualEditorIsSecondaryEditor to InitialiseSettings (duration: 00m 49s)
  • 19:48 ejegg: updated payments-wiki from 786d76e212 to 332aaa96e2
  • 18:54 urandom: decommissioning restbase1009-c -- T223976
  • 16:13 twentyafterfour: restarting phd on phab1003 to pick up new php module config
  • 15:57 moritzm: rebooting furud/flerovium for kernel updates
  • 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:33 ottomata: rolling restart of swift-proxy to apply creation of analytics_admin account
  • 15:31 hashar@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Hardcode korean help desk config - T224224 (duration: 00m 48s)
  • 15:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:31 jbond42: reboot thumbor2004
  • 15:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:02 jbond42: reboot thumbor2003
  • 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:57 jbond42: reboot thumbor2002
  • 14:51 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:51 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:50 jbond42: reboot thumbor2001
  • 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:43 jbond42: reboot thumbor1004
  • 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:36 jbond42: reboot thumbor1003
  • 14:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:28 jbond42: reboot thumbor1002
  • 14:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1001.eqiad.wmnet
  • 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:21 jbond@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
  • 13:56 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Echo: SWAT: Don't add CommentStoreComment as plaintext params (gerrit:512070) (duration: 00m 50s)
  • 13:55 urandom: decommissioning restbase1009-b -- T223976
  • 13:41 bblack: stopped pybal on lvs1001-6 - T224223
  • 13:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.6
  • 13:00 godog: swift eqiad-prod: ms-be1033 weight to 1500 - T223518
  • 12:04 moritzm: powercycling mw2268 (stuck after reboot)
  • 11:50 jbond42: will shortly start rolling reboots of thumbor servers
  • 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:34 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:34 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 moritzm: rebooting auth1002 for kernel update
  • 11:21 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:21 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:51 Amir1: Deploying EntitySchema to testwikidatawiki is done
  • 10:50 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki/php-1.34.0-wmf.5$ mwscript sql.php --wiki=wikidatawiki extensions/EntitySchema/sql/EntitySchema.sql (T216955)
  • 10:50 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: deploy WikibaseSchema to test (T216956) (gerrit:511844) (duration: 00m 56s)
  • 10:44 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki/php-1.34.0-wmf.5$ mwscript sql.php --wiki=testwikidatawiki extensions/EntitySchema/sql/EntitySchema.sql (T216956)
  • 10:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1080 (duration: 00m 57s)
  • 10:15 _joe_: restarted php7.2-fpm on mw1261 to assess the effect of a larger APCu shm size T223180
  • 10:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:00 moritzm: rebooting remaining mw servers in codfw (sans mcrouter proxies for now)
  • 10:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:51 hashar@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection: Rename wfAjaxCollectionGetItemList() T224093 (duration: 00m 57s)
  • 09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 into API (duration: 00m 55s)
  • 09:22 godog: bounce rsyslog on lithium - listener stuck - T199406
  • 09:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:10 moritzm: rebooting scb servers in eqiad
  • 09:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 (duration: 00m 55s)
  • 08:29 marostegui: Upgrade MySQL and kernel on db1080
  • 08:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 55s)
  • 08:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:26 moritzm: rebooting scb servers in codfw
  • 07:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 (duration: 00m 56s)
  • 07:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:33 moritzm: rebooting swift frontends in eqiad
  • 07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1136 (duration: 00m 53s)
  • 07:11 marostegui: Stop MySQL on db1117:3323 to clone db1128 T222682
  • 06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1136 (duration: 00m 55s)
  • 06:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2065 from config as it will be moved to m3 to replace db2042 (duration: 00m 55s)
  • 06:28 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2065 from config as it will be moved to m3 to replace db2042 (duration: 00m 56s)
  • 06:14 mobrovac: start ruwiki dumps to fill the new parsoid tables - T215956
  • 05:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2070 as m5 codfw master - T221533 (duration: 00m 54s)
  • 05:29 marostegui: Promote db2070 to m5 codfw master instead of db2037 - T221533
  • 05:20 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db2107 status - will be the new master (duration: 00m 54s)
  • 05:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1136 into s7 T222682 (duration: 00m 55s)
  • 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1136 into s7 T222682 (duration: 00m 55s)
  • 04:57 mobrovac: decommission restbase1009-a - T223976
  • 04:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 55s)
  • 04:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 (duration: 00m 58s)
  • 04:24 mobrovac: start nl, pt, pl wiki dumps to fill the new parsoid tables - T215956
  • 03:50 twentyafterfour: m3 database activity levels look like they have returned to normal
  • 03:48 twentyafterfour: puppet runs cleanly on phab1003
  • 03:39 mutante: phab1003 - disabling puppet; /etc/php/7.2/fpm/conf.d# ln -s /etc/php/7.2/mods-available/ldap.ini 20-ldap.ini ; systemctl restart php7.2-fpm
  • 03:27 twentyafterfour: restarted php-fpm on phab1003
  • 02:56 mutante: phab1001 - removing community_metrics and project_changes cron jobs to avoid duplicate mails
  • 02:51 mutante: phab1003 - chown -R phd /srv/repos/
  • 02:41 twentyafterfour: downtimed the systemd state on phab1001 for 1 year
  • 02:35 mutante: phabricator - going read-write again
  • 02:24 twentyafterfour: manually started aphlict on phab1003
  • 02:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
  • 02:04 mutante: puppetmaster1001 - sudo -i conftool-merge
  • 01:52 twentyafterfour: phabricator is now served by phab1003 though still in read-only mode for a bit longer
  • 01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
  • 01:49 mutante: puppetmaster1001 - conftool-merge
  • 01:41 eileen: civicrm revision changed from e6e846708f to 21afd001b6, config revision is 87e78d3eac
  • 01:37 mutante: depooled phab1001-vcs from git-ssh via conftool
  • 01:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab1001-vcs.eqiad.wmnet
  • 01:33 mutante: run puppet on mx1001/mx2001 - switch mail route for phab to phab1003
  • 01:30 mutante: switched from phab1001 to phab1003 - applied on cp1008 varnish canary first
  • 01:28 twentyafterfour: stopping phd on phab1001
  • 01:18 mutante: phabricator going readonly momentarily
  • 01:09 twentyafterfour: extended phab downtime in icinga, actual downtime hasn't started yet, prep work taking longer than expected
  • 00:52 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@e040c6c]: Deploy GUI update (duration: 09m 54s)
  • 00:45 mutante: phab1003 - rsyncing /srv/repos from phab1001
  • 00:42 smalyshev@deploy1001: Started deploy [wdqs/wdqs@e040c6c]: Deploy GUI update
  • 00:33 ejegg: updated payments-wiki from fa005a0640 to 786d76e212

2019-05-22

  • 23:30 twentyafterfour: scheduling downtime for phabricator from 0:00 to 1:00 utc
  • 23:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511889/ (duration: 00m 55s)
  • 22:18 mdholloway: mobileapps rolled back deployment (again) due to occasional references endpoint timeouts
  • 22:17 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724, take 2 (duration: 07m 19s)
  • 22:15 foks: reset user email and password for Nv8200pa
  • 22:09 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724, take 2
  • 22:09 mdholloway: mobileapps rolled back deployment due to endpoint check failure (not the same one as before); retrying momentarily
  • 22:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724 (duration: 03m 25s)
  • 22:08 foks: reset user email and password for DarkKyoushu
  • 22:05 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724
  • 21:51 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/includes/resourceloader/MessageBlobStore.php: T222539 / 734b3d84f7 (duration: 00m 56s)
  • 21:47 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/resourceloader/MessageBlobStore.php: T222539 / 3cb01cc73ce9 (duration: 00m 56s)
  • 21:41 urandom: decommissioning restbase1008-c -- T223976
  • 20:46 mdholloway: mobileapps rolled back deployment due to endpoint check failures
  • 20:43 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298, take 2 (duration: 04m 19s)
  • 20:39 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298, take 2
  • 20:38 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298 (duration: 02m 41s)
  • 20:35 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298
  • 19:26 jforrester@deploy1001: Finished scap: Re-build i18n and re-scap everything for i18n issues for T224116 T224124 T220731 (duration: 32m 55s)
  • 18:53 jforrester@deploy1001: Started scap: Re-build i18n and re-scap everything for i18n issues for T224116 T224124 T220731
  • 18:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/FlaggedRevs: Hot-deploy reverting FlaggedRevs config for T224116 T224124 (duration: 00m 58s)
  • 18:17 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/UrlShortener/modules/ext.urlShortener.special.js: Fix i18n/command mix-up Ic99cf063a (duration: 01m 00s)
  • 17:38 bblack: repool cp3046 as esams cache_upload ats-be node - T222937
  • 17:06 urandom: decommissioning restbase1008-b -- T223976
  • 16:17 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 to 1.34.0-wmf.5 T224116 T224124 # T220731
  • 15:11 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1002.wikimedia.org
  • 15:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:08 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org
  • 15:07 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
  • 15:04 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:04 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
  • 15:00 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
  • 14:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:58 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
  • 14:57 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
  • 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:54 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org
  • 14:49 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=nescio.wikimedia.org
  • 14:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:45 jbond@cumin1001: conftool action : set/pooled=no; selector: name=nescio.wikimedia.org
  • 14:42 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=maerlant.wikimedia.org
  • 14:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:35 jbond@cumin1001: conftool action : set/pooled=no; selector: name=maerlant.wikimedia.org
  • 14:17 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4002.wikimedia.org
  • 14:14 hashar: 1.34.0-wmf.6 deployed to group1 with the exception of cawikinews due to T224116
  • 14:14 mobrovac: start it, es wiki dumps (fr and de completed) to fill the new parsoid tables - T215956
  • 14:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns4002.wikimedia.org
  • 14:09 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4001.wikimedia.org
  • 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:02 marostegui: Stop MySQL on db2078 for upgrade
  • 13:58 bblack: depool cp3046 for reimage to ats-be - T222937
  • 13:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:57 moritzm: rebooting swift frontends in codfw
  • 13:46 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5002.wikimedia.org
  • 13:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:43 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5002.wikimedia.org
  • 13:42 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5001.wikimedia.org
  • 13:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:35 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5001.wikimedia.org
  • 13:27 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/templates/: T224092 (duration: 00m 58s)
  • 13:13 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.6 (duration: 00m 54s)
  • 13:06 urandom: decommissioning restbase1008-a -- T223976
  • 12:39 marostegui: Stop replication on db2048 (s1 codfw master) to rebuild revision table - this will generate lag on codfw - T224017
  • 12:35 bblack: cp3035: restarting varnish backend
  • 12:34 marostegui: Stop replication on db1080 to rebuild revision table - T224017
  • 12:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 to rebuild revision table T224017 (duration: 00m 55s)
  • 11:30 Amir1: EU SWAT is done
  • 11:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove constraint-suggestions beta feature (T220609) (gerrit:503342) (duration: 00m 57s)
  • 11:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add configuration for EntitySchema ShExSimpleUrl (T223120) (gerrit:509878) (duration: 00m 56s)
  • 11:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [SDC] Enable depicts qualifiers on testcommons (gerrit:511674) (duration: 00m 57s)
  • 10:01 vgutierrez: restarting varnish-backend on cp3039
  • 09:52 mobrovac: start the en, fr and de wiki dumps again to populate the new parsoid table - T215956
  • 09:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - T215956 (duration: 27m 07s)
  • 09:42 marostegui: Stop MySQL on db2078:m5 to clone db2070 - T221533
  • 09:16 mobrovac@deploy1001: Started deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - T215956
  • 08:52 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2070 from s1 to m5 (duration: 00m 55s)
  • 08:51 marostegui@deploy1001: sync-file aborted: Move db2070 from s1 to m5 (duration: 00m 03s)
  • 08:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 56s)
  • 08:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1086 into API (duration: 00m 56s)
  • 08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 (duration: 00m 55s)
  • 07:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Tackle s8 codfw weights T220170 (duration: 00m 55s)
  • 07:36 mobrovac: decommission restbase1007-c - T223976
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Tackle s4 codfw weights T220170 (duration: 01m 06s)
  • 07:23 marostegui: Restart MySQL on db2090 to change binlog format T220170
  • 06:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2040 from config T224079 (duration: 00m 55s)
  • 06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2040 from config T224079 (duration: 00m 56s)
  • 06:13 marostegui: Remove db2040 from zarcillo and tendril - T224079
  • 06:01 marostegui: Stop MySQL on db2040 - T224079
  • 05:42 marostegui: Stop MySQL on db1086 to clone db1136
  • 05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 55s)
  • 05:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2118 and db2120 into s7 T222772 (duration: 00m 55s)
  • 05:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2118 and db2120 into s7 T222772 (duration: 00m 55s)
  • 05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1118 from s1 api and pool db1134 instead T224017 (duration: 00m 57s)
  • 04:41 gilles: purging ruwiki and eswiki to make them get the new origin trial tokens
  • 04:39 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Renew origin trial tokens (duration: 00m 57s)
  • 03:22 legoktm: removed 2fa for T224075
  • 01:46 aaron@deploy1001: Synchronized php-1.34.0-wmf.5/includes/specials/SpecialWatchlist.php: 68eeaa5 (duration: 00m 57s)
  • 01:22 aaron@deploy1001: Synchronized php-1.34.0-wmf.6/includes/specials/SpecialWatchlist.php: 447bf50 (duration: 00m 57s)

2019-05-21

  • 23:47 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511668/ (duration: 00m 57s)
  • 23:34 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511667/ (duration: 00m 56s)
  • 22:56 mutante: ms-be2034 - degraded systemd state was cleared; it was originally caused by "failed Session 72587 of user debmonitor"
  • 22:56 mutante: ms-be2034 - sudo systemctl reset-failed
  • 22:51 urandom: decommissioning restbase1007-b -- T223976
  • 21:35 ejegg: updated payments-wiki from d5ef5ad067 to fa005a0640
  • 21:21 mutante: re-enabling puppet on mc1* hosts
  • 20:43 mutante: re-enabling puppet on all hosts using memcached class - except mc1*
  • 20:31 mutante: mc2019 - stopping memcached and letting puppet restart it to confirm no issues after switching to systemd::service
  • 20:20 mutante: disabling puppet on all servers using class memcached (57)
  • 20:06 tzatziki: removing (another) two files for legal compliance
  • 19:43 tzatziki: removing two files for legal compliance
  • 19:12 thcipriani: gerrit back on 2.15.13
  • 19:09 thcipriani: restart gerrit for 2.15.13 update
  • 19:08 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (cobalt, restart incoming) (duration: 00m 20s)
  • 19:08 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (cobalt, restart incoming)
  • 19:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (gerrit 2001 only) (duration: 00m 11s)
  • 19:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (gerrit 2001 only)
  • 18:50 bblack: repooling cp1085 frontends (weren't meant to be depooled)
  • 18:38 bblack: re-pooling eqiad front edge traffic (onto new LVSes from T184293)
  • 18:36 XioNoX: update lvs static routes on cr1/2-eqiad - T184293
  • 18:06 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 (turning on HA queues)
  • 17:59 bblack: rebooting lvs1016 in attempt to clear interface config issues - T224027
  • 17:51 XioNoX: add BGP sessions to AS202053 in esams
  • 17:31 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1016, bringing back pybal in "secondary" role for all 3 traffic classes (high-traffic1, high-traffic2, low-traffic), no traffic shift expected (again, after merging last-minute fixup https://gerrit.wikimedia.org/r/c/operations/puppet/+/511759 )
  • 17:25 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1016, bringing back pybal in "secondary" role for all 3 traffic classes (high-traffic1, high-traffic2, low-traffic), no traffic shift expected
  • 17:24 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1006, basically no-op
  • 17:21 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1015, bringing back pybal in primary role, shifting traffic to lvs1015
  • 17:20 bblack: eqiad LVS: low-traffic (all internal services): disable pybal on lvs1016 + lvs1015, shifting traffic to lvs1006
  • 17:18 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/includes/CollectionHooks.php: Fix paths (duration: 00m 56s)
  • 17:17 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1005, basically no-op
  • 17:15 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1002, bringing back pybal in backup role, no traffic shift
  • 17:13 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1014, bringing back pybal in primary role, shifting traffic to lvs1014
  • 17:11 bblack: eqiad LVS: high-traffic2 (upload): disable pybal on lvs1014 + lvs1002, shifting traffic to lvs1005
  • 17:09 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1004, basically no-op
  • 17:07 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1001, bringing back pybal in backup role, no traffic shift
  • 17:06 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1013, bringing back pybal in primary role, shifting traffic to lvs1013
  • 17:04 bblack: eqiad LVS: high-traffic1 (text): disable pybal on lvs1013 + lvs1001, shifting traffic to lvs1004
  • 16:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:55 jbond42: rebooting wtp1046-1048
  • 16:55 bblack: starting Eqiad LVS re-arrangement shortly - T184293 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/511717 (eqiad front edge is still depooled from public traffic)
  • 16:50 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:50 jbond42: rebooting wtp1043-1045
  • 16:46 mutante: rebooting phab1003 (non-prod)
  • 16:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:44 jbond42: rebooting wtp1040-1042
  • 16:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:39 jbond42: rebooting wtp1037-1039
  • 16:26 mobrovac: truncate "others_T_parsoid".data
  • 16:25 mobrovac: restbase truncate "commons_T_parsoid".data
  • 16:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:24 jbond42: rebooting wtp1033-1034
  • 16:18 mobrovac: restbase truncate "enwiki_T_parsoid".data
  • 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:16 jbond42: rebooting wtp1031-1032
  • 16:10 mobrovac: restbase truncate "wikipedia_T_parsoid".data
  • 16:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:09 jbond42: rebooting wtp1029-1030
  • 16:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:01 jbond42: rebooting wtp1027-1028
  • 15:56 urandom: decommissioning restbase1007-a -- T208087
  • 15:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:54 jbond42: rebooting wtp1025-1026
  • 15:45 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found, rb1007 (duration: 02m 43s)
  • 15:42 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found, rb1007
  • 15:42 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found (duration: 02m 40s)
  • 15:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:40 jbond42: rebooting wtp2019-2020
  • 15:39 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found
  • 15:38 mobrovac@deploy1001: Finished deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found, take #2 (duration: 00m 45s)
  • 15:38 mobrovac@deploy1001: Started deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found, take #2
  • 15:37 mobrovac@deploy1001: Finished deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - T215956 (duration: 07m 10s)
  • 15:37 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Moving to 10% of users on php7 T219150 (duration: 00m 57s)
  • 15:32 XioNoX: enable BGP to telia on cr1-codfw - T222967
  • 15:30 mobrovac@deploy1001: Started deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - T215956
  • 15:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:23 jbond42: rebooting wtp2017-2018
  • 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:13 jbond42: rebooting wtp2015-2016
  • 15:10 XioNoX: disable BGP to telia on cr1-codfw - T222967
  • 15:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:05 jbond42: rebooting wtp2013-2014
  • 15:02 crusnov@deploy1001: Finished deploy [netbox/deploy@3091b51]: deploy new version of ganeti-netbox sync - T220422 (duration: 00m 55s)
  • 15:01 crusnov@deploy1001: Started deploy [netbox/deploy@3091b51]: deploy new version of ganeti-netbox sync - T220422
  • 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:57 jbond42: rebooting wtp2011-2012
  • 14:57 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.6
  • 14:50 jbond42: rebooting wtp2009-2010
  • 14:50 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:44 jbond42: rebooting wtp2007-2008
  • 14:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 jbond42: rebooting wtp2005-2006
  • 14:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:31 jbond42: rebooting wtp2003-2004
  • 14:27 hashar@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.6 and rebuild l10n cache # T220731 (duration: 48m 09s)
  • 14:26 volans: restarting wikibugs
  • 14:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:25 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:13 jbond42: rebooting wtp2001-2002
  • 13:50 bblack: rebooting lvs1013,14,15 for verification
  • 13:39 hashar@deploy1001: Started scap: testwiki to php-1.34.0-wmf.6 and rebuild l10n cache # T220731
  • 13:37 hashar@deploy1001: Pruned MediaWiki: 1.34.0-wmf.1 (duration: 02m 12s)
  • 13:36 hashar: scap clean --verbose --delete 1.34.0-wmf.1 # T220731
  • 13:29 hashar: scap clean --verbose --delete 1.33.0-wmf.25 # T220731
  • 13:25 godog: swift eqiad-prod: start depool ms-be1033 - T223518
  • 13:24 hashar: Applied security patches to 1.34.0-wmf.6 # T220731
  • 13:24 hashar: Applied security patches to 1.34.0-wmf.6
  • 13:23 bblack: rebooting lvs1013 (possibly a few times, debugging startup issues)
  • 13:20 hashar: scap prep 1.34.0-wmf.6 # T220731
  • 13:11 hashar: Updated plugins on https://releases-jenkins.wikimedia.org/
  • 13:09 hashar: Restarting Jenkins T224002
  • 12:45 hashar: Cutting branch wmf/1.34.0-wmf.6 # T220731
  • 12:22 volans: restarting Icinga on icinga1001 to pick up new open files limits
  • 12:08 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148 (duration: 00m 54s)
  • 12:07 jiji@deploy1001: Started deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148
  • 11:59 mobrovac: started dewiki dumps - T215956
  • 11:58 mobrovac: started frwiki dumps - T215956
  • 11:46 mobrovac: started enwiki dumps - T215956
  • 11:27 Amir1: EU SWAT is done
  • 11:27 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Revert "Switch off php7 for investigation of production instabilities" (gerrit:511658) (duration: 00m 50s)
  • 11:20 volans: restarting Icinga on icinga2001 (passive server) to pick up new open file limits
  • 11:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:17 jbond42: reboot wtp1025.eqiad.wmnet
  • 11:10 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Define wmgUseEntitySchema (T221651) (gerrit:505816), part II (duration: 00m 49s)
  • 11:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - T215956 (duration: 25m 50s)
  • 11:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Define wmgUseEntitySchema (T221651) (gerrit:505816), part I (duration: 00m 50s)
  • 11:07 godog: swift codfw-prod: remove ms-be201[345] - T221068
  • 10:59 _joe_: rolling restart of php7.2-fpm across the fleet to pick up a config change
  • 10:44 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - T215956
  • 10:39 jijiki: updating prometheus-mcrouter-exporter on mw* servers
  • 10:26 godog: pool new restbase hosts - T219404
  • 10:20 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1019.eqiad.wmnet
  • 09:49 moritzm: updated buster netboot image to daily image from 20190521
  • 09:26 moritzm: reimaging graphite2001 to buster for some d-i tests
  • 08:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2104 as candidate master and as API (duration: 00m 51s)
  • 08:56 marostegui: Stop MySQL on db2041 as it will be decommissioned T223950
  • 06:59 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Turning off php7 sampling for investigation in T223952 (duration: 00m 53s)
  • 06:55 elukey: reboot of stat100[4,5,6,7] and notebook100[3,4] for kernel upgrades
  • 06:31 marostegui: Stop mariadb on db2104 to convert it to s2 candidate master
  • 06:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2104 (duration: 00m 51s)
  • 05:50 marostegui: Remove db2041 from tendril and zarcillo - T223950
  • 05:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2041 for decommissioning T223950 (duration: 00m 51s)
  • 05:42 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2041 for decommissioning T223950 (duration: 00m 51s)
  • 05:16 marostegui: Stop MySQL on db2040
  • 05:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2040 (duration: 00m 50s)
  • 05:14 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2114 into s6 - T222772 (duration: 00m 50s)
  • 05:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2114 into s6 - T222772 (duration: 00m 51s)
  • 03:36 urandom: bootstrapping restbase1027-c -- T219404
  • 00:47 urandom: bootstrapping restbase1027-b -- T219404
  • 00:05 aaron@deploy1001: Synchronized php-1.34.0-wmf.5/includes/libs/objectcache/APCUBagOStuff.php: 982299d (duration: 00m 54s)

2019-05-20

  • 21:07 ejegg: updated payments-wiki from 8397ccf9cc to d5ef5ad067
  • 19:20 mobrovac: bootstrap restbase1027-a - T219404
  • 18:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/includes/Linker.php: T222857 / Iecc2140fabd3 (duration: 00m 54s)
  • 16:43 onimisionipe: rolling reboot of maps eqiad to pick kernel upgrades
  • 16:38 mobrovac: bootstrap restbase1026-c - T219404
  • 15:26 onimisionipe: rebooting codfw maps to pick up kernel upgrades
  • 15:26 marostegui: Stop replication on labsdb1011 to start compressing tables - T222978
  • 15:13 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on group 0 (T188327) (duration: 00m 55s)
  • 14:54 bblack: rebooting lvs1013, lvs1014, lvs1015 (not in active service, yet)
  • 14:43 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148 (duration: 00m 55s)
  • 14:42 jiji@deploy1001: Started deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148
  • 14:21 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011
  • 14:14 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1010
  • 13:58 mobrovac: bootstrap restbase1026-b - T219404
  • 12:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 50s)
  • 11:44 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:44 fsero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:28 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:28 fsero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:21 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:21 fsero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:17 mobrovac: bootstrap restbase1026-a - T219404
  • 11:16 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:15 fsero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:01 arturo: icinga downtime toolschecker for 3h for T223332
  • 10:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:511398) (duration: 00m 49s)
  • 10:42 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:511398) (duration: 00m 50s)
  • 10:27 moritzm: rebooting contint1001 for kernel update
  • 10:25 hashar: contint1001: docker image prune -f | Total reclaimed space: 7.115GB | T207707
  • 10:20 hashar: Stopped Zuul gracefully
  • 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:18 fsero: puppet re-enabled, certs renewed - T221346
  • 10:08 fsero: rolling over certs into mcrouter proxies codfw - T221346
  • 10:03 fsero: rolling over certs into mcrouter proxies eqiad - T221346
  • 09:42 marostegui: Remove db2036 from tendril and zarcillo - T223885
  • 09:39 marostegui: Stop MySQL on db2036 T223885
  • 09:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2036, going to be decommissioned T223885 (duration: 00m 49s)
  • 09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2036, going to be decommissioned T223885 (duration: 00m 49s)
  • 09:36 fsero: rolling over new certs to all mcrouter hosts except proxies - T221346
  • 09:26 fsero: continuing to roll over new certs - T221346
  • 09:01 fsero: disabling puppet on mcrouter hosts for regenerating certs - T221346
  • 08:49 moritzm: installing atftpd security updates
  • 08:43 mobrovac: bootstrap restbase1025-c - T219404
  • 08:38 moritzm: installing samba security updates
  • 08:36 moritzm: installing ghostscript security updates on jessie
  • 08:25 moritzm: installing cups-filters security updates on jessie (prerequisite for ghostscript security update)
  • 07:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 48s)
  • 07:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2046 (duration: 00m 50s)
  • 06:25 elukey: rebuild and upload memkeys 20181031-1 to stretch-wikimedia
  • 06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 49s)
  • 06:20 elukey: upgrade memkeys to version 20181031-1 on all the mc* hosts (was deployed only on a few of them) - T208376
  • 06:11 mobrovac: bootstrap restbase1025-b - T219404
  • 06:00 elukey: powercycle analytics1071 - soft lockups error messages in the dmesg
  • 05:51 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
  • 05:42 marostegui: Reload haproxy on dbproxy1010 and dbproxy1011 to repool labsdb1009 and restore original weights
  • 05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1126 into s8, db1134 into s1 T222682 (duration: 00m 49s)
  • 05:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1126 into s8, db1134 into s1 T222682 (duration: 00m 49s)
  • 05:12 marostegui: Stop MySQL on db2046
  • 05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2046 (duration: 00m 50s)
  • 05:07 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2038 (duration: 00m 49s)
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2038 (duration: 00m 55s)
  • 02:42 cdanis: cdanis@cp1075.eqiad.wmnet ~ % sudo -i varnish-backend-restart

2019-05-19

  • 20:16 ariel@deploy1001: Finished deploy [dumps/dumps@4febe0c]: for abstract dumps, skip any processing of pages not in main namespace (duration: 00m 03s)
  • 20:16 ariel@deploy1001: Started deploy [dumps/dumps@4febe0c]: for abstract dumps, skip any processing of pages not in main namespace
  • 17:51 mobrovac: bootstrap restbase1025-a - T219404
  • 13:26 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: T223734: Depool cloudelastic100[12] (duration: 00m 49s)
  • 12:37 reedy@deploy1001: Synchronized wmf-config/interwiki-labs.php: update (duration: 00m 57s)
  • 10:32 reedy@deploy1001: Synchronized wikiversions-labs.json: T223770 (duration: 00m 48s)
  • 10:31 reedy@deploy1001: Synchronized dblists/all-labs.dblist: T223770 (duration: 00m 51s)
  • 10:12 mobrovac: bootstrap restbase1024-c - T219404
  • 09:59 ebernhardson: set eqiad psi elasticsearch high disk watermark to 89% to allow unallocated shard to initialize
  • 09:56 ebernhardson: set eqiad psi elasticsearch low disk watermark to 79% to allow unallocated shard to initialize
  • 08:13 jijiki: varnish-backend-restart on cp1087
  • 06:56 mobrovac: bootstrap restbase1024-b - T219404
  • 05:09 marostegui: varnish-backend-restart on cp1081

2019-05-18

  • 23:53 bblack: rebooting lvs1015 for interface changes
  • 22:44 bblack: imaging lvs1013-lvs1015
  • 21:01 bblack: depooling eqiad public front edge in authdns
  • 19:18 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/Collection/templates/CollectionSuggestTemplate.php: T223742 / 89bd434 (duration: 00m 49s)
  • 19:16 mobrovac: bootstrap restbase1024-a - T219404
  • 18:50 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: T222146 / 9385b2dd66 (duration: 00m 50s)
  • 16:53 mobrovac: bootstrap restbase1023-c - T219404
  • 15:57 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/TimedMediaHandler/includes/handlers/WebMHandler/WebMHandler.php: T223445 / a9df59c59d7a30 (duration: 00m 51s)
  • 14:59 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: whitespace is srs (duration: 00m 49s)
  • 14:56 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Copy in default config (duration: 01m 04s)
  • 13:51 urandom: bootstrapping restbase1023-b - T219404
  • 05:41 mobrovac: bootstrap rb1023-a - T219404
  • 02:37 urandom: bootstrapping restbase1022-c - T219404

2019-05-17

  • 23:55 urandom: bootstrapping restbase1022-b - T219404
  • 23:11 foks: removing one file for legal compliance
  • 15:20 hashar@deploy1001: Synchronized php-1.34.0-wmf.5/includes/api/ApiUpload.php: Revert "Always validate uploads over api" - T223448 (T222994 T223446) (duration: 01m 00s)
  • 15:18 hashar: Deploying hotfix https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/510924/ . Should restore upload of large files on commons and other wikis #T223448 (poke T22994 T223446 )
  • 14:51 mobrovac: bootstrap restbase1022-a - T219404
  • 14:43 fsero: re-enabling puppet on mcrouter hosts for T221346; checks are in place, so if there is any alert for cert expiration on mcrouter, this is the source :)
  • 14:17 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098 & db1131 after maintenance (duration: 00m 49s)
  • 14:09 fsero: second round of setting up cert check, disabling puppet on mcrouter hosts T221346
  • 12:58 mobrovac: bootstrap restbase1021-c - T219404
  • 10:59 mobrovac: bootstrap restbase1021-b - T219404
  • 09:27 godog: swift remove ms-be101[345] from rings - T220590
  • 09:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1083 (duration: 00m 48s)
  • 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
  • 08:24 fsero: reenabling puppet after reverting T221346
  • 08:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1083 (duration: 00m 59s)
  • 07:57 fsero: disabling puppet on mcrouter hosts for T221346
  • 07:12 marostegui: Compress s7 on labsdb1012 T222978
  • 06:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2111 and db2113 into s5 T222772 (duration: 00m 49s)
  • 06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2111 and db2113 into s5 T222772 (duration: 00m 50s)
  • 05:19 marostegui: Stop MySQL on db1083 to clone db1134
  • 05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 (duration: 00m 50s)
  • 05:00 mobrovac: bootstrap restbase1021-a - T219404

2019-05-16

  • 21:02 Jeff_Green: authdns-update to switch payments.wikimedia.org back to eqiad cluster
  • 19:24 onimisionipe: pooling elastic2038 - shards are properly balanced across nodes
  • 18:31 onimisionipe: depooling elastic2038 to investigate more
  • 17:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:26 jbond42: reboot ores1007-1009
  • 17:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:15 jbond42: reboot ores1005-1006
  • 17:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:10 jbond42: reboot ores1003-1004
  • 17:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:05 jbond42: reboot ores1001-1002
  • 17:00 jbond42: reboot orespoolcounter[12]002
  • 16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:53 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:53 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:53 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:52 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:51 jbond42: reboot orespoolcounter[12]001
  • 16:44 jbond42: reboot ores2008-2009
  • 16:38 jbond42: will first reboot ores2006-2007
  • 16:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:36 jbond42: reboot ores2006-2009
  • 16:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:28 jbond42: reboot ores2003-2005
  • 16:22 XioNoX: add BGP session to Hetzner in AMS-IX
  • 16:19 akosiaris: switch all etcd* and kubestagetcd* servers from "drbd" ganeti disk template to "plain" ganeti disk template
  • 16:17 jbond42: reboot ores2001-2002
  • 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:59 akosiaris: build service-checker OCI container 0.0.2 with 0.1.5 service-checker version T220401
  • 15:49 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/CirrusSearch/includes/InterwikiSearcher.php: Hot-deploy CirrusSearch interwiki no result UBN T223449 (duration: 00m 49s)
  • 15:45 marostegui: Drop the following databases from tendril to recreate them with the right user: db1127, db1129, db1130, db1131, db1137, db1138
  • 15:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/includes/specials/pagers/ContribsPager.php: Hot-deploy Contribs getNamespaceInfo UBN fix T223440 (duration: 00m 53s)
  • 15:25 aborrero@puppetmaster1001: conftool action : set/pooled=yes; selector: name=labweb1001.wikimedia.org,service=labweb
  • 15:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:02 jbond42: rebooting aqs1009
  • 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:54 jbond42: rebooting aqs1008
  • 14:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:45 jbond42: rebooting aqs1007
  • 14:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:34 jbond42: rebooting aqs1006
  • 14:28 jbond42: rebooting aqs1005
  • 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:18 moritzm: powercycling mw2199, stuck during reboot
  • 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:07 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:07 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:57 marostegui: and recreate the following hosts in tendril: db2103,db2104,db2105,db2106,db2107,db2108,db2109,db2110,db2111,db2112,db2113,db2115,db2116,db2117,db2119 T222772
  • 13:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:39 cmjohnson1: replacing pdu in rack B5 eqiad
  • 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.5
  • 13:00 arturo: labweb1001 depooled
  • 12:59 mobrovac: bootstrap restbase1020-c - T219404
  • 12:21 godog: stop swift and rsync on ms-be10[16,17,18,32,33] for eqiad B5 pdu replacement - T223126
  • 12:03 jynus: stop and shutdown db1098,db1131,db1139 T223126
  • 11:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:54 moritzm: rebooting mw app servers in codfw for kernel update
  • 11:32 hoo@deploy1001: Synchronized wmf-config/extension-list: Add EntitySchema to extension-list (T221650) (duration: 00m 56s)
  • 11:22 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098 & db1131 for maintenance (duration: 00m 57s)
  • 11:00 arturo: T223148 downtime cloudvirt[1014,1028].eqiad.wmnet and labweb1001.wikimedia.org for 8 hours
  • 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:50 godog: bootstrap restbase1020-b - T219404
  • 10:27 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148 (duration: 01m 07s)
  • 10:26 jiji@deploy1001: Started deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148
  • 08:52 akosiaris: upgrade mathoid to statsd_exporter 0.9 T220709
  • 08:48 akosiaris@deploy1001: scap-helm mathoid finished
  • 08:48 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 08:48 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 08:48 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 08:47 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
  • 08:37 godog: bootstrap restbase1020-a - T219404
  • 08:32 elukey: depool/restart-nutcracker-pool mw1293/1313 - T214275
  • 08:22 elukey: depool/restart-nutcracker-pool mw1238 - T214275
  • 08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1104 (duration: 00m 56s)
  • 07:57 moritzm: installing linux 4.9.168-1+deb9u2~deb8u1 kernel on jessie hosts (no reboots, just installing the new package)
  • 07:45 moritzm: removed intel-microcode 3.20180807a from jessie-wikimedia (superseded by newer version in security.debian.org, which doesn't get picked up by apt due to the higher apt priority of jessie-wikimedia)
  • 07:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 into API (duration: 00m 56s)
  • 07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 (duration: 00m 57s)
  • 06:59 moritzm: installing intel-microcode updates
  • 05:34 elukey: roll restart of nutcracker on mw2* to pick up new config changes (no more memcached config) - T214275
  • 05:33 marostegui: Stop MySQL on db1104 to clone db1126
  • 05:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 00m 56s)
  • 05:18 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2106, db2110, db2119 into s4 - T222772 (duration: 00m 56s)
  • 05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2106, db2110, db2119 into s4 - T222772 (duration: 00m 58s)
  • 02:27 onimisionipe: pooling elastic2038 after unbanning - T217398

2019-05-15

  • 22:16 mutante: phab1003 - start ssh-phab service after adding service IPs
  • 22:01 eileen: civicrm update to the 5.13.4 release (commit versions were not recorded)
  • 21:47 mutante: phab1003 - ip -6 addr del 2620:0:861:ed1a::3:16/128 dev lo - remove extra service IP for phab's separate sshd, duplicated with phab1001 (T190568)
  • 21:24 jforrester@deploy1001: Synchronized wmf-config/MetaContactPages.php: Add movecomsignup contact page on meta T218363 (duration: 00m 56s)
  • 21:23 eileen: civicrm revision changed from 7d3ef1f2ae to c69c6e2e6a, config revision is a099f13a55
  • 21:00 fdans@deploy1001: Finished deploy [analytics/refinery@ffa4931]: deploying analytics refinery (duration: 15m 31s)
  • 20:45 tgr@deploy1001: Finished deploy [proton/deploy@9373c42]: Add gistcdn.githack.com to host blacklist (T213362) (duration: 02m 41s)
  • 20:45 fdans@deploy1001: Started deploy [analytics/refinery@ffa4931]: deploying analytics refinery
  • 20:42 tgr@deploy1001: Started deploy [proton/deploy@9373c42]: Add gistcdn.githack.com to host blacklist (T213362)
  • 20:20 robh: rebooting cloudvirt1015 into dell hardware tests per T220853
  • 20:18 arlolra@deploy1001: Finished deploy [parsoid/deploy@8f28977]: Updating Parsoid to 6658cad (duration: 06m 23s)
  • 20:12 arlolra@deploy1001: Started deploy [parsoid/deploy@8f28977]: Updating Parsoid to 6658cad
  • 19:42 hashar: group 1 promoted to 1.34.0-wmf.5 apparently without any issue # T220730
  • 19:03 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.5 (duration: 00m 58s)
  • 19:02 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.5
  • 18:38 andyrussg@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/CentralNotice/: Revert CentralNotice (duration: 01m 00s)
  • 17:32 thcipriani: deploy1001:sudo -u www-data /usr/local/bin/foreachwiki extensions/WikimediaMaintenance/refreshMessageBlobs.php
  • 17:19 onimisionipe: unban elastic2038 from shard allocation - T217398
  • 17:19 XenoRyet: updated civicrm from 4b6d569383 to 7d3ef1f2ae
  • 17:09 elukey: powerup elastic2038 (was down for maintenance)
  • 17:01 godog: bootstrap restbase1019-c - T219404
  • 16:58 bstorm_: T212972 updated all views on labsdb1012
  • 16:50 elukey: restart Hadoop HDFS namenodes on an-master100[1,2] to pick up new settings
  • 16:40 urandom: bootstrap restbase1019-c - T219404
  • 16:28 elukey: restart nutcracker on mw2240 to pick up the new config (no more memcached settings)
  • 16:26 bstorm_: T212972 updated all views on labsdb1009
  • 16:17 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223166 (duration: 00m 56s)
  • 16:16 reedy@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/WikimediaEvents/: T219128 (duration: 01m 13s)
  • 16:14 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/WikimediaEvents/: T219128 (duration: 01m 06s)
  • 16:03 jynus: disable puppet on all production databases
  • 15:21 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: T222980 (duration: 00m 57s)
  • 14:28 andrewbogott: repooling labweb1002
  • 14:16 andrewbogott: depooling labweb1002 to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509916/
  • 14:15 godog: bootstrap restbase1019-b - T219404
  • 13:21 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on testwikis and mediawikiwiki (T188327) (duration: 00m 57s)
  • 12:22 Lucas_WMDE: EU SWAT done
  • 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: SWAT: VisualEditorHooks: Use isVisualAvailable() when changing tabs/editsections (gerrit:510217) + DesktopArticleTarget.init: Allow veaction=edit to override namespace settings (T221892) (gerrit:510218) (duration: 01m 15s)
  • 12:20 akosiaris: depool esams, network issues
  • 11:47 akosiaris@deploy1001: scap-helm mathoid finished
  • 11:47 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
  • 11:46 akosiaris@deploy1001: scap-helm mathoid upgrade --wait -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
  • 11:41 akosiaris@deploy1001: scap-helm citoid finished
  • 11:41 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
  • 11:41 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
  • 11:32 akosiaris@deploy1001: scap-helm citoid finished
  • 11:32 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
  • 11:31 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
  • 11:31 godog: bootstrap restbase1019-a - T219404
  • 11:29 akosiaris: upgrade to statsd_exporter 0.9 for citoid T220709
  • 11:27 akosiaris@deploy1001: scap-helm citoid finished
  • 11:27 akosiaris@deploy1001: scap-helm citoid cluster staging completed
  • 11:27 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
  • 10:31 elukey: superset.wikimedia.org moved to analytics-tool1004 (Buster + python 3.7 + Superset 0.32 upgrade)
  • 10:27 moritzm: installing linux 4.9.168-1+deb9u2 kernel on stretch hosts (no reboots, just installing the new package)
  • 10:04 elukey@deploy1001: Finished deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency (duration: 00m 26s)
  • 10:04 elukey@deploy1001: Started deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency
  • 09:33 hashar: Disable CI castor cache system since the instance is being migrated. Some / most CI jobs might have failed for the last 20 minutes or so T223148
  • 08:45 elukey@deploy1001: Finished deploy [analytics/superset/deploy@31c2c30]: Superset 0.32 (duration: 00m 26s)
  • 08:44 elukey@deploy1001: Started deploy [analytics/superset/deploy@31c2c30]: Superset 0.32
  • 08:36 elukey: stop superset on analytics-tool1003 as prep step for the migration to the new host - T212243
  • 08:31 moritzm: rebooting mw2164
  • 07:33 elukey: restart nutcracker on mw2245 to pick up config changes (removal of memcached config)
  • 07:29 elukey: powercycle an-worker1094 (OEM event occurred, checking if temporary)
  • 07:21 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove the php7 beta feature T219128 (duration: 00m 59s)
  • 06:24 elukey: force remount of /mnt/hdfs on stat1007 - fuse hdfs stuck
  • 01:40 eileen: process control updated - omnigroupmember.load re-enabled
  • 01:39 eileen: civicrm revision changed from 5024c968ed to 4b6d569383, config revision is a099f13a55

2019-05-14

  • 20:44 herron@deploy1001: Finished deploy [logstash/plugins@7fb8843]: adding logstash-filter-truncate plugin (duration: 00m 07s)
  • 20:43 herron@deploy1001: Started deploy [logstash/plugins@7fb8843]: adding logstash-filter-truncate plugin
  • 20:41 herron@deploy1001: Finished deploy [logstash/plugins@7fb8843]: (no justification provided) (duration: 00m 01s)
  • 20:41 herron@deploy1001: Started deploy [logstash/plugins@7fb8843]: (no justification provided)
  • 20:13 chaomodus: restarting gerrit on cobalt to pick up metrics export changes
  • 19:37 herron: adding logstash filter truncate plugin to prod logstash collectors
  • 19:28 gehel: shutting down elastic2038 for memory replacement - T217398
  • 19:25 gehel: ban elastic2038 from elasticsearch cluster for memory replacement - T217398
  • 18:21 mutante: mwmaint1002 - deleting /root/home-mwmaint2001 to save space - confirmed we have bacula backups of home on mwmaint2001
  • 17:55 mutante: elastic2029 - enable puppet agent - was disabled without reason and nobody seems to have logged in recently
  • 17:54 mutante: elastic2038 - restart nagios-nrpe-server - attempt to fix "CHECK_NRPE STATE UNKNOWN" for a single check
  • 17:32 mutante: contint1001 - mkdir /srv/zuul-logs ; mv /var/log/zuul/debug.log* /srv/zuul-logs/ to prevent CI running out of disk again (T207707)
  • 17:22 mbsantos@deploy1001: Finished deploy [proton/deploy@881b22b]: Update chromium-render to 8cc96e7 make timeout handler more robust (T217724) (duration: 02m 23s)
  • 17:20 mbsantos@deploy1001: Started deploy [proton/deploy@881b22b]: Update chromium-render to 8cc96e7 make timeout handler more robust (T217724)
  • 16:30 jynus: stop replication and start table recompression on labsdb1009 T222978
  • 16:22 godog: statsd_exporter 0.9 upgrade on thumbor - T220709
  • 16:04 gilles@deploy1001: Finished deploy [performance/coal@5a32eb2]: T221401 (duration: 00m 06s)
  • 16:04 gilles@deploy1001: Started deploy [performance/coal@5a32eb2]: T221401
  • 15:56 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/includes/ApiVisualEditor.php: Hot-deploy VE unset variable fix T223281 (duration: 00m 55s)
  • 15:51 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/VisualEditor/includes/ApiVisualEditor.php: Hot-deploy VE unset variable fix T223281 (duration: 00m 57s)
  • 15:49 crusnov@deploy1001: Finished deploy [netbox/deploy@81059c6]: Deploy new reqs for reports (duration: 00m 55s)
  • 15:49 crusnov@deploy1001: Started deploy [netbox/deploy@81059c6]: Deploy new reqs for reports
  • 15:43 jynus: reload haproxy config @ dbproxy1010, dbproxy1011
  • 15:38 XioNoX: re-activate bgp to telia on cr1-codfw - T222967
  • 15:33 XioNoX: deactivate bgp to telia on cr1-codfw - T222967
  • 15:19 papaul: shutting down elastic2038 for memory replacement
  • 15:14 hashar: mw1263: scap pull
  • 14:53 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.5
  • 14:50 moritzm: rebooting mw1263 for kernel update
  • 14:47 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache (duration: 62m 47s)
  • 14:07 _joe_: apt-get clean on mwmaint1002
  • 13:44 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache
  • 13:44 godog: rearm keyholder on deploy and cumin hosts
  • 13:27 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache (duration: 14m 39s)
  • 13:12 hashar: train delay, I forgot to sync 1.34.0-wmf.5
  • 13:12 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache
  • 12:37 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: Hot-deploy T223023 fix I1b35b28e42 for mobile VE edit section switches (duration: 00m 54s)
  • 12:10 moritzm: rebooting mw2164 for kernel update
  • 11:33 hashar@deploy1001: Pruned MediaWiki: 1.33.0-wmf.24 (duration: 03m 20s)
  • 11:30 hashar: Deleting 1.33.0-wmf.24 from deploy1001 # T220730
  • 11:28 kart_: EU-Mid day SWAT Done.
  • 11:25 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Decrease idwiki MT threshold for publishing (gerrit:508818) (T222782) (duration: 00m 51s)
  • 11:23 hashar@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.23 (duration: 14m 31s)
  • 11:23 jbond42: cumin1001 ~ % sudo cumin A:all '/usr/local/sbin/run-puppet-agent --failed-only'
  • 11:18 jbond42: enable puppet, issue fixed: https://gerrit.wikimedia.org/r/c/operations/puppet/+/510131
  • 11:12 ema: pool cp3036 reimaged to ATS T222937
  • 11:09 hashar: Deleting 1.33.0-wmf.23 from deploy1001 # T220730
  • 11:09 jbond42: disable puppet
  • 10:58 hashar: scap prep 1.34.0-wmf.5 # T220730
  • 10:16 hashar: Cutting branches for 1.34.0-wmf.5
  • 10:01 ema: depool cp3036 and reimage as upload_ats T222937
  • 09:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2034 from config T219493 (duration: 00m 49s)
  • 09:53 marostegui@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 09:52 marostegui: Remove db2034 from tendril and zarcillo - T219493
  • 09:51 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2034 from config T219493 (duration: 00m 50s)
  • 09:34 jynus: restart apache on ununpentium
  • 09:29 marostegui: Parsercache deployment window FINISHED
  • 09:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Deploy second parsercache key change everywhere after deploying it in batches first T210725 (duration: 00m 50s)
  • 09:15 godog: statsd_exporter 0.9 upgrade on ores - T220709
  • 09:02 godog: statsd_exporter 0.9 upgrade on logstash - T220709
  • 08:53 jynus: failing over connections from dbproxy1006 to dbproxy1001
  • 07:48 moritzm: installing bind security updates for stretch (only client-side tools/libraries in use)
  • 06:45 ema: cp-ats: upgrade trafficserver to 8.0.3-1wm2
  • 06:20 ema: cp4021: upgrade trafficserver to 8.0.3-1wm2
  • 06:15 ema: upload trafficserver 8.0.3-1wm2 to stretch-wikimedia
  • 06:02 marostegui: Deploy parsercache change to eqiad canaries - T210725
  • 06:01 marostegui: Lock wmf-config deployment on deploy1001 to slowly change parsercache key on eqiad - T210725
  • 06:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Change parsercache on codfw T210725 (duration: 00m 54s)
  • 01:55 mutante: re-scheduled nginx / HTTP availability icinga checks
  • 01:42 mutante: cumin -b 6 'R:git::clone' 'run-puppet-agent -q --failed-only'
  • 01:37 mutante: restarting Gerrit to apply 2 config changes - disable DNS reverse lookup (gerrit:508127) & list projects from index (gerrit:508892) - removes blockers for 2.16 upgrade (T200739)
  • 00:32 mutante: restarting wikibugs because it left some channels

2019-05-13

  • 20:29 ejegg: updated payments-wiki from 6e0172bac3 to 8397ccf9cc
  • 20:24 halfak@deploy1001: Finished deploy [ores/deploy@c17a1a2]: T202202 (duration: 04m 16s)
  • 20:20 halfak@deploy1001: Started deploy [ores/deploy@c17a1a2]: T202202
  • 20:19 ariel@deploy1001: Finished deploy [dumps/dumps@941d374]: lbzip2 decompression for 7z file production for big wikis (duration: 00m 03s)
  • 20:19 ariel@deploy1001: Started deploy [dumps/dumps@941d374]: lbzip2 decompression for 7z file production for big wikis
  • 20:04 halfak@deploy1001: Started deploy [ores/deploy@c17a1a2]: T202202
  • 18:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync: re-enabling all eventgate-analytics monolog events - T222962 (duration: 00m 49s)
  • 18:28 ejegg: updated SmashPig standalone deploy 22b6982 Try turning off WSDL caching for Adyen
  • 18:25 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: T222954 (duration: 00m 49s)
  • 18:19 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-enabling all eventgate-analytics monolog events - T222962 (duration: 00m 50s)
  • 18:17 ottomata: re-enabling all eventgate-analytics monolog events - T222962
  • 18:12 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223006 T222740 T222044 (duration: 00m 49s)
  • 18:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 18:07 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: eqiad]
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:05 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 18:04 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 18:03 fsero: deleting eventgate-analytics-production releases on codfw
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:01 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/staging-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: staging]
  • 17:57 fsero: deleting eventgate-analytics and eventgate-analytics-staging releases on staging
  • 17:41 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: retry - disabling all eventgate-analytics monolog events for eventgate chart migration - T222962 (duration: 00m 50s)
  • 17:11 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: disabling all eventgate-analytics monolog events for eventgate chart migration - T222962 (duration: 00m 50s)
  • 17:10 ottomata: disabling all eventgate-analytics monolog events for eventgate chart migration - T222962
  • 16:14 Amir1: removing tokipona language terms from items using maintenance script (T200432)
  • 16:00 andrewbogott: reimaging cloudvirt1024 (for the last time, I hope)
  • 14:33 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: no-op in prod - Configure eventgate services in beta (duration: 00m 49s)
  • 14:32 otto@deploy1001: Synchronized wmf-config/LabsServices.php: no-op in prod - Configure eventgate services in beta (duration: 00m 49s)
  • 14:05 moritzm: uploaded puppet 4.8.2-5+wmf1 to component/puppetdb4 for apt.wikimedia.org/stretch-wikimedia (T219803)
  • 14:00 elukey: roll restart of aqs on aqs1* to pick up new druid settings
  • 13:50 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -b8 'ms-fe2*' 'run-puppet-agent'
  • 13:46 moritzm: updating puppet on deployment-puppetmaster03 to 4.8.2-5+wmf1 (T219803)
  • 13:39 akosiaris: bump eventgate-analytics chart to 0.0.36. Renames nodejs GC stats to microseconds and bumps the biggest bucket to 100ms. T220709
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 13:36 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on all wikis (T188327) (duration: 00m 50s)
  • 13:30 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -b8 'ms-be2*' 'run-puppet-agent'
  • 13:29 cdanis: swift codfw-prod: deploy I1035824d
  • 13:25 moritzm: uploaded puppetdb 4.4.0-1~wmf2 to component/puppetdb4 for apt.wikimedia.org/stretch-wikimedia (T219803)
  • 13:07 akosiaris: bump cxserver chart to 0.0.7. Renames nodejs GC stats to microseconds and bumps the biggest bucket to 100ms. T220709
  • 13:06 akosiaris@deploy1001: scap-helm cxserver finished
  • 13:06 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 13:06 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 13:06 akosiaris@deploy1001: scap-helm cxserver finished
  • 13:06 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
  • 13:06 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 13:06 akosiaris@deploy1001: scap-helm cxserver finished
  • 13:06 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 13:05 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 13:04 arturo: install libjs-jquery from stretch in cloudnet servers T222862
  • 13:03 arturo: enable puppet in cloudvirt1024 to refresh some apt config T222862
  • 12:50 moritzm: updating puppetdb on deployment-puppetdb02 to 4.4.0-1~wmf2 (T219803)
  • 12:36 cdanis: root@ms-be2013.codfw.wmnet ~ # umount /srv/swift-storage/sda1 && mount /srv/swift-storage/sda1 && umount /srv/swift-storage/sdb1 && mount /srv/swift-storage/sdb1
  • 12:36 krinkle@deploy1001: Synchronized php-1.34.0-wmf.4/resources/src/startup/startup.js: I76a2c8d52fa (duration: 00m 51s)
  • 12:33 cdanis: root@ms-be2013.codfw.wmnet ~ # mount /srv/swift-storage/sdf1
  • 12:25 cdanis: cdanis@ms-be2015.codfw.wmnet ~ % sudo umount /srv/swift-storage/sdl1 && sudo mount /srv/swift-storage/sdl1
  • 12:25 cdanis: cdanis@ms-be2015.codfw.wmnet ~ % sudo umount /srv/swift-storage/sdf1 && sudo mount /srv/swift-storage/sdf1
  • 12:18 cdanis: cdanis@ms-be2015.codfw.wmnet /var/log % sudo mount /srv/swift-storage/sda1
  • 12:08 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/Wikibase/lib/includes/Formatters/CachingKartographerEmbeddingHandler.php: T223085 (duration: 00m 50s)
  • 11:59 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/composer.json: T215746 (duration: 00m 49s)
  • 11:58 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/vendor/: T215746 (duration: 01m 30s)
  • 11:43 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: T222639 (duration: 00m 52s)
  • 11:04 ema: cp-ats rolling restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509456/
  • 10:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/includes/http/HttpRequestFactory.php: T222935 Hot-deploy fix for HttpRequestFactory (duration: 00m 50s)
  • 10:38 jbond42: update puppet5 and facter3 in eqiad
  • 10:17 vgutierrez: rebooting cloudvirt1024 - T209707
  • 09:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 T217396 (duration: 00m 49s)
  • 09:33 hashar: Upgrading Zuul 2.5.1-wmf7 -> 2.5.1-wmf9 T105474
  • 07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully pool db1130 (s5) and db1138 (s4) T222682 (duration: 00m 50s)
  • 07:08 elukey: slow roll restart of celery on ores* nodes to allow cores to be generated upon segfault - T222866
  • 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) T222682 (duration: 00m 50s)
  • 06:53 moritzm: installing ghostscript security updates
  • 06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) T222682 (duration: 00m 49s)
  • 06:09 marostegui: Compress s2, s6 and s7 on labsdb1012 - T222978
  • 05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) T222682 (duration: 00m 49s)
  • 05:41 marostegui: Optimize tables on pc2007
  • 05:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1130 into s5 and db1138 into s4 T222682 (duration: 00m 49s)
  • 05:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1130 into s5 and db1138 into s4 T222682 (duration: 00m 51s)

2019-05-12

  • 15:32 elukey: roll back python-kafka on eventlog1002 to 1.4.1-1~stretch1 - T222941
  • 12:14 elukey: restart eventlogging on eventlog1002 - all processors stuck due to kafka python (T222941)
  • 05:31 marostegui: Disable notifications for db1116:s8 Slave LAG check as this is a snapshot source

2019-05-11

  • 18:26 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 57s)
  • 06:37 elukey: restart eventlogging on eventlog1002 - huge kafka consumer lag accumulated (T222941)
  • 02:01 mutante: actinium - low disk space - apt-get clean - gzip /var/log/squid3/access.log.1

2019-05-10

  • 18:58 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 18:51 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 18:49 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin '*' 'enable-puppet "Puppet breakages on all hosts -- cdanis"'
  • 18:39 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin '*' 'disable-puppet "Puppet breakages on all hosts -- cdanis"'
  • 16:50 reedy@deploy1001: Synchronized dblists/: Update size related dblists (duration: 00m 49s)
  • 16:31 ebernhardson: drop archive indices from cloudelastic
  • 16:11 ariel@deploy1001: Finished deploy [dumps/dumps@70e8498]: look for dumpstatus json file per wiki run (duration: 00m 05s)
  • 16:11 ariel@deploy1001: Started deploy [dumps/dumps@70e8498]: look for dumpstatus json file per wiki run
  • 16:05 ejegg: moved adyen smashpig job runner to frdev1001
  • 15:25 _joe_: wiped opcache clean on all api, appservers
  • 15:05 cdanis: cdanis@mw1239.eqiad.wmnet ~ % sudo php7adm /opcache-free
  • 15:05 Krinkle: fix opcache krinkle@mw1268:~$ scap pull
  • 15:04 cdanis: cdanis@mw1268.eqiad.wmnet ~ % sudo php7adm /opcache-free
  • 15:03 Krinkle: ran 'scap pull' on mw1239.eqiad.wmnet to fix opcache corruption
  • 14:56 jbond42: uploaded zuul_2.5.10-wmf9 to jessie-wikimedia
  • 14:54 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: T99740 / d9dbecad9c7b (duration: 00m 51s)
  • 14:33 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 14:32 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:32 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f lala.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics finished
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 13:30 ema: pool cp3038 w/ ATS backend T222937
  • 12:19 ema: depool cp3038 and reimage as upload_ats T222937
  • 11:52 jbond42: (un)load edac kernel modules on elastic1029 to test resetting counters
  • 11:04 jbond42: restart refinery-eventlogging-saltrotate on an-coord1001
  • 10:30 moritzm: installing symfony security updates
  • 09:17 jynus: disabling replication lag alerts for backup source hosts on s1, s4, s8 T206203
  • 07:14 moritzm: uploaded linux-meta 1.21 for jessie-wikimedia (pointing to the new -9 ABI introduced with the 4.9.168 kernel)
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1100 into API (duration: 00m 50s)
  • 06:55 ema: swift-fe: rolling restart to enable ensure_max_age T222937
  • 06:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 into API (duration: 00m 50s)
  • 06:27 ema: ms-fe1005: pool with ensure_max_age T222937
  • 06:26 ariel@deploy1001: Finished deploy [dumps/dumps@6f9a5a4]: remove sleep between incr dumps of wikis (duration: 00m 05s)
  • 06:26 ariel@deploy1001: Started deploy [dumps/dumps@6f9a5a4]: remove sleep between incr dumps of wikis
  • 06:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 (duration: 00m 50s)
  • 06:18 ema: ms-fe1005: depool and test ensure_max_age T222937
  • 06:09 _joe_: depooling mw1261 for tests
  • 05:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2105 db2109 into s3 T222772 (duration: 00m 49s)
  • 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2105 db2109 into s3 T222772 (duration: 00m 52s)
  • 05:40 elukey: execute kafka preferred-replica-election on kafka-jumbo1001 as an attempt to rebalance traffic (1002 seems to have been handling far more than the others for some days)
  • 05:32 elukey: restart eventlogging daemons on eventlog1002 - kafka consumer errors in the logs, some lag built over time
  • 05:08 marostegui: Stop MySQL on db1100
  • 05:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 (duration: 00m 50s)
  • 04:56 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2112 (duration: 00m 51s)
  • 00:15 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@e13facb]: Downgrade LDF server back for T222471 (duration: 00m 37s)
  • 00:14 smalyshev@deploy1001: Started deploy [wdqs/wdqs@e13facb]: Downgrade LDF server back for T222471

2019-05-09

  • 23:52 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625: Dont write to private wikis on cloudelastic (duration: 00m 50s)
  • 23:48 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/: T220819 Uniquely identify connections in connection pool (duration: 00m 58s)
  • 23:43 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/: T220625 Limit the clusters archive index is written to (duration: 00m 59s)
  • 23:41 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/Wikibase/view/resources/jquery/wikibase/jquery.wikibase.entityselector.js: T172937 T222346 Revert Close entityselector after selecting exact match (duration: 00m 51s)
  • 23:24 chaomodus: spicerack upgraded to 0.0.25 on cumin1001 and cumin2001
  • 22:58 volans: uploaded spicerack_0.0.25-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 22:57 bawolff: Manually cleared extdistributor cache T188692
  • 22:50 mutante: labweb1001/labweb1002 - remove "runJob" cron job from www-data's crontab, it is already also a systemd timer and puppet was meant to remove it (T222917)
  • 21:27 foks: change user email for Melamrawy (WMF)@global
  • 21:23 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikipediaAppCaptionEditCounter (T222211) (duration: 00m 52s)
  • 19:56 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.4
  • 19:28 XioNoX: renumber mr1-esams<->cr2-knams link to 91.198.174.224/31 - T211254
  • 19:24 XioNoX: renumber mr1-esams<->cr1-esams link to 91.198.174.240/31 - T211254
  • 18:22 XioNoX: simplify filter analytics-in4 term mysql-dbstore on cr1/2-eqiad
  • 16:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Restore original weight on db1084 (duration: 00m 59s)
  • 16:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1081 (duration: 01m 13s)
  • 15:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1081 (duration: 01m 01s)
  • 15:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 01m 00s)
  • 15:37 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2112 (duration: 00m 59s)
  • 15:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 00m 56s)
  • 15:20 marostegui: Stop mysql on db2112 for onsite work
  • 15:16 otto@deploy1001: scap-helm eventgate-main finished
  • 15:16 otto@deploy1001: scap-helm eventgate-main cluster eqiad completed
  • 15:16 otto@deploy1001: scap-helm eventgate-main install -n main -f main/eqiad-values.yaml stable/eventgate [namespace: eventgate-main, clusters: eqiad]
  • 15:13 otto@deploy1001: scap-helm eventgate-main finished
  • 15:13 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
  • 15:13 otto@deploy1001: scap-helm eventgate-main install -n main -f main/codfw-values.yaml stable/eventgate [namespace: eventgate-main, clusters: codfw]
  • 15:12 papaul: shutting down db2114 for main board replacement
  • 14:53 otto@deploy1001: scap-helm eventgate-main finished
  • 14:52 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 14:52 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 14:48 moritzm: removing unused uwsgi packages from scb* hosts
  • 14:13 otto@deploy1001: scap-helm eventgate-main finished
  • 14:13 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 14:13 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 13:34 bblack: recdns: wiping dyna.wikimedia.org from pdns-recursors
  • 13:13 fsero: running authdns-update for new docker-registry T221101
  • 12:49 fsero: switching traffic from old-registry to new registries registry[12]00[12] - T221101
  • 12:01 _joe_: reenabling puppet across the fleet
  • 11:57 jbond42: all puppetmasters and puppetdbs should be restored
  • 11:55 jbond42: clean up old source files: sudo cumin A:puppetmaster 'rm /etc/apt/sources.list.d/component-facter3.list /etc/apt/sources.list.d/component-puppet5.list'
  • 11:49 volans: updated netbox statuses for decommissioning and spare hosts according to T222352
  • 11:23 jbond42: running sudo apt-get install puppet-master=4.8.2-5~bpo8+1 puppet-master-passenger=4.8.2-5~bpo8+1 on labtestpuppetmaster2001
  • 11:19 jbond42: running sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5 puppet-master puppet-master-passenger on labpuppetmaster1001
  • 11:18 jbond42: starting puppetdb on puppetdb2001
  • 11:15 jbond42: ran sudo apt-get install puppetdb on puppetdb2001
  • 11:14 jbond42: ran the following on puppetdb2001: sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5
  • 11:14 jbond42: ran the following on puppetmaster200{1,2}: sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5 puppet-master puppet-master-passenger
  • 11:04 _joe_: disabling puppet across the fleet
  • 11:02 volans: stopped ircecho to avoid spam
  • 10:43 marostegui: Stop MySQL on db1081
  • 10:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 57s)
  • 10:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give API traffic to db1129 (new host on s2) (duration: 00m 57s)
  • 10:15 _joe_: restarting low-traffic pybals in codfw, eqiad
  • 10:05 akosiaris: restart proton on proton1001. Host out of memory T214975
  • 09:57 ariel@deploy1001: Finished deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (retry) (duration: 00m 06s)
  • 09:57 ariel@deploy1001: Started deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (retry)
  • 09:54 ariel@deploy1001: Finished deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (duration: 00m 06s)
  • 09:54 ariel@deploy1001: Started deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more
  • 09:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1129 (new host on s2) (duration: 00m 57s)
  • 09:29 fsero@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=docker-registry,name=codfw
  • 09:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
  • 09:12 godog: bounce rsyslog on lithium
  • 09:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
  • 08:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
  • 08:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 57s)
  • 08:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 (duration: 00m 55s)
  • 08:23 elukey: upload uwsgi 2.0.14+20161117-3+deb9u2+wmf1 packages to stretch-wikimedia - T212697
  • 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1129 with low weight on s2 - T222682 (duration: 00m 56s)
  • 08:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 (duration: 00m 56s)
  • 08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db1129, db2104, db2107, db2108 T222772 T222682 (duration: 00m 57s)
  • 08:06 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db1129, db2104, db2107, db2108 T222772 T222682 (duration: 00m 59s)
  • 07:54 moritzm: installing jquery security updates for stretch
  • 07:50 elukey: roll restart HDFS masters on an-master100[1,2] to pick up new logging settings
  • 07:23 moritzm: installing twitter-bootstrap3 security updates
  • 06:53 _joe_: restarted nagios-nrpe-server on proton1001
  • 05:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify disk status for db2103, db2112, db2116 (duration: 00m 58s)
  • 05:29 marostegui: Stop replication on db2098:s2
  • 05:25 marostegui: Stop MySQL on db1076
  • 05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 (duration: 00m 57s)
  • 05:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2103, db2112 and db2116 into s1 T222772 (duration: 01m 41s)
  • 05:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2103, db2112 and db2116 into s1 T222772 (duration: 01m 22s)
  • 04:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 59s)
  • 00:57 twentyafterfour: stopped phd, now running `puppet agent --test` manually on phab1001
  • 00:08 twentyafterfour: phabricator upgrade successful
  • 00:04 twentyafterfour: starting phabricator deployment, momentary downtime expected (~1 minute)

2019-05-08

  • 23:06 krinkle@deploy1001: Synchronized php-1.34.0-wmf.3/includes/specials/SpecialWatchlist.php: T218511 / I423874 (duration: 00m 57s)
  • 23:00 krinkle@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/includes/Hooks.php: T219342 / 164a7c1 (duration: 00m 59s)
  • 22:20 ejegg: re-enabled fundraising jobs
  • 22:15 ejegg: updated SmashPig standalone install from 78b92b7fef to 88fd9650ec
  • 22:14 ejegg: disabled fundraising jobs for SmashPig update
  • 22:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wmgUseAdvancedSearch, no longer read; drop rcenhancedfilters from BF whitelist (duration: 00m 57s)
  • 22:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Unconditionally load AdvancedSearch everywhere, the config is always true (duration: 00m 57s)
  • 22:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Beta Feature config cleanup: doc change plus drop advancedsearch and templatewizard-betafeature (duration: 00m 57s)
  • 21:58 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/includes/ApiVisualEditor.php: UBN T209599 ApiVisualEditor: Fix use of getBlockInfo() (duration: 00m 57s)
  • 21:52 niharika29@deploy1001: Synchronized php-1.34.0-wmf.4/tests/phpunit/: Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246 (duration: 01m 09s)
  • 21:50 niharika29@deploy1001: Synchronized php-1.34.0-wmf.4/includes/Block.php: Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246 (duration: 00m 59s)
  • 21:49 niharika29@deploy1001: sync aborted: php-1.34.0-wmf.4/includes/Block.php Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246 (duration: 00m 03s)
  • 21:49 niharika29@deploy1001: Started scap: php-1.34.0-wmf.4/includes/Block.php Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246
  • 20:12 thcipriani: restarting gerrit due to threads stuck behind sendemail
  • 20:10 gehel: upgrade to nodejs 10 for maps completed - T210704
  • 20:08 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps1001 (T215852) (duration: 00m 20s)
  • 20:08 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps1001 (T215852)
  • 20:07 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps1001 (T215852) (duration: 00m 24s)
  • 20:07 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps1001 (T215852)
  • 19:58 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]004 (T215852) (duration: 00m 58s)
  • 19:57 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]004 (T215852)
  • 19:56 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]004 (T215852) (duration: 00m 59s)
  • 19:55 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]004 (T215852)
  • 19:47 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]003 (T215852) (duration: 00m 54s)
  • 19:46 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]003 (T215852)
  • 19:46 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]003 (T215852) (duration: 00m 56s)
  • 19:45 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]003 (T215852)
  • 19:35 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator/kartotherian node 10 build into maps[12]002 (T215852) (duration: 01m 12s)
  • 19:33 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator/kartotherian node 10 build into maps[12]002 (T215852)
  • 19:32 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy tilerator node 10 build into maps[12]002 (T215852) (duration: 00m 57s)
  • 19:31 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy tilerator node 10 build into maps[12]002 (T215852)
  • 19:26 gehel: continue upgrade to nodejs 10 for maps - T210704
  • 19:22 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.4 (duration: 01m 48s)
  • 19:21 cdanis: swift codfw-prod: deploy I59c88aed T221068
  • 19:20 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.4
  • 19:01 cdanis: T221904 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'ms-be2*[4,7].codfw.wmnet' 'for DISK in /sys/block/sd*/queue/scheduler ; do echo cfq > $DISK ; done'
  • 18:09 mutante: restarting gerrit to apply logging changes (gerrit:508391)
  • 17:58 bblack: public authdns: deploying the big DYNA/CNAME change in https://gerrit.wikimedia.org/r/c/operations/dns/+/507399
  • 17:44 jforrester@deploy1001: Synchronized wmf-config/extension-list: Re-sort extension-list (prod no-op) (duration: 00m 56s)
  • 17:42 jforrester@deploy1001: Synchronized wmf-config/env.php: Clean-up: Allow for running outside the cluster for local testing (no-op for prod) (duration: 00m 56s)
  • 17:22 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Retry: Enable WikimediaEditorTasks on Beta commonswiki (duration: 00m 57s)
  • 17:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable WikimediaEditorTasks on Beta commonswiki (duration: 00m 57s)
  • 16:55 otto@deploy1001: scap-helm eventgate-main finished
  • 16:55 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 16:55 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 16:08 gehel: restart tileratorui on maps2001 - T222801
  • 15:59 jynus: restart db2117 after first puppet run
  • 15:56 mforns@deploy1001: Finished deploy [analytics/refinery@698f213]: deploying analytics-refinery up to 698f213 with source=v0.0.89 (duration: 15m 38s)
  • 15:52 gehel: reset authentication on cassandra / maps / codfw - T222801
  • 15:40 mforns@deploy1001: Started deploy [analytics/refinery@698f213]: deploying analytics-refinery up to 698f213 with source=v0.0.89
  • 15:19 moritzm: installing ruby-i18n security updates
  • 15:14 moritzm: installing rails security updates
  • 15:04 XioNoX: fix typo on asw2-ulsfo<->cr2-ulsfo interface (Xlink2 instead of Xlink1)
  • 14:21 otto@deploy1001: scap-helm eventgate-main finished
  • 14:21 otto@deploy1001: scap-helm eventgate-main cluster staging completed
  • 14:21 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 14:18 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps2001 (T215852) (duration: 00m 27s)
  • 14:17 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps2001 (T215852)
  • 14:14 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps2001 (T215852) (duration: 00m 27s)
  • 14:14 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps2001 (T215852)
  • 14:05 fsero@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 14:03 gehel: starting upgrade to nodejs 10 for maps - T210704
  • 13:50 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 13:18 ema: cp3035: restart varnish-be
  • 12:07 kart_: EU-Midday SWAT done.
  • 12:06 kartik@deploy1001: Synchronized php-1.34.0-wmf.3: SWAT: Log warning and show error on empty username (T222529) (gerrit:508559) (duration: 07m 29s)
  • 11:56 akosiaris@deploy1001: scap-helm cxserver finished
  • 11:56 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
  • 11:56 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 11:56 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml staging stable/cxserver [namespace: cxserver, clusters: codfw]
  • 11:54 akosiaris: bump prometheus-statsd-exporter for cxserver to 0.0.5 T220709
  • 11:54 akosiaris@deploy1001: scap-helm cxserver finished
  • 11:54 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
  • 11:54 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 11:29 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add publish restrictions config for enwiki (T217237) (gerrit:495677) (duration: 00m 58s)
  • 11:06 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148 (duration: 01m 30s)
  • 11:05 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148
  • 10:17 _joe_: restarted pybal on lvs1016 to pick up changes for T222705
  • 10:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1131 in s6 T222682 (duration: 00m 57s)
  • 09:51 _joe_: restarted proton on proton1001
  • 09:50 _joe_: restarted pybal on lvs1006 to pick up changes for T222705
  • 09:49 _joe_: restarted pybal on lvs2003 to pick up changes for T222705
  • 09:45 marostegui: Stop replication on db2097:3311
  • 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1131 in s6 T222682 (duration: 01m 07s)
  • 09:26 _joe_: restarting pybal on lvs2006 to pick up changes for T222705 (3/3)
  • 09:24 elukey: install uwsgi-core_2.0.14+20161117-3+deb9u2+wmf1 on netmon2001 to test a uwsgi bug fix - T212697
  • 09:12 _joe_: restarting pybal on lvs2006 to pick up changes for T222705 (2/3)
  • 08:57 _joe_: restarting pybal on lvs2006 to pick up changes for T222705
  • 08:56 godog: upload prometheus-statsd-exporter 0.9.0+ds1-1 to stretch-wikimedia T220709
  • 08:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1131 into s6 with low weight T222682 (duration: 00m 51s)
  • 08:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1131 into s6 with low weight T222682 (duration: 00m 53s)
  • 08:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1093 (duration: 00m 58s)
  • 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1093 (duration: 00m 58s)
  • 07:49 marostegui: Stop replication s1 on db2102
  • 07:45 elukey: install uwsgi-core_2.0.14+20161117-3+deb9u2+wmf1 on netmon1002 to test a uwsgi bug fix - T212697
  • 07:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some API traffic to db1093 (duration: 00m 57s)
  • 07:41 vgutierrez: upgrading pybal to version 1.15.6 in lvs1001 - T222705
  • 07:40 godog: bounce prometheus on bast3002 to finalize migration
  • 07:37 vgutierrez: upgrading pybal to version 1.15.6 in lvs1004 - T222705
  • 07:33 vgutierrez: upgrading pybal to version 1.15.6 in lvs1002 - T222705
  • 07:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2115 into x1 T222772 (duration: 00m 56s)
  • 07:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2115 into x1 T222772 (duration: 01m 09s)
  • 07:26 vgutierrez: upgrading pybal to version 1.15.6 in lvs1005 - T222705
  • 07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some weight to db1093 (duration: 00m 56s)
  • 07:21 vgutierrez: upgrading pybal to version 1.15.6 in lvs1016 - T222705
  • 07:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1127 and db1137 into x1 T222682 (duration: 00m 56s)
  • 07:14 vgutierrez: upgrading pybal to version 1.15.6 in lvs1006 - T222705
  • 07:13 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1127 and db1137 into x1 T222682 (duration: 01m 03s)
  • 07:04 vgutierrez: upgrading pybal to version 1.15.6 in lvs2001 - T222705
  • 07:02 vgutierrez: upgrading pybal to version 1.15.6 in lvs2004 - T222705
  • 06:58 vgutierrez: upgrading pybal to version 1.15.6 in lvs2002 - T222705
  • 06:51 vgutierrez: upgrading pybal to version 1.15.6 in lvs2005 - T222705
  • 06:42 vgutierrez: upgrading pybal to version 1.15.6 in lvs2003 - T222705
  • 06:36 vgutierrez: upgrading pybal to version 1.15.6 in lvs3001 - T222705
  • 06:32 vgutierrez: upgrading pybal to version 1.15.6 in lvs3003 - T222705
  • 06:29 elukey: restart uwsgi-netbox on netmon1002 after the daily segfault (upon restart)
  • 06:29 vgutierrez: upgrading pybal to version 1.15.6 in lvs3002 - T222705
  • 06:24 vgutierrez: upgrading pybal to version 1.15.6 in lvs3004 - T222705
  • 06:20 marostegui: Stop MySQL on db2096
  • 06:19 vgutierrez: upgrading pybal to version 1.15.6 in lvs4005 - T222705
  • 06:16 vgutierrez: upgrading pybal to version 1.15.6 in lvs4006 - T222705
  • 06:13 vgutierrez: upgrading pybal to version 1.15.6 in lvs4007 - T222705
  • 06:07 vgutierrez: upgrading pybal to version 1.15.6 in lvs5001 - T222705
  • 06:02 vgutierrez: upgrading pybal to version 1.15.6 in lvs5002 - T222705
  • 05:59 vgutierrez: upgrading pybal to version 1.15.6 in lvs5003 - T222705
  • 05:48 vgutierrez: upgrading pybal to version 1.15.6 in lvs2006 - T222705
  • 05:25 marostegui: Stop MySQL on db1093
  • 05:01 marostegui: Optimize tables on pc1007
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 (duration: 00m 59s)

2019-05-07

  • 23:31 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T220625 Configure wgCirrusSearchPrivateClusters (duration: 00m 58s)
  • 22:06 ppchelko@deploy1001: Finished deploy [restbase/deploy@8f5859f]: Do not cache html if stash was requested T215956 (duration: 18m 12s)
  • 21:48 ppchelko@deploy1001: Started deploy [restbase/deploy@8f5859f]: Do not cache html if stash was requested T215956
  • 21:47 ppchelko@deploy1001: deploy aborted: Do not cache html if stash was requested T215956 (duration: 00m 12s)
  • 21:47 ppchelko@deploy1001: Started deploy [restbase/deploy@d91ee4c]: Do not cache html if stash was requested T215956
  • 21:46 mutante: deploy1001 - re-enabled puppet; deployment can go ahead
  • 21:06 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -p80 -b10 'C:profile::mediawiki::php and *.codfw.wmnet' 'run-puppet-agent' 'systemctl reload php7.2-fpm.service'
  • 20:43 mutante: gerrit2001 - restarting apache.. failed
  • 20:38 ejegg: updated payments-wiki from 558427f731 to 6e0172bac3
  • 20:31 mutante: gerrit2001 - temp disabling puppet - testing apache rewrites for T218844 on non-prod host
  • 20:14 mutante: deploy1001 - temp disabled puppet - debugging issue with apache-fast-test script
  • 19:52 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.4
  • 19:42 thcipriani@deploy1001: Finished scap: testwiki to 1.34.0-wmf.4 and rebuild l10n cache (duration: 28m 55s)
  • 19:13 thcipriani@deploy1001: Started scap: testwiki to 1.34.0-wmf.4 and rebuild l10n cache
  • 19:04 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.22 (duration: 02m 15s)
  • 18:50 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.21 (duration: 08m 48s)
  • 18:38 mutante: LDAP - adding awight to 'wmde' group (T222538)
  • 18:08 mutante: restarting icinga via web UI button
  • 17:45 thcipriani: starting branchcut for train (1.34.0-wmf.4)
  • 17:31 arturo: rebooting cloudvirt1024 to test interfaces configuration
  • 16:59 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 16:39 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 16:38 arturo: rebooting cloudvirt1024 to test interfaces configuration
  • 16:05 fsero: created eventgate-main tokens - T218346
  • 16:05 fsero: created eventgate-main tokens
  • 15:47 fsero: creating eventgate-main namespace on k8s clusters
  • 15:38 vgutierrez: uploaded pybal 1.15.6 to apt.wikimedia.org (stretch && jessie)
  • 15:21 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/CirrusSearch/maintenance/forceSearchIndex.php: T222641: Cirrus maint script handle ancient logging rows (duration: 00m 52s)
  • 14:53 cdanis: pool mw1271
  • 14:53 cdanis: pool mw1256
  • 14:44 cdanis: cdanis@mw1256.eqiad.wmnet ~ % sudo php7adm /opcache-free
  • 14:43 cdanis: cdanis@mw1271.eqiad.wmnet ~ % sudo php7adm /opcache-free
  • 14:40 vgutierrez: uploaded pybal 1.15.5 to apt.wikimedia.org (stretch && jessie)
  • 14:26 _joe_: repooling mw1320
  • 14:25 _joe_: resetting opcache on mw1320
  • 14:13 vgutierrez: uploaded pybal 1.15.4 to apt.wikimedia.org (stretch)
  • 14:12 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1256.eqiad.wmnet
  • 14:12 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1271.eqiad.wmnet
  • 14:09 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1320.eqiad.wmnet
  • 14:09 cdanis: depool mw1320
  • 14:07 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 14:07 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/eqiad-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: eqiad]
  • 14:02 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:02 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 14:02 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 14:01 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 14:01 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 13:59 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 13:58 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 13:57 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
  • 13:50 vgutierrez: uploaded prometheus-trafficserver-exporter 0.2.3 to apt.wikimedia.org (stretch) - T221217
  • 13:45 marostegui: Stop MySQL and poweroff db1093 for BBU replacement - T222127
  • 13:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 for BBU replacement T222127 (duration: 00m 51s)
  • 13:37 otto@deploy1001: scap-helm eventgate-analytics finished
  • 13:37 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:37 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
  • 13:37 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
  • 13:17 cdanis: T221904 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -m async -b5 'ms-be1*' 'run-puppet-agent -q' 'systemctl restart swift-object-replicator' 'systemctl restart swift-object-auditor'
  • 13:08 ema: sudo ipmitool -I lanplus -H cp2009.mgmt.codfw.wmnet -U root mc reset cold T222459
  • 13:07 ema: sudo ipmitool -I lanplus -H "cp2009.mgmt.codfw.wmnet" -U root -E chassis power cycle T222459
  • 13:02 cdanis: T221904 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -m async -b5 'ms-be2*' 'run-puppet-agent -q' 'systemctl restart swift-object-replicator' 'systemctl restart swift-object-auditor'
  • 12:45 jynus: remove dbstore1001, dbstore2001, dbstore2002 from tendril and zarcillo T220002
  • 12:09 marostegui: Stop Replication on db1140:3320 to provision db1127 and db1137 T222682
  • 11:16 hashar: Downgraded Zuul back to 2.5.1-wmf7 # T105474 T140297
  • 11:08 hashar: Upgraded Zuul and it is broken. So downgrading back :-(
  • 10:51 hashar: Gracefully stopping Zuul for upgrade
  • 10:46 mlitn@deploy1001: Finished scap: SDC: Enable Depicts in UploadWizard on Commons (duration: 22m 45s)
  • 10:40 ema: libvmod-uuid 1.4-1 uploaded to stretch-wikimedia T221977
  • 10:23 mlitn@deploy1001: Started scap: SDC: Enable Depicts in UploadWizard on Commons
  • 10:16 hashar: contint1001: upgrading python-pbr from 0.8.2-1 to 1.10.0-1, no longer needed with recent Zuul # T218559
  • 10:16 hashar: contint1001, contint2002: rm /etc/apt/preferences.d/python_pbr.pref /etc/apt/preferences.d/python-pbr.pref # T218559
  • 10:08 jbond42: upload zuul_2.5.1-wmf8 package to jessie-wikimedia
  • 09:51 godog: test statsd-exporter 0.9 upgrade on deployment-imagescaler03 - T220709
  • 09:47 jbond42: restart pdfrender on scb1004 - T174916
  • 08:51 arturo: T222685 remove facter from jessie-wikimedia/openstack-mitaka-jessie
  • 08:39 ema: repool cp1083 T222620
  • 07:59 moritzm: updating base-files from recent stretch point release
  • 07:51 mobrovac@deploy1001: Finished deploy [restbase/deploy@d91ee4c]: Remove section functionality from the REST API - T216636 (duration: 24m 46s)
  • 07:27 godog: upgrade prometheus on bast3002 - T187987
  • 07:26 mobrovac@deploy1001: Started deploy [restbase/deploy@d91ee4c]: Remove section functionality from the REST API - T216636
  • 07:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@d91ee4c] (dev-cluster): Remove section functionality from the REST API (duration: 03m 02s)
  • 07:21 marostegui: Optimize tables on pc1010
  • 07:18 mobrovac@deploy1001: Started deploy [restbase/deploy@d91ee4c] (dev-cluster): Remove section functionality from the REST API
  • 06:59 moritzm: updating firmware-bnx2x (from stretch point release; this is a NOP: the source package firmware-nonfree was updated for various Wi-Fi chipsets we don't use, double-checked by comparing checksums for old and new bnx2x firmware)
  • 06:44 elukey: restart uwsgi-netbox on netmon1002 after segfault
  • 05:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2045 to codfw x1 master T219493 (duration: 00m 55s)
  • 05:12 marostegui: Change topology on x1 codfw to promote db2045 to master T219493
  • 02:12 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use Preprocessor_Hash unconditionally (duration: 00m 52s)
  • 00:53 mutante: install2002 - disabling puppet, live hacking DHCP config for db2103 to not serve installer via http to debug install issue for T221532 which seems like T190424#4548003
  • 00:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: Hot-deploy fix for visual diffs on mobile in non-section mode T222489 (duration: 00m 53s)
  • 00:32 ejegg: disabled fundraising scheduled jobs for CiviCRM maintenance

2019-05-06

  • 23:25 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/503546/ (duration: 00m 50s)
  • 22:46 crusnov@deploy1001: Finished deploy [netbox/deploy@0061190]: Deploy new version of ganeti-netbox sync. (duration: 03m 53s)
  • 22:43 RoanKattouw: Running refreshMessageBlobs.php on all wikis for T222539
  • 22:42 crusnov@deploy1001: Started deploy [netbox/deploy@0061190]: Deploy new version of ganeti-netbox sync.
  • 21:59 mutante: LDAP - remove 'sukhe' from 'nda' and add to 'wmf' instead (T221990)
  • 21:24 cdanis: experimenting with different disk scheduler on ms-be2014 -- cdanis@ms-be2014.codfw.wmnet ~ % for D in /sys/block/sd*/queue/scheduler ; echo cfq | sudo tee $D
  • 21:15 godog: swift codfw-prod: push up-to-date rings, mistakenly pushed earlier an older version
  • 19:48 gehel: rolling restart of cassandra on maps* for config change
  • 19:47 RoanKattouw: Running recomputeNotifCounts.php --notif-types=login-success on all Echo wikis for T220762
  • 19:31 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -b4 'ms-be1*' 'run-puppet-agent --enable "cdanis rollout I369f9b29"' 'systemctl restart swift-object-replicator'
  • 19:22 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -b4 'ms-be2*' 'run-puppet-agent --enable "cdanis rollout I369f9b29"' 'systemctl systemctl restart swift-object-replicator'
  • 19:01 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Begin homepage experiment on cswiki and kowiki (T221266) (duration: 00m 51s)
  • 18:47 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Remove link to pageviews tool when no data available (T222405) (duration: 00m 52s)
  • 18:32 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/skins/MinervaNeue/includes/menu/Definitions.php: Harden Definitions::insertCommunityPortal() method (T222407) (duration: 00m 53s)
  • 18:30 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'ms-be*' 'disable-puppet "cdanis rollout I369f9b29"'
  • 18:24 jynus: restart and upgrade db1116
  • 18:14 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Set $wgOresFrontendBaseUrl (T219396) (duration: 00m 51s)
  • 17:53 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 17:52 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
  • 17:19 elukey: restart netbox on netmon1002 as test
  • 17:11 jynus: restart dbprov* hosts, in sequence, for kernel upgrade
  • 16:42 jynus: restart db1114 mysql for upgrade testing
  • 16:38 andrewbogott: re-imaging cloudvirt1024
  • 16:34 jynus: restart db2102 mysql for upgrade testing
  • 16:11 hashar: CI queue drained. Should be working fine again now
  • 15:57 hashar: CI / Zuul is being slowed down and being investigated
  • 15:48 moritzm: updating firmware-bnx2x (from stretch point release; this is a NOP: the source package firmware-nonfree was updated for various Wi-Fi chipsets we don't use, double-checked by comparing checksums for old and new bnx2x firmware)
  • 15:37 moritzm: updating firmware-bnx2 (from stretch point release; this is a NOP: the source package firmware-nonfree was updated for various Wi-Fi chipsets we don't use, double-checked by comparing checksums for old and new bnx2 firmware)
  • 15:35 papaul: shutting down elastic2038 for DIMM swap
  • 15:30 moritzm: updating base-files from recent stretch point release
  • 15:14 ema: pool cp4026 w/ ATS backend T219967
  • 14:57 godog: capture strace / core for rsyslog on wezen / lithium and restart - T199406
  • 14:42 ema: powercycle cp1083
  • 14:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1083.eqiad.wmnet
  • 14:35 godog: swift eqiad-prod: finish decom ms-be101[45] - T220590
  • 14:25 moritzm: installing vips security updates
  • 14:19 ema: depool cp4026 and reimage as upload_ats T219967
  • 14:11 otto@deploy1001: scap-helm eventgate-analytics finished
  • 14:11 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 14:11 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
  • 14:09 hashar: CI workflow fixed by reverting a change deployed around 10:00 UTC # T222614
  • 14:03 ema: cp3038: restart varnish-be
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics finished
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 13:56 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/staging-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: staging]
  • 13:54 moritzm: installing zziplib security updates
  • 13:52 hashar: CI sometimes does not run for some reason ... https://phabricator.wikimedia.org/T222614 :(
  • 13:22 moritzm: installing audiofile security updates
  • 13:20 moritzm: installing unzip security updates
  • 12:43 moritzm: installing rsync security updates
  • 12:24 moritzm: installing golang security updates on jessie
  • 11:44 Amir1: EU SWAT is done
  • 11:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Suggestion Constraint Status on Wikidata (gerrit:508303) (duration: 00m 52s)
  • 11:32 arturo: reverting puppet change to the sudo module
  • 11:17 arturo: merging puppet change to the sudo module https://gerrit.wikimedia.org/r/c/operations/puppet/+/507376
  • 10:59 ema: manual puppet-merge $sha on failed puppetmasters https://phabricator.wikimedia.org/P8477
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:508302) (duration: 00m 51s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:508302) (duration: 00m 52s)
  • 10:05 arturo: upgrade udev in cloudservices2002-dev
  • 09:59 arturo: T222148 upgrade udev & libudev1 on cloudvirt[1001-1003,1005].eqiad.wmnet
  • 09:35 elukey: restart netbox on netmon1002 (trying to reproduce the segfault) - T212697
  • 09:03 godog: upgrade labmon1001 to prometheus 2 - T187987
  • 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some API traffic to db1093 (duration: 00m 52s)
  • 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some weight to db1093 (duration: 00m 58s)
  • 04:08 ariel@deploy1001: Finished deploy [dumps/dumps@b4b7733]: reduce sleep time more between wikis for incrs (duration: 00m 05s)
  • 04:08 ariel@deploy1001: Started deploy [dumps/dumps@b4b7733]: reduce sleep time more between wikis for incrs

2019-05-05

  • 14:42 elukey: restart pdfrender on scb1004
  • 03:10 chaomodus: FYI: scb* flapping on some endpoints seems to be just noise. There is high load from mobileapi, but things otherwise appear to be operating normally; several boxes are in the process of checking md, which may account for service lags
  • 02:40 andrewbogott: restarting mariadb on cloudservices1003

2019-05-04

  • 22:20 reedy@deploy1001: Synchronized docroot/mediawiki/xml/index.html: Add extra xml namespace links (duration: 01m 06s)
  • 10:38 ariel@deploy1001: Finished deploy [dumps/dumps@26b52ef]: misc small fixes, reduce sleep time for incr wikis (duration: 00m 09s)
  • 10:38 ariel@deploy1001: Started deploy [dumps/dumps@26b52ef]: misc small fixes, reduce sleep time for incr wikis

2019-05-03

  • 23:50 thcipriani: gerrit back
  • 23:49 thcipriani: gerrit restart due to threads piling up
  • 22:09 XioNoX: clear v4 BGP to AS17451 on cr1-eqsin/cr4-ulsfo
  • 17:16 arturo: T222148 aborrero@labstore1005:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
  • 17:15 arturo: T222148 aborrero@labstore1004:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
  • 17:11 arturo: T222148 aborrero@labpuppetmaster1002:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
  • 17:10 arturo: T222148 aborrero@labpuppetmaster1001:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
  • 17:09 arturo: T222148 aborrero@labtestpuppetmaster2001:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
  • 17:08 arturo: T222148 drop libudev1 from openstack-mitaka-jessie/jessie-wikimedia (related to T216497)
  • 17:07 arturo: T222148 drop udev from openstack-mitaka-jessie/jessie-wikimedia (related to T216497)
  • 15:02 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=parsoid,dc=codfw
  • 15:02 _joe_: repooling the wtp* servers depooled in codfw for load testing
  • 14:56 _joe_: repool mw1275
  • 13:49 jijiki: Restart npre on proton1001
  • 12:26 gehel: replaying 30 minutes of eqiad search traffic on codfw - T221121
  • 12:21 ema: cp3038: varnish-backend-restart
  • 11:10 _joe_: purging opcache on mw1275
  • 10:47 ema: pool cp4025 w/ ATS backend T219967
  • 10:43 jbond42: T220380 remove zuul_2.5.0-8-gcbc7f62-wmf4jessie1 from jessie-wikimedia/thirdparty
  • 10:42 jbond42: T220380 upload zuul_2.5.1-wmf7 to jessie-wikimedia
  • 10:25 jijiki: Depool mw1275
  • 10:02 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/WikibaseLexemeCirrusSearch/: Fix reference to classes that moved (T222347)|gerrit:507847 Fix reference to classes that moved (T222347) (duration: 00m 55s)
  • 09:49 ema: depool cp4025 and reimage as upload_ats T219967
  • 09:49 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp201[3-4].*
  • 09:21 gehel: ban elastic2038 from elastic clusters pending memory issue investigation - T217398
  • 08:47 ema: pool cp4024 w/ ATS backend T219967
  • 08:27 jynus: starting table recompression on new backup source hosts on eqiad and codfw (stop replication) T220572
  • 07:45 ema: depool cp4024 and reimage as upload_ats T219967
  • 07:16 ema: cp1089: varnish-backend-restart
  • 05:32 _joe_: restarting varnish backend on cp1077
  • 05:05 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp201[5-6].*
  • 04:57 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp20(1[7-9]|20).*
  • 04:55 _joe_: progressively depooling parsoid servers in codfw to assess load tolerance
  • 00:32 mutante: powercycling elastic2038
  • 00:10 XioNoX: remove static route to 208.80.155.128/25 on cr1/2-eqiad - T193496
  • 00:06 mutante: restarting gerrit to pick up config changes for 2 mail threads and lower timeout (gerrit:507852, gerrit:507853)

2019-05-02

  • 22:10 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/MobileFrontend/resources/dist/mobile.editor.overlay.js: Hot-deploy T222229 to fix VE switching on MobileFrontend (duration: 00m 52s)
  • 21:21 thcipriani: gerrit back
  • 21:20 ejegg: updated payments-wiki from aa8dad50e7 to 558427f731
  • 21:19 thcipriani: gerrit restart to pick up config changes: https://gerrit.wikimedia.org/r/504973/ and https://gerrit.wikimedia.org/r/507858/
  • 21:00 crusnov@deploy1001: Finished deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351 (duration: 01m 48s)
  • 20:58 crusnov@deploy1001: Started deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351
  • 20:58 crusnov@deploy1001: Finished deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351 (duration: 00m 33s)
  • 20:57 crusnov@deploy1001: Started deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351
  • 19:41 ejegg: updated CiviCRM from 01c4d15c9a to 5024c968ed
  • 19:40 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/resources/src/mediawiki.widgets/mw.widgets.SearchInputWidget.js: Hot-deploy T222329 fix part 2 (duration: 00m 50s)
  • 19:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/includes/widget/SearchInputWidget.php: Hot-deploy T222329 fix part 1 (duration: 00m 53s)
  • 19:31 James_F: Shuffled 1.34.0-wmf.3 security patch cee0e569f4 for T222324 into the tip of the upstream branch now that it's merged; no-op
  • 19:27 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.3
  • 19:03 mutante: phab2001 - apt-get autoremove (removes a single Python package that is no longer needed)
  • 19:00 mutante: phab1001 - upgrading PHP packages on prod phab server
  • 18:59 jynus: restart dbstore1001 for upgrade
  • 18:33 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Don't fatal on deleted pages in 'recent questions' (T222206) (duration: 01m 01s)
  • 18:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable cirrussearch-request logging to eventgate-analytics on all wikis (T214080) (duration: 00m 58s)
  • 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SpecialHomepage on cswiki and kowiki (T221266) (duration: 00m 58s)
  • 18:09 mutante: phab1001 - install package upgrades for bash and cron
  • 17:46 sbassett: Deployed patch for T222324 (1.34.0-wmf.3)
  • 17:45 arlolra@deploy1001: Finished deploy [parsoid/deploy@414387b]: Updating Parsoid to 9786781 (duration: 05m 45s)
  • 17:39 arlolra@deploy1001: Started deploy [parsoid/deploy@414387b]: Updating Parsoid to 9786781
  • 16:42 gehel: replaying 30 minutes of eqiad search traffic on codfw - T221121
  • 16:10 jynus: restarted dbproxy1005 haproxy, weird connection issue
  • 15:42 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Re-enable account creation on wikitech (duration: 00m 57s)
  • 15:40 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Invalidate user sessions upon blocking on wikitech (duration: 00m 59s)
  • 15:15 chasemp: add dsharpe to content admin on wikitech for user blocking
  • 12:42 jynus: stopping several instances at dbstore1001 to clone them to db1139/40 T220572
  • 12:06 ema: swift-proxy rolling restart T222071
  • 12:01 ema: restart swift-proxy on ms-fe1005 T222071
  • 10:37 ariel@deploy1001: Finished deploy [dumps/dumps@53c9f22]: speed up adds-changes dumps by generating index.html less often. temp sleep 120 (duration: 00m 15s)
  • 10:36 ariel@deploy1001: Started deploy [dumps/dumps@53c9f22]: speed up adds-changes dumps by generating index.html less often. temp sleep 120
  • 10:04 ema: pool cp4023 w/ ATS backend T219967
  • 09:41 jynus: testing backups on db2102 (increased network and disk usage) T220572
  • 09:07 jynus: reboot db2102 T220572
  • 09:02 ema: depool cp4023 and reimage as upload_ats T219967
  • 09:02 godog: rollout rsyslog upgrade 8.1901.0-1~bpo9+wmf1 to eqiad
  • 08:55 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 5% of anonymous users to PHP7.2 - T219150 (duration: 01m 03s)
  • 08:49 jijiki: Sending more traffic to PHP7.2 - T219150
  • 04:28 andrewbogott: upgraded mediawiki on wikitech-static to 1.32.1
  • 04:25 kart_: Updated cxserver to 2019-05-02-040910-production (T222305)
  • 04:23 andrewbogott: apt-get upgrade on wikitech-static
  • 04:18 kartik@deploy1001: scap-helm cxserver finished
  • 04:18 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
  • 04:18 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 04:16 kartik@deploy1001: scap-helm cxserver finished
  • 04:16 kartik@deploy1001: scap-helm cxserver cluster codfw completed
  • 04:16 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 04:15 kartik@deploy1001: scap-helm cxserver finished
  • 04:15 kartik@deploy1001: scap-helm cxserver cluster staging completed
  • 04:15 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 00:35 eileen: civicrm revision changed from 3414657d36 to 01c4d15c9a, config revision is 2119df9495

2019-05-01

  • 23:35 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Drop RENDER_NOW for impact module images (T222223) (duration: 01m 04s)
  • 23:19 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625 Start writing to cloudelastic for group0 (duration: 01m 05s)
  • 22:07 mutante: LDAP - adding jaufrecht to wmf (T222214)
  • 21:57 ebernhardson: start importing group2 to cloudelastic in parallel with group1
  • 21:18 ebernhardson: start importing group1 into cloudelastic from mwmaint1002
  • 20:15 halfak@deploy1001: Finished deploy [ores/deploy@52e9759]: T222121 (duration: 14m 03s)
  • 20:01 halfak@deploy1001: Started deploy [ores/deploy@52e9759]: T222121
  • 19:17 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.3 (duration: 01m 53s)
  • 19:15 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.3
  • 17:59 elukey: force remount of /mnt/hdfs on notebook1003 (fuse hdfs got stuck)
  • 17:43 joal@deploy1001: Finished deploy [analytics/refinery@682ab7c]: Regular analytics weekly train - Second try after space freed (duration: 03m 15s)
  • 17:40 joal@deploy1001: Started deploy [analytics/refinery@682ab7c]: Regular analytics weekly train - Second try after space freed
  • 17:27 joal@deploy1001: Finished deploy [analytics/refinery@682ab7c]: Regular analytics weekly train (duration: 25m 18s)
  • 17:02 joal@deploy1001: Started deploy [analytics/refinery@682ab7c]: Regular analytics weekly train
  • 16:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625 Start writing to cloudelastic from testwiki (duration: 01m 01s)
  • 16:52 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.QuestionPosterDialog.js: SWAT: Ensure text exists before logging enter-question-text action|gerrit:507598 Ensure text exists before logging enter-question-text action (duration: 01m 00s)
  • 16:48 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: SWAT: Re-use timestamp for section header and question storage|gerrit:507593 Re-use timestamp for section header and question storage (duration: 01m 01s)
  • 16:41 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: SWAT: Re-use timestamp for section header and question storage|gerrit:507593 Re-use timestamp for section header and question storage (duration: 01m 01s)
  • 16:23 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Mentorship.js: SWAT: Mentorship module: Add data-link-id to mentor's talkpage link|gerrit:507580 Mentorship module: Add data-link-id to mentor's talkpage link (duration: 01m 01s)
  • 16:17 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable cirrussearch-request logging to eventgate-analytics for group1 wikis|gerrit:507550 Enable cirrussearch-request logging to eventgate-analytics for group1 wikis (duration: 01m 00s)
  • 15:58 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Re-enable password reset on wikitech (duration: 00m 58s)
  • 14:54 reedy@deploy1001: Synchronized wmf-config/wikitech.php: propagate blocks to gerrit (duration: 00m 57s)
  • 14:52 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add new logging channel for wikitech (duration: 00m 58s)
  • 13:57 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209572 Disable Reporting API endpoint (duration: 00m 59s)
  • 13:31 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209572 Enable Feature Policy Reporting origin trial (duration: 01m 01s)
  • 13:28 jbond42: update puppet and facter on esams
  • 12:53 gehel: start recording 30 minutes of traffic from elasticsearch eqiad - T221121
  • 11:27 gilles: T216499 T216594 T216598 mwscript purgeList.php ruwiki --all --verbose
  • 11:22 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 T216598 T216594 Renew origin trial tokens for ruwiki (duration: 01m 14s)
  • 01:01 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@5d619e4]: Update spec x-amples (duration: 03m 58s)
  • 00:57 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@5d619e4]: Update spec x-amples
  • 00:30 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 04s)
  • 00:30 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481

2019-04-30

  • 23:56 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 05s)
  • 23:56 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
  • 23:49 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 04s)
  • 23:49 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481
  • 23:35 ariel@deploy1001: Finished deploy [dumps/dumps@d715ea0]: determine page ranges of content output files by cumul revision length as well as rev count (duration: 00m 03s)
  • 23:35 ariel@deploy1001: Started deploy [dumps/dumps@d715ea0]: determine page ranges of content output files by cumul revision length as well as rev count
  • 23:18 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 05s)
  • 23:18 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
  • 23:07 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 05s)
  • 23:07 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481
  • 22:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - T215956 (duration: 23m 56s)
  • 21:57 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - T215956
  • 21:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b140f] (dev-cluster): Parsoid: use the new stashing tables for old revisions too (duration: 03m 22s)
  • 21:52 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b140f] (dev-cluster): Parsoid: use the new stashing tables for old revisions too
  • 21:44 sbassett: Deployed patch for T222038 (1.34.0-wmf.1 and 1.34.0-wmf.3)
  • 21:44 sbassett: Deployed patch for T222036 (1.34.0-wmf.1 and 1.34.0-wmf.3)
  • 21:13 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.3
  • 21:10 mutante: netmon1002 - apt-get remove --purge php7.0* ; apt-get install php-common php-pear (pending upgrades) | netmon2001: apt autoremove
  • 21:06 mutante: netmon2001 - apt-get install php-common php-pear (pending upgrades)
  • 21:03 mutante: netmon2001 - apt-get remove --purge php7.0*
  • 21:03 mutante: librenms - switch from PHP 7.0 to PHP 7.2 successful now; reverted manual debugging changes on netmon1002
  • 20:29 thcipriani@deploy1001: Finished scap: testwiki to 1.34.0-wmf.3 and rebuild l10n cache (duration: 31m 17s)
  • 20:21 mutante: netmon1002 - loading PHP 7.2 module to debug issue for librenms. librenms very short downtime
  • 19:58 thcipriani@deploy1001: Started scap: testwiki to 1.34.0-wmf.3 and rebuild l10n cache
  • 19:56 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.20 (duration: 02m 07s)
  • 19:47 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 (duration: 02m 24s)
  • 19:44 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4360316]: Redeploy GUI for fixes T222133, T222129, T222181, T222182 (duration: 09m 17s)
  • 19:44 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 (duration: 02m 25s)
  • 19:43 mutante: switched netmon1002/netmon2001 from PHP 7.0 to 7.2 but reverted because LibreNMS still had an issue with it
  • 19:40 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 10m 11s)
  • 19:35 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4360316]: Redeploy GUI for fixes T222133, T222129, T222181, T222182
  • 19:27 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:27 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 19:27 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 19:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:26 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 19:26 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 19:25 otto@deploy1001: scap-helm eventgate-analytics finished
  • 19:25 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 19:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:24 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 19:24 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:40 cdanis: running puppet on ms-be201[3,5] to bump replication concurrency T221068
  • 18:24 cdanis: running puppet on ms-be2014 to bump replication concurrency T221068
  • 18:09 thcipriani: start branchcut for 1.34.0-wmf.3
  • 17:16 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@1f09e44]: Update mobileapps to 142ba30 (T217837) (duration: 04m 16s)
  • 17:11 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@1f09e44]: Update mobileapps to 142ba30 (T217837)
  • 16:57 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 09s)
  • 16:57 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
  • 16:52 arturo: merging change to `profile::base` and `::raid` https://gerrit.wikimedia.org/r/c/operations/puppet/+/507357 related to T221225
  • 16:36 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207706 (duration: 00m 11s)
  • 16:36 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207706
  • 16:27 XioNoX: upgrade librenms to 1.51
  • 16:26 jbond42: upgrade puppet and facter in eqsin
  • 16:04 ema: pool cp4022 w/ ATS backend T219967
  • 15:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:58 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:58 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:45 elukey: restart hadoop hdfs namenodes on an-master100[1,2] to pick up new logging settings - T220702
  • 15:18 jynus: stop s8 instance on dbstore2001 for cloning to db2100 T220572
  • 15:09 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 1% of anonymous users to PHP7.2 - T219150 (duration: 00m 54s)
  • 14:58 jbond42: enable-puppet "T220987: global kafaka log shipping - staged rollout (jbond)"
  • 14:56 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'bast3002*' 'run-puppet-agent --enable "filippo prometheus"'
  • 14:49 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'labmon1001*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:44 jijiki: Sending 1% of anonymous users to PHP7.2 - T219150
  • 14:43 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'bast5001*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:26 jbond42: disable-puppet "T220987: global kafaka log shipping - staged rollout (jbond)"
  • 14:24 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'prometheus2004*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:17 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'prometheus2003*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
  • 14:15 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo enable-puppet 'cdanis testing original query.max-samples T222105'
  • 13:29 cdanis: cdanis@prometheus1004.eqiad.wmnet ~ % sudo systemctl restart prometheus@ops.service
  • 13:28 ema: depool cp4022 and reimage as upload_ats T219967
  • 13:20 arturo: reverting sudo puppet module changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/507317
  • 13:16 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo systemctl restart prometheus@ops.service
  • 13:15 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo disable-puppet 'cdanis testing original query.max-samples T222105'
  • 13:08 cdanis: OOMed the eqiad ops prometheus @ prometheus1003
  • 13:02 cdanis: OOMed the eqiad ops prometheus @ prometheus1004
  • 12:47 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo run-puppet-agent --enable "staged rollout T222105 by cdanis"
  • 12:41 arturo: merging a sudo puppet module change
  • 12:39 cdanis: cdanis@prometheus1004.eqiad.wmnet ~ % sudo run-puppet-agent --enable "staged rollout T222105 by cdanis"
  • 12:34 elukey: moved /home to /srv/home (more space in a dedicated partition) on stat1005
  • 12:32 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'R:prometheus::server' 'disable-puppet "staged rollout T222105 by cdanis"'
  • 11:27 Lucas_WMDE: EU SWAT done
  • 11:22 mlitn@deploy1001: Synchronized wmf-config/CommonSettings.php: Allow cross-site requests from mobile domains (duration: 00m 52s)
  • 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Serialize empty lists as objects on Commons (T138104)|gerrit:507032 Serialize empty lists as objects on Commons (T138104) (duration: 00m 54s)
  • 11:12 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Serialize empty lists as objects on Wikidata (T138104)|gerrit:507031 Serialize empty lists as objects on Wikidata (T138104) (duration: 00m 55s)
  • 11:08 gilles@deploy1001: Finished deploy [performance/navtiming@d6756c0]: T221848 Proper fix for partitions_for_topic in python-kafka > 1.4.4 (duration: 00m 05s)
  • 11:08 gilles@deploy1001: Started deploy [performance/navtiming@d6756c0]: T221848 Proper fix for partitions_for_topic in python-kafka > 1.4.4
  • 11:02 ema: cp3038 mbox lag, restarting varnish-be
  • 10:55 kart_: Updated cxserver to 2019-04-30-055331-production (T219412)
  • 10:49 santhosh@deploy1001: scap-helm cxserver finished
  • 10:49 santhosh@deploy1001: scap-helm cxserver cluster codfw completed
  • 10:49 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
  • 10:48 santhosh@deploy1001: scap-helm cxserver finished
  • 10:48 santhosh@deploy1001: scap-helm cxserver cluster eqiad completed
  • 10:48 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
  • 10:45 santhosh@deploy1001: scap-helm cxserver finished
  • 10:45 santhosh@deploy1001: scap-helm cxserver cluster staging completed
  • 10:45 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
  • 10:32 godog: rollout rsyslog upgrade to 8.1901.0-1~bpo9+wmf1 in codfw
  • 10:32 arturo: T222060 reimaged labtestservices2003 as stretch spare system
  • 10:32 arturo: T222057 reimaged labtestvirt2003 as spare system
  • 10:12 godog: rollout rsyslog upgrade to 8.1901.0-1~bpo9+wmf1 in eqsin / ulsfo / esams
  • 10:08 jynus: stop s7 and x1 instances on dbstore2* for cloning T220572
  • 09:31 fsero@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=docker-registry,service=docker-registry
  • 09:26 fsero: creating lvs endpoints for docker registry - T221101
  • 09:02 elukey: roll restart hdfs namenodes on an-master100[1,2] to pick up new settings - T220702
  • 08:22 godog: bounce prometheus on bast4002 after backfill has finished - T187987
  • 08:11 gilles@deploy1001: Finished deploy [performance/navtiming@8f135ac]: T221848 Default to partition 0 when no partition is found (duration: 00m 05s)
  • 08:11 gilles@deploy1001: Started deploy [performance/navtiming@8f135ac]: T221848 Default to partition 0 when no partition is found
  • 08:11 gilles@deploy1001: deploy aborted: T221848 Defalt to partition 0 when no partition is found (duration: 00m 00s)
  • 08:11 gilles@deploy1001: Started deploy [performance/navtiming@8f135ac]: T221848 Defalt to partition 0 when no partition is found
  • 07:53 gilles@deploy1001: Finished deploy [performance/navtiming@e900152]: T221848 add more logging around startup (duration: 00m 05s)
  • 07:53 gilles@deploy1001: Started deploy [performance/navtiming@e900152]: T221848 add more logging around startup
  • 07:29 moritzm: installing systemd updates for jessie
  • 07:24 marostegui: Remove labservices1001 and labservices1002 from tendril T221857
  • 05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1093's status (duration: 00m 51s)
  • 05:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db1093's status (duration: 00m 55s)
  • 04:26 mutante: LDAP - remove user pirroh from group nda (T222085 and cross-validate-accounts demands consistency)
  • 02:23 mutante: analytics1050 - systemctl start mclog ... it had failed, as happened recently on analytics1052 (T212219?)
  • 02:09 tgr@deploy1001: Synchronized wmf-config/db-eqiad.php: SWAT: depool db1093|gerrit:507237 depool db1093 (duration: 00m 54s)
  • 01:30 mutante: contint2001, then contint1001 - deleting /etc/zuul/wikimedia and letting puppet re-clone it (gerrit:507070) (T218844)

2019-04-29

  • 23:59 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (5/5) (duration: 00m 52s)
  • 23:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (4/5) (duration: 00m 52s)
  • 23:56 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (3/5) (duration: 00m 50s)
  • 23:55 ebernhardson@deploy1001: Synchronized wmf-config/LabsServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (2/5) (duration: 00m 52s)
  • 23:54 ebernhardson@deploy1001: Synchronized tests/: T220625 Add cloudelastic servers to wgCirrusSearchClusters (1/5) (duration: 00m 53s)
  • 23:34 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix (duration: 31m 04s)
  • 23:33 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221154: Add static.inaturalist.org to $wgCopyUploadDomains for Commons (duration: 00m 54s)
  • 23:03 smalyshev@deploy1001: Started deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix
  • 21:13 mutante: restarting gerrit
  • 21:10 mutante: cobalt (gerrit) upgrading openjdk 8 minor version
  • 20:40 arlolra: Updated Parsoid to c9dab9d (T106578, T113194, T205338, T219072, T219938, T221384, T219943)
  • 20:37 XioNoX: add BGP session to AS4922 in eqiad
  • 20:37 RoanKattouw: Deployed patch for T222014
  • 20:26 arlolra@deploy1001: Finished deploy [parsoid/deploy@7859b58]: Updating Parsoid to c9dab9d (duration: 06m 36s)
  • 20:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw127[5-9].eqiad.wmnet
  • 20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@7859b58]: Updating Parsoid to c9dab9d
  • 20:18 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw127[5-9].eqiad.wmnet
  • 20:18 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw127[0-4].eqiad.wmnet
  • 20:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw127[0-4].eqiad.wmnet
  • 20:08 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw126[5-9].eqiad.wmnet
  • 19:59 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw126[5-9].eqiad.wmnet
  • 19:52 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw126[1-4].eqiad.wmnet
  • 19:44 thcipriani: gerrit back
  • 19:44 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw126[1-4].eqiad.wmnet
  • 19:44 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw125[4-8].eqiad.wmnet
  • 19:43 thcipriani: gerrit restart for https://gerrit.wikimedia.org/r/327763 T221026
  • 19:39 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw125[4-8].eqiad.wmnet
  • 19:39 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw125[0-3].eqiad.wmnet
  • 19:36 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw125[0-3].eqiad.wmnet
  • 19:35 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[5-9].eqiad.wmnet
  • 19:32 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw124[5-9].eqiad.wmnet
  • 19:31 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet
  • 19:26 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw124[0-4].eqiad.wmnet
  • 19:26 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet
  • 19:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw123[8-9].eqiad.wmnet
  • 19:21 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw123[8-9].eqiad.wmnet
  • 19:20 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw123[0-5].eqiad.wmnet
  • 19:17 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw123[0-5].eqiad.wmnet
  • 19:07 otto@deploy1001: sync-file aborted: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080 (duration: 00m 02s)
  • 19:05 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080 (duration: 00m 53s)
  • 19:01 ottomata: deploying config change to enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080
  • 18:59 RoanKattouw: Deployed patch for T221739
  • 18:45 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:45 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
  • 18:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
  • 18:44 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:44 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
  • 18:44 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
  • 18:42 catrope@deploy1001: Synchronized static/images/project-logos/: Change wikimaniawiki logo to Wikimania 2019 version (T221829) (duration: 00m 54s)
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:41 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw122[8-9].eqiad.wmnet
  • 18:37 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw122[8-9].eqiad.wmnet
  • 18:37 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Commons (T138104) (duration: 00m 54s)
  • 18:34 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw122[1-6].eqiad.wmnet
  • 18:33 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:33 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:33 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:30 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Wikidata (T138104) (duration: 00m 53s)
  • 18:29 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw122[1-6].eqiad.wmnet
  • 18:26 otto@deploy1001: scap-helm eventgate-analytics finished
  • 18:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
  • 18:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
  • 18:22 Jeff_Green: authdns-update for T221475
  • 18:21 catrope@deploy1001: Synchronized docroot/noc: Publish throttle-analyze at noc (T187894) (duration: 00m 53s)
  • 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www4.bibl.ulaval.ca to wgCopyUploadsDomains (T220704) (duration: 00m 53s)
  • 17:35 Jeff_Green: authdns-update to deploy T214525
  • 17:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates (duration: 06m 58s)
  • 17:08 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates
  • 16:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Drop wmgMediaInfoEnableUploadWizardDepicts from IS (duration: 00m 53s)
  • 16:34 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 53s)
  • 16:33 jforrester@deploy1001: sync-file aborted: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 01s)
  • 16:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Add wmgMediaInfoEnableUploadWizardDepicts to IS (duration: 00m 53s)
  • 16:28 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable feature flag for depicts in UW on Test Commons (duration: 00m 53s)
  • 15:40 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update WikimediaEditorTasks counter config (T221951) (duration: 00m 58s)
  • 14:49 herron: added uid=sukhe,ou=people,dc=wikimedia,dc=org to nda ldap group T221990
  • 13:56 jbond42: rolling security updates for imagemagick
  • 13:45 fsero: DNS: creating docker-registry.svc.(eqiad|codfw).wmnet RRs
  • 13:17 jbond42