Server Admin Log

From Wikitech

2019-12-08

  • 02:58 andrew@deploy1001: Finished deploy [horizon/deploy@ff0a0e7]: (no justification provided) (duration: 01m 53s)
  • 02:56 andrew@deploy1001: Started deploy [horizon/deploy@ff0a0e7]: (no justification provided)
  • 02:19 andrew@deploy1001: Finished deploy [horizon/deploy@ed2243c]: (no justification provided) (duration: 01m 50s)
  • 02:17 andrew@deploy1001: Started deploy [horizon/deploy@ed2243c]: (no justification provided)
  • 01:49 andrew@deploy1001: Finished deploy [horizon/deploy@accbbd1]: (no justification provided) (duration: 01m 55s)
  • 01:47 andrew@deploy1001: Started deploy [horizon/deploy@accbbd1]: (no justification provided)
  • 01:44 andrew@deploy1001: Finished deploy [horizon/deploy@accbbd1]: (no justification provided) (duration: 01m 47s)
  • 01:43 andrew@deploy1001: Started deploy [horizon/deploy@accbbd1]: (no justification provided)
  • 01:40 andrew@deploy1001: Finished deploy [horizon/deploy@accbbd1]: (no justification provided) (duration: 01m 49s)
  • 01:38 andrew@deploy1001: Started deploy [horizon/deploy@accbbd1]: (no justification provided)
  • 01:37 andrew@deploy1001: Finished deploy [horizon/deploy@accbbd1]: (no justification provided) (duration: 00m 07s)
  • 01:36 andrew@deploy1001: Started deploy [horizon/deploy@accbbd1]: (no justification provided)
  • 01:16 andrew@deploy1001: Finished deploy [horizon/deploy@accbbd1]: (no justification provided) (duration: 01m 53s)
  • 01:14 andrew@deploy1001: Started deploy [horizon/deploy@accbbd1]: (no justification provided)
  • 01:11 andrew@deploy1001: Finished deploy [horizon/deploy@841693b]: (no justification provided) (duration: 01m 48s)
  • 01:09 andrew@deploy1001: Started deploy [horizon/deploy@841693b]: (no justification provided)

2019-12-07

  • 13:44 andrew@deploy1001: Finished deploy [horizon/deploy@841693b]: (no justification provided) (duration: 00m 08s)
  • 13:44 andrew@deploy1001: Started deploy [horizon/deploy@841693b]: (no justification provided)
  • 13:29 elukey: restart php-fpm on mw1293 (jobrunner) as test
  • 13:26 elukey: restart php-fpm on mw1299 (jobrunner) as test
  • 09:51 apergos: reboot dumpsdata1002, checking that rpc.statd starts on boot properly
  • 04:10 andrew@deploy1001: Finished deploy [horizon/deploy@841693b]: (no justification provided) (duration: 01m 55s)
  • 04:08 andrew@deploy1001: Started deploy [horizon/deploy@841693b]: (no justification provided)
  • 03:27 andrew@deploy1001: Finished deploy [horizon/deploy@841693b]: (no justification provided) (duration: 01m 45s)
  • 03:25 andrew@deploy1001: Started deploy [horizon/deploy@841693b]: (no justification provided)
  • 02:59 andrew@deploy1001: Finished deploy [horizon/deploy@0f70602]: (no justification provided) (duration: 01m 40s)
  • 02:58 andrew@deploy1001: Started deploy [horizon/deploy@0f70602]: (no justification provided)
  • 02:55 andrew@deploy1001: Finished deploy [horizon/deploy@0f70602]: (no justification provided) (duration: 00m 07s)
  • 02:55 andrew@deploy1001: Started deploy [horizon/deploy@0f70602]: (no justification provided)
  • 01:05 andrew@deploy1001: Finished deploy [horizon/deploy@0f70602]: (no justification provided) (duration: 02m 55s)
  • 01:02 andrew@deploy1001: Started deploy [horizon/deploy@0f70602]: (no justification provided)
  • 01:01 andrew@deploy1001: Finished deploy [horizon/deploy@0f70602]: (no justification provided) (duration: 02m 04s)
  • 00:59 andrew@deploy1001: Started deploy [horizon/deploy@0f70602]: (no justification provided)
  • 00:58 andrew@deploy1001: Finished deploy [horizon/deploy@1911591]: (no justification provided) (duration: 106m 13s)

2019-12-06

  • 23:35 ejegg: updated internal fundraising dashboard from d9d74429ba to 3917f7d9dc
  • 23:22 ejegg: updated payments-wiki from 00632a397c to b3f983d5d1
  • 23:12 andrew@deploy1001: Started deploy [horizon/deploy@1911591]: (no justification provided)
  • 23:12 andrew@deploy1001: Finished deploy [horizon/deploy@1911591]: (no justification provided) (duration: 00m 07s)
  • 23:12 andrew@deploy1001: Started deploy [horizon/deploy@1911591]: (no justification provided)
  • 23:10 andrew@deploy1001: Finished deploy [horizon/deploy@1911591]: (no justification provided) (duration: 00m 07s)
  • 23:10 andrew@deploy1001: Started deploy [horizon/deploy@1911591]: (no justification provided)
  • 23:00 ppchelko@deploy1001: Finished deploy [restbase/deploy@c2bab5d]: Parsoid: Disable mirroring all traffic in split mode (duration: 13m 43s)
  • 22:46 ppchelko@deploy1001: Started deploy [restbase/deploy@c2bab5d]: Parsoid: Disable mirroring all traffic in split mode
  • 22:08 bblack: mc1033: ethernet tweaks as well (expect a short link blip)
  • 21:54 bblack: mc1026: add tc-fq qdisc to eth0 for tx
  • 21:41 bblack: mc1026: adjusting rx ring to 2047 and disabling ethernet pause (will be a minor blip of eth link state!)
  • 21:25 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:23 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:16 cdanis@cumin2001: conftool action : set/weight=15; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw1231.eqiad.wmnet
  • 21:15 cdanis@cumin2001: conftool action : set/weight=15; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw1227.eqiad.wmnet
  • 21:15 cdanis@cumin2001: conftool action : set/weight=15; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw1222.eqiad.wmnet
  • 21:15 cdanis@cumin2001: conftool action : set/weight=15; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw1233.eqiad.wmnet
  • 21:14 cdanis@cumin2001: conftool action : set/weight=25; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw12[789].*
  • 21:12 cdanis@cumin2001: conftool action : set/weight=15; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw1233
  • 21:12 cdanis@cumin2001: conftool action : set/weight=15; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw1222
  • 21:12 cdanis@cumin2001: conftool action : set/weight=15; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw1227
  • 21:04 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:01 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:37 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:34 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:57 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:56 cdanis@cumin2001: conftool action : set/weight=20; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw12.*
  • 18:55 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:36 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:34 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:15 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:13 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:12 cdanis@cumin2001: conftool action : set/weight=15; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw12.*
  • 17:54 bblack: install2002 - restart squid3 service
  • 17:43 jforrester@deploy1001: Synchronized php-1.35.0-wmf.8/includes/libs/rdbms/database/Database.php: T239877 Have Database::makeWhereFrom2d assume is string-based (duration: 01m 11s)
  • 17:28 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:26 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:19 bblack: editing /e/n/i carefully with sed across the fleet via cumin, to correct legacy "dns-nameservers" line in older installs
  • 17:08 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:06 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:50 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:48 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:47 _joe_: apcu flush finished
  • 16:41 _joe_: flush apcu across the api cluster in eqiad
  • 16:32 _joe_: flushing apcu on mw1339
  • 16:21 ejegg: updated fundraising CiviCRM from 30cdc5fa59 to 7eab025ec0
  • 14:40 ema: text@esams: rolling ats-backend-restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/553132/ T238494
  • 14:12 ema: cp3050: ats-backend-restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/553132/ T238494
  • 13:41 ema: cp2004: adding do_global_ doesn't seem to work with reload, restart ats-be T238494
  • 13:31 gehel: starting transfer of blazegraph journal from wdqs1007 to stat1004 - T239898
  • 08:46 andrew@deploy1001: Finished deploy [horizon/deploy@1911591]: (no justification provided) (duration: 00m 08s)
  • 08:46 andrew@deploy1001: Started deploy [horizon/deploy@1911591]: (no justification provided)
  • 08:43 andrew@deploy1001: Finished deploy [horizon/deploy@1911591]: (no justification provided) (duration: 01m 59s)
  • 08:41 andrew@deploy1001: Started deploy [horizon/deploy@1911591]: (no justification provided)
  • 08:38 andrew@deploy1001: Finished deploy [horizon/deploy@1911591]: (no justification provided) (duration: 01m 55s)
  • 08:36 andrew@deploy1001: Started deploy [horizon/deploy@1911591]: (no justification provided)
  • 08:25 moritzm: installing libgd2 security updates on stretch
  • 08:04 andrew@deploy1001: Finished deploy [horizon/deploy@a8c759e]: (no justification provided) (duration: 00m 07s)
  • 08:04 andrew@deploy1001: Started deploy [horizon/deploy@a8c759e]: (no justification provided)
  • 08:03 andrew@deploy1001: Finished deploy [horizon/deploy@a8c759e]: (no justification provided) (duration: 01m 28s)
  • 08:01 andrew@deploy1001: Started deploy [horizon/deploy@a8c759e]: (no justification provided)
  • 08:01 andrew@deploy1001: Finished deploy [horizon/deploy@a8c759e]: (no justification provided) (duration: 02m 03s)
  • 07:59 andrew@deploy1001: Started deploy [horizon/deploy@a8c759e]: (no justification provided)
  • 07:55 moritzm: installing libonig security updates
  • 07:46 andrew@deploy1001: Finished deploy [horizon/deploy@a8c759e]: (no justification provided) (duration: 03m 11s)
  • 07:43 andrew@deploy1001: Started deploy [horizon/deploy@a8c759e]: (no justification provided)
  • 07:42 andrew@deploy1001: Finished deploy [horizon/deploy@1ac26da]: (no justification provided) (duration: 00m 08s)
  • 07:41 andrew@deploy1001: Started deploy [horizon/deploy@1ac26da]: (no justification provided)
  • 07:41 andrew@deploy1001: Finished deploy [horizon/deploy@1ac26da]: (no justification provided) (duration: 03m 23s)
  • 07:38 andrew@deploy1001: Started deploy [horizon/deploy@1ac26da]: (no justification provided)
  • 07:38 moritzm: installing libav security updates
  • 07:37 andrew@deploy1001: Finished deploy [horizon/deploy@1ac26da]: (no justification provided) (duration: 00m 07s)
  • 07:37 andrew@deploy1001: Started deploy [horizon/deploy@1ac26da]: (no justification provided)
  • 03:58 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:55 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:53 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 03:53 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:12 reedy@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/SecurePoll/cli/dump.php: T239968 (duration: 01m 04s)
  • 01:34 reedy@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/SecurePoll/cli/dump.php: T239968 (duration: 01m 00s)
  • 01:25 reedy@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/SecurePoll/cli/dump.php: T239968 (duration: 01m 01s)
  • 01:09 ejegg: updated fundraising internal dashboard from 3a93d2aba4 to d9d74429baa
  • 01:08 ejegg: updated payments-wiki from 81921bd04a to 00632a397c
  • 01:04 catrope@deploy1001: Synchronized private/PrivateSettings.php: HMAC value for Kask config (T222099) (duration: 00m 59s)
  • 01:02 reedy@deploy1001: Synchronized private/PrivateSettings.php: wmgSessionStoreHMACKey T222099 (duration: 01m 07s)
  • 00:47 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Use PHP serialization with HMAC for Kask session serialization (T222099) (duration: 01m 01s)
  • 00:08 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add *.archives.go.jp to $wgCopyUploadsDomains (T238476) (duration: 01m 00s)

2019-12-05

  • 23:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T235263 Turn off redirect on exact search match for Commons (duration: 01m 00s)
  • 23:04 ebernhardson: [cloudelastic-chi] reduce indices.recovery.max_bytes_per_sec from 512mb->128mb
  • 22:30 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:28 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:07 krinkle@deploy1001: Synchronized wmf-config/: I64e5ebe5fcd6b - removes arclamp.php (duration: 01m 01s)
  • 22:03 mutante: phabricator - git-ssh.wikimedia.org has been fixed and is up again (T238956)
  • 22:01 mutante: phab1001 - restarting ssh-phab to listen on additional LVS IP
  • 22:00 krinkle@deploy1001: Synchronized php-1.35.0-wmf.8/includes/libs/rdbms/database/: T233342 (duration: 01m 02s)
  • 21:55 twentyafterfour: stopping phd on phab1003 and starting on phab1001
  • 21:50 mutante: phab1003 - remove IPv6 service IP for git-ssh from lo:LVS
  • 21:34 mutante: puppetmaster2001: deleting /var/run/confd-template/.git-ssh*.err to fix confd template compilation alerts
  • 21:33 mutante: puppetmaster1001: deleting /var/run/confd-template/.git-ssh*.err to fix confd template compilation alerts
  • 21:19 mutante: phab1001 - systemctl restart ssh-phab (to make it listen on IPv6, race between puppet adding the IP and starting the service)
  • 21:09 bblack: ns0.wikimedia.org: restore routing to authdns1001
  • 21:03 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1001-vcs.eqiad.wmnet
  • 21:00 mutante: phab1001 - reload apache2, removed /ws/ rewrite for wstunnel for aphlict
  • 21:00 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:58 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:56 bblack: cr[12]-eqiad: delete leftover static route of ns2->authdns1001 from esams work, which was blinding icinga to the real ns2 :P
  • 20:49 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 20:48 twentyafterfour: successfully migrated to phab1001 with no apparent user impact!
  • 20:47 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 20:46 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 20:43 bblack: ns0.wikimedia.org: re-routing auth traffic from authdns1001 (reimaging) to dns1001
  • 20:41 mutante: running puppet on all cp* for phab change
  • 20:36 volker-e@deploy1001: Finished deploy [design/style-guide@437023f]: Deploy design/style-guide: (duration: 00m 08s)
  • 20:36 volker-e@deploy1001: Started deploy [design/style-guide@437023f]: Deploy design/style-guide:
  • 20:29 twentyafterfour: migrating back to phab1001, minimal downtime expected
  • 20:12 mutante: phab1001 - rebooting to hopefully clear "microcode vuln" icinga alert
  • 20:11 onimisionipe: ban cloudelastic1002 from shard allocation - T230088
  • 20:10 bblack: ns1.wikimedia.org: restoring normal routing to the newly-reimaged authdns2001
  • 19:56 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:53 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:47 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/Linter/extension.json: SWAT: afcfdce: Revert "Revert "Implement ParserLogLinterData hook"" (3/3, T238456) (duration: 01m 00s)
  • 19:46 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/Linter/includes/ApiRecordLint.php: SWAT: afcfdce: Revert "Revert "Implement ParserLogLinterData hook"" (2/3, T238456) (duration: 01m 09s)
  • 19:44 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/Linter/includes/Hooks.php: SWAT: afcfdce: Revert "Revert "Implement ParserLogLinterData hook"" (1/3, T238456) (duration: 01m 11s)
  • 19:41 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Linter/includes/ApiRecordLint.php: SWAT: 7b7f326: Implement ParserLogLinterData hook (3/3, T238456) (duration: 01m 04s)
  • 19:39 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Linter/extension.json: SWAT: 7b7f326: Implement ParserLogLinterData hook (2/3, T238456) (duration: 01m 05s)
  • 19:37 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Linter/includes/Hooks.php: SWAT: 7b7f326: Implement ParserLogLinterData hook (1/3, T238456) (duration: 01m 09s)
  • 19:35 mutante: Icinga: delete all downtimes for mw2259. Scheduling Icinga downtimes is tricky business. If you add some for hardware failure and they are too short you cause Icinga spam, if they are too long and the dcops operator is amazingly fast like Papaul then your server is back in production but not monitored and you have to click a million times in the web UI to remove them to avoid that.
  • 19:34 bblack: ns1.wikimedia.org: re-route authdns traffic from authdns2001 (to be reimaged) -> dns2001 temporarily - T239667
  • 19:28 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/Linter: SWAT: e0a2059: Revert "Implement ParserLogLinterData hook" (duration: 01m 01s)
  • 19:19 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Linter/: SWAT: b376528: Revert "Implement ParserLogLinterData hook" (duration: 01m 01s)
  • 19:15 urbanecm@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 19:14 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/Linter: SWAT: 839c383: Implement ParserLogLinterData hook (T238456) (duration: 01m 02s)
  • 18:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2259.codfw.wmnet
  • 18:25 kevinbazira@deploy1001: Finished deploy [ores/deploy@6dd1fef]: T238839 (duration: 17m 20s)
  • 18:08 kevinbazira@deploy1001: Started deploy [ores/deploy@6dd1fef]: T238839
  • 17:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:36 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@c29a758]: deploy repo to search-airflow dsh group (duration: 00m 13s)
  • 17:30 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c29a758]: deploy repo to search-airflow dsh group
  • 17:23 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕧☕ sudo -E reprepro -C main include stretch-wikimedia prometheus-atlas-exporter_1.0+git20191204.ffafab7-1_amd64.changes
  • 17:18 effie: reimage mw2260, yes again
  • 16:47 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@87b25f2]: initial airflow dags/plugins (duration: 00m 06s)
  • 16:47 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@87b25f2]: initial airflow dags/plugins
  • 16:40 brion: running `requeueTranscodes.php --error --throttle` on mwmaint1002 to clean up T239831-related broken video transcodes. will raise usage on video scalers for a while.
  • 16:33 elukey: execute clear bfd session address fe80::5e5e:ab00:d3d:85ce on cr3-knams
  • 16:32 elukey: execute clear bfd session address fe80::7a4f:9b00:d4e:8004 on cr1-eqiad
  • 16:20 elukey: execute clear bfd session address 208.80.154.208 on cr2-eqord
  • 15:50 anomie@deploy1001: Finished scap: Backporting fix for T239428 (duration: 33m 20s)
  • 15:49 ejegg: re-enabled creating CiviMail activities when sending Thank You emails
  • 15:44 jynus: restart backup1001, overloaded T234900
  • 15:43 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 15:43 moritzm: upgrading the reimaged video scalers back to the row-mt enabled ffmpeg T239831
  • 15:41 ejegg: updated Fundraising CiviCRM from 4a72ad4e63 to 30cdc5fa59
  • 15:17 anomie@deploy1001: Started scap: Backporting fix for T239428
  • 15:16 onimisionipe: run osm-import on maps1004 - T239728
  • 14:52 cdanis@deploy1001: Synchronized src/Noc/WmfClusters.php: c0fe7c410 clarify loads output (earlier push was 7963fdcd2 sort clusters naturally) (duration: 00m 59s)
  • 14:52 onimisionipe: disable puppet on maps100[1-3].eqiad.wmnet - T239728
  • 14:51 onimisionipe: disable tilerator on maps100[1-3].eqiad.wmnet - T239728
  • 14:50 cdanis@deploy1001: Synchronized docroot/noc/db.php: c0fe7c410 noc/db.php: clarify loads output (duration: 01m 01s)
  • 14:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:25 Lucas_WMDE: 14:20:08 <effie> reimage mw2260
  • 13:09 godog: bounce mtail on mw1240
  • 13:01 _joe_: restarted mtail on mw1239
  • 12:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:21 effie: Reimage mw2261.codfw.wmnet
  • 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 6c9d168: Fix namespace name - napwikisource (T239547) (duration: 01m 02s)
  • 10:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:44 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:38 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 10:35 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 10:26 effie: reimage mw2260.codfw.wmnet
  • 10:13 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 09:54 ema: text@esams: disable ats-be origin server request coalescing T238494
  • 09:07 marostegui: Upgrade db2094 and db2095
  • 08:38 marostegui: Upgrade db2078
  • 08:09 marostegui: Upgrade pc2007, pc2008, pc2009, pc2010
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1062 from etcd T239188', diff saved to https://phabricator.wikimedia.org/P9821 and previous config saved to /var/cache/conftool/dbconfig/20191205-080909-marostegui.json
  • 08:03 elukey: remove logstash_cleanup_indices_apifeatureusage-search.svc.codfw.wmnet and logstash_cleanup_indices_apifeatureusage-search.svc.eqiad.wmnet from logstash1025,logstash1024,logstash1023,logstash2024,logstash2025 to reduce cronspam - T234854
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3311, db1099:3318', diff saved to https://phabricator.wikimedia.org/P9820 and previous config saved to /var/cache/conftool/dbconfig/20191205-074200-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311, db1099:3318', diff saved to https://phabricator.wikimedia.org/P9819 and previous config saved to /var/cache/conftool/dbconfig/20191205-073209-marostegui.json
  • 07:29 _joe_: ran apt-get install manually on kubestagetcd1001 to fix broken packages
  • 07:25 _joe_: manually running package_builder_Clean_up_build_directory.service on boron
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311, db1099:3318', diff saved to https://phabricator.wikimedia.org/P9818 and previous config saved to /var/cache/conftool/dbconfig/20191205-072314-marostegui.json
  • 07:22 _joe_: umounting /proc,/sys,/dev from /var/cache/pbuilder/build/cow.6815 on boron to allow reaping it away
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311, db1099:3318', diff saved to https://phabricator.wikimedia.org/P9817 and previous config saved to /var/cache/conftool/dbconfig/20191205-071445-marostegui.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311, db1099:3318 for upgrade', diff saved to https://phabricator.wikimedia.org/P9816 and previous config saved to /var/cache/conftool/dbconfig/20191205-070631-marostegui.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3317, db1101:3318', diff saved to https://phabricator.wikimedia.org/P9815 and previous config saved to /var/cache/conftool/dbconfig/20191205-065536-marostegui.json
  • 06:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317, db1101:3318', diff saved to https://phabricator.wikimedia.org/P9814 and previous config saved to /var/cache/conftool/dbconfig/20191205-064845-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317, db1101:3318', diff saved to https://phabricator.wikimedia.org/P9813 and previous config saved to /var/cache/conftool/dbconfig/20191205-063103-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317, db1101:3318', diff saved to https://phabricator.wikimedia.org/P9812 and previous config saved to /var/cache/conftool/dbconfig/20191205-061453-marostegui.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P9811 and previous config saved to /var/cache/conftool/dbconfig/20191205-055756-marostegui.json
  • 03:37 twentyafterfour: leaving phabricator on phab1003 for tonight while phab1001 raid syncs, will pick it up tomorrow to decide where to go from here
  • 03:32 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@UNKNOWN]: deploy release/2019-08-22/1 to phab1001 (duration: 01m 36s)
  • 03:30 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@UNKNOWN]: deploy release/2019-08-22/1 to phab1001
  • 03:29 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@UNKNOWN]: deploy release/2019-08-22/1 to phab1001 (duration: 00m 22s)
  • 03:29 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@UNKNOWN]: deploy release/2019-08-22/1 to phab1001
  • 03:07 mutante: phab1001 - now using AHCI mode after reinstall, performance much better. rsyncing /srv/repos from phab1003 again
  • 02:32 mutante: phab1001 - signed new puppet cert - initial puppet run in progress
  • 02:27 mutante: phab1001 - fixed boot order in BIOS to boot only from HDD, back at login
  • 02:12 ejegg: updated payments-wiki from f61c9f0692 to 81921bd04a
  • 01:21 mutante: phab1001 - rebooting to BIOS once more - "The settings were saved successfully."
  • 01:19 twentyafterfour: phab1001 back, still in legacy ide mode
  • 01:12 mutante: phab1001 - enabling Write Cache in BIOS
  • 01:07 mutante: phab1001 - System BIOS Settings > SATA Settings > Embedded SATA: switch from ATA to AHCI mode (T238956)
  • 01:05 mutante: phab1001 - powercycling
  • 01:04 mutante: telling phab1001 to boot into BIOS next time it boots via mgmt console (https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/Dell_PowerEdge_RN30#Reboot_and_boot_into_BIOS_then_console)
  • 01:03 twentyafterfour: phabricator switched back to phab1003 - reimaging phab1001 now
  • 00:49 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=phab1001-vcs.eqiad.wmnet
  • 00:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab1001-vcs.eqiad.wmnet
  • 00:43 cwhite@cumin1001: dbctl commit (dc=all): 'Depool db1062 T239874', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20191205-004256-cwhite.json

2019-12-04

  • 23:38 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.35.0-wmf.5"
  • 23:35 brennen@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 23:30 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.8 (duration: 01m 01s)
  • 23:29 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.8
  • 23:22 jforrester@deploy1001: Synchronized wmf-config/logging.php: Keep test consistent w/ operations/puppet, for logging (duration: 01m 02s)
  • 23:21 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Keep test consistent w/ operations/puppet, for CS (duration: 01m 03s)
  • 22:49 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable VisualEditor on Wikitech (and Labs Wikitech) (duration: 01m 02s)
  • 22:45 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@e6afe36]: Update mobileapps to 9e9b042 (duration: 05m 48s)
  • 22:39 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@e6afe36]: Update mobileapps to 9e9b042
  • 22:39 bstorm_: powered off cloudstore1008, disabled sync from cloudstore1009, and downtimed both cloudstore1008 and cloudstore1009 for memory module replacement T239569
  • 22:37 bstorm_: poweroff cloudstore1008 for memory module replacement
  • 22:24 RoanKattouw: T208369 ran mwscript extensions/GrowthExperiments/maintenance/deleteOldSurveys.php kowiki --cutoff 350
  • 22:21 RoanKattouw: T208369 ran mwscript extensions/GrowthExperiments/maintenance/deleteOldSurveys.php cswiki --cutoff 350
  • 21:48 eileen: civicrm revision changed from 6812488f3a to 4a72ad4e63, config revision is 9f4db1edad (CiviCRM security patches )
  • 21:38 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw226[3456]\.codfw\.wmnet
  • 21:11 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:09 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:47 milimetric@deploy1001: Finished deploy [analytics/refinery@fc710ec] (thin): Weekly train deploy to labs/notebooks (duration: 00m 07s)
  • 20:47 milimetric@deploy1001: Started deploy [analytics/refinery@fc710ec] (thin): Weekly train deploy to labs/notebooks
  • 20:33 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:31 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:28 milimetric@deploy1001: Finished deploy [analytics/refinery@fc710ec]: Weekly train deploy (duration: 07m 09s)
  • 20:28 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.35.0-wmf.5"
  • 20:21 milimetric@deploy1001: Started deploy [analytics/refinery@fc710ec]: Weekly train deploy
  • 20:15 milimetric@deploy1001: Finished deploy [analytics/refinery@fc710ec]: Weekly train deploy (duration: 06m 37s)
  • 20:14 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.8 (duration: 01m 29s)
  • 20:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.8
  • 20:09 milimetric@deploy1001: Started deploy [analytics/refinery@fc710ec]: Weekly train deploy
  • 19:56 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:54 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:43 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy phabricator to phab2001.codfw.wmnet (duration: 00m 31s)
  • 19:43 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy phabricator to phab2001.codfw.wmnet
  • 19:38 milimetric@deploy1001: deploy aborted: Weekly train deploy (duration: 00m 21s)
  • 19:38 milimetric@deploy1001: Started deploy [analytics/refinery@c8de2ab]: Weekly train deploy
  • 19:21 Amir1: morning SWAT is done
  • 19:19 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:17 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/Wikibase/repo/includes/ParserOutput/FullEntityParserOutputGenerator.php: SWAT: Remove no-op 'jquery.ui.core.styles' from FullEntityParserOutputGenerator (T219604 T239594) (duration: 01m 06s)
  • 19:16 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack: dns1001: back to normal again
  • 18:54 bblack: dns1001: stop bird.service again, briefly
  • 18:52 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:50 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:49 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:46 bblack: dns1001: restart bird.service
  • 18:45 arlolra: Updated Parsoid to b81bbf4 (T239643, T239830, T238456, T239841)
  • 18:41 bblack: dns1001: stopping just bird
  • 18:32 arlolra@deploy1001: Finished deploy [parsoid/deploy@0910e18]: Updating Parsoid to b81bbf4 (duration: 08m 11s)
  • 18:24 arlolra@deploy1001: Started deploy [parsoid/deploy@0910e18]: Updating Parsoid to b81bbf4
  • 18:08 bblack: dns1002: back to normal state
  • 18:05 bblack: dns1002: stopping recursive dns to test failure theory (same method as pre-reimaging earlier, intended to not cause impact)
  • 17:54 bblack: dns1001: back to normal state
  • 17:51 bblack: dns1001: stopping recursive dns to test failure theory (same method as pre-reimaging earlier, intended to not cause impact)
  • 17:50 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase/repo/includes/ParserOutput/FullEntityParserOutputGenerator.php: T229407, part III (duration: 01m 01s)
  • 17:25 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns[12]001.wikimedia.org
  • 17:25 _joe_: repooling mw1348
  • 17:21 _joe_: depooling mw1348 for debugging
  • 17:15 jynus: killing dump threads on db1118 T143870
  • 17:13 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:11 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:09 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:07 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:49 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns[12]001.wikimedia.org
  • 16:48 bblack: dns[12]001 - reimaging to buster
  • 16:48 rzl@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=nginx,name=mw2267.codfw.wmnet,cluster=jobrunner
  • 16:48 rzl@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=apache2,name=mw2267.codfw.wmnet,cluster=jobrunner
  • 16:48 rzl@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=nginx,name=mw2267.codfw.wmnet,cluster=videoscaler
  • 16:48 rzl@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=apache2,name=mw2267.codfw.wmnet,cluster=videoscaler
  • 16:48 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2272.codfw.wmnet,dc=codfw,service=nginx,cluster=appserver
  • 16:48 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2272.codfw.wmnet,dc=codfw,service=apache2,cluster=appserver
  • 16:48 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2273.codfw.wmnet,dc=codfw,cluster=appserver,service=nginx
  • 16:48 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2273.codfw.wmnet,dc=codfw,cluster=appserver,service=apache2
  • 16:33 ejegg: updated fundraising CiviCRM from 970b7b214b to 6812488f3a
  • 16:32 effie: enable puppet on mwdebug1001
  • 16:32 effie: enable puppet on mw1348
  • 16:30 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:28 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:25 effie: disable puppet on mw1348
  • 15:57 papaul: rebooting ms-fe2007 for HW maintenance
  • 15:49 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:47 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:29 moritzm: installing mariadb 10.3 updates from Buster 10.2 point release (client libs/tools only)
  • 15:28 mobrovac@deploy1001: Finished deploy [restbase/deploy@f4b752e]: Parsoid: Set title when sending html2html reqs; Mirror 6% of html2html reqs to Parsoid/PHP - T239768 T239643 (duration: 16m 02s)
  • 15:26 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:24 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:19 ejegg: updated fundraising CiviCRM from 0f51030071 to 970b7b214b
  • 15:15 ejegg: disabled debug logging for Ingenico on payments-wiki
  • 15:12 mobrovac@deploy1001: Started deploy [restbase/deploy@f4b752e]: Parsoid: Set title when sending html2html reqs; Mirror 6% of html2html reqs to Parsoid/PHP - T239768 T239643
  • 15:09 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase/repo/includes/ParserOutput/FullEntityParserOutputGenerator.php: T229407, part II (duration: 01m 02s)
  • 15:07 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:07 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase/repo/includes/ParserOutput/FullEntityParserOutputGenerator.php: T229407 (duration: 01m 00s)
  • 15:05 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2135 as master for s10 in codfw', diff saved to https://phabricator.wikimedia.org/P9806 and previous config saved to /var/cache/conftool/dbconfig/20191204-145349-marostegui.json
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2135 in m5 codfw', diff saved to https://phabricator.wikimedia.org/P9805 and previous config saved to /var/cache/conftool/dbconfig/20191204-145145-marostegui.json
  • 14:40 rzl@cumin1001: conftool action : set/pooled=yes; selector: service=apache2,cluster=appserver,dc=codfw,name=mw2274.codfw.wmnet
  • 14:40 rzl@cumin1001: conftool action : set/pooled=yes; selector: service=nginx,cluster=appserver,dc=codfw,name=mw2274.codfw.wmnet
  • 14:31 moritzm: test ldap-corp2001 as LDAP server on mx2001
  • 14:24 bblack: ns2 authdns: re-route from ganeti3003 to dns3001 - T236479
  • 14:10 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns5001.wikimedia.org
  • 14:04 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns[34]001.wikimedia.org
  • 13:59 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:52 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:52 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:50 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:45 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:43 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:24 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns[345]001.wikimedia.org
  • 13:24 onimisionipe: downtimed maps1004 - T239728
  • 13:23 bblack: dns[345]001 - starting downtimes/etc for reimage to buster...
  • 12:31 filippo@cumin1001: conftool action : set/pooled=no; selector: name=ms-fe2007.codfw.wmnet
  • 12:29 Urbanecm: EU SWAT done
  • 12:28 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/WikimediaMessages/: SWAT: bbf2a33: Change Schema Revision of WMDEBannerEvents (T239430) (duration: 01m 02s)
  • 12:26 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/WikimediaMessages/: SWAT: b3ef5cd: Change Schema Revision of WMDEBannerEvents (T239430) (duration: 01m 04s)
  • 11:38 jbond42: puppet enabled across the fleet and new CA certificate installed
  • 11:31 akosiaris: drain kubernetes1002 for test of nf_conntrack changes
  • 11:23 jbond42: enable puppet in eqiad and deploy updated CA
  • 11:13 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 10:54 jbond42: enable puppet in codfw and deploy updated CA
  • 10:46 jbond42: enable puppet in esams and deploy updated CA
  • 10:42 jbond42: enable puppet in ulsfo and deploy updated CA
  • 10:31 gehel@cumin1001: START - Cookbook sre.wdqs.restart
  • 10:31 gehel: rolling restart of wdqs for config change (event logging) - T101013
  • 10:31 jbond42: enable puppet in eqsin and deploy updated CA
  • 10:24 marostegui: stop replication and mysql on db2107 (s2 codfw master) to test puppet CA changes
  • 10:21 marostegui: stop replication and mysql on db2071 to test puppet CA changes
  • 10:02 jbond42: disabling puppet across the fleet to start CA update change 548241
  • 09:29 godog: roll-restart logstash7 in codfw/eqiad after https://gerrit.wikimedia.org/r/c/operations/puppet/+/554472
  • 09:15 marostegui: Reload labsdb1010 after reimporting wikidatawiki.page - T238399
  • 09:06 moritzm: updated jenkins on apt.wikimedia.org to 2.190.3 (T239586)
  • 08:05 effie: Restart php7-fpm on mw1348
  • 07:09 marostegui: Depool labsdb1010 to reimport wikidatawiki.page - T238399
  • 07:02 marostegui: Repool labsdb1011
  • 06:36 mutante: removed LVS IP for git-ssh from interface on phab1003
  • 06:25 dzahn@cumin1001: conftool action : set/weight=10; selector: name=phab1001-vcs.eqiad.wmnet
  • 06:13 mutante: phab1001 - running rsync of /srv/repos with --delete because it's larger than the source by about 5GB - deleting objects to match phab1003, former prod server. now both 50G (T238956)
  • 06:04 marostegui: Depool labsdb1011
  • 06:01 mutante: rsyncing /srv/repos data once again. pulling from phab1003 to phab1001 (T238956)
  • 05:51 marostegui: Deploy schema change on s3 primary master (db1123)
  • 04:59 mutante: removed downtime for phabricator.wikimedia.org meta service (paging)
  • 04:58 mutante: phabricator maintenance ended for today - now running on phab1001 (buster)
  • 04:58 mutante: install1002 - restarted isc-dhcpd
  • 04:39 mutante: phab1001 - rebooting for BIOS config change
  • 02:06 mutante: re-enabling puppet on phab1003 and phab1001.. switching active_server for puppet
  • 01:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1001-vcs.eqiad.wmnet
  • 01:47 mutante: switching phab-vcs in conftool-data from phab1003 to phab1001, running puppet on conf*
  • 01:45 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=phab1003-vcs.eqiad.wmnet
  • 01:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab1003-vcs.eqiad.wmnet
  • 01:37 twentyafterfour: re-enable phabricator writes (disable cluster.read-only)
  • 01:33 twentyafterfour: phab1001.eqiad.wmnet : sudo chown root.www-data /srv/phab/phabricator/conf/local/www.json
  • 01:29 mutante: phabricator currently under maintenance - db connection error is known
  • 01:20 mutante: running puppet on cp-eqiad
  • 00:49 ejegg: changed donations queue consumer and thank you mailer to use 3 minute cycles
  • 00:41 twentyafterfour: switching phabricator to read-only mode
  • 00:40 reedy@deploy1001: Synchronized php-1.35.0-wmf.8/skins/Vector/includes/templates/SearchComponent.mustache: I9776a3 (duration: 01m 01s)

2019-12-03

  • 23:47 volans: re-enabled meta-monitoring crontabs on wikitech-static after cleanup, reboot, and fixing wikitech-static's import errors
  • 22:59 volans: apt-get dist-upgrade and reboot of wikitech-static host
  • 22:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove settings for closed wikis T231178 (duration: 01m 01s)
  • 22:34 volans: disabled temporarily icinga meta-monitoring (disk full on the wikitech-static host)
  • 22:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable the Wikisource extension on frwikisource T239731 (duration: 01m 00s)
  • 22:22 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Read wmgDoNotRedirectOnSearchMatch to decide to enable auto-redirect search result change T235263 (duration: 01m 00s)
  • 22:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wmgDoNotRedirectOnSearchMatch, default off, on for Test Commons T235263 (duration: 01m 01s)
  • 22:03 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgXmlDumpSchemaVersion to 0.1.0 everywhere T238921 T174031 (duration: 01m 03s)
  • 21:40 eileen: civicrm revision changed from 26b788378e to 0f51030071, config revision is 17b6730a72 - includes 3 possible performance improvements - logging reduction, cache a query result & cache file existence
  • 21:38 volker-e@deploy1001: Finished deploy [design/style-guide@02a92f7]: Deploy design/style-guide: (duration: 00m 07s)
  • 21:38 volker-e@deploy1001: Started deploy [design/style-guide@02a92f7]: Deploy design/style-guide:
  • 21:09 sbassett: Deployed security patch for T238768 to wmf.8
  • 21:03 sbassett: Deployed security patch for T238768 to wmf.5
  • 20:43 mutante: mw2259 - did not come back from reboot after reimage, also mgmt not reachable (T239054)
  • 20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2257.codfw.wmnet
  • 20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2256.codfw.wmnet
  • 20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2258.codfw.wmnet
  • 20:17 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns[12]002.wikimedia.org
  • 20:00 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@c21a1ca]: Bump preq version for better logging around MW API timeouts (duration: 05m 46s)
  • 19:54 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@c21a1ca]: Bump preq version for better logging around MW API timeouts
  • 19:53 ejegg: shifted 20 more sec / cycle from donations QC to thank you mailer
  • 19:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:30 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:28 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:22 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:16 Urbanecm: Morning SWAT done
  • 19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5c83491: Create translation namespace on nap.wikisource (T239547) (duration: 01m 03s)
  • 19:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 45edf5a: Add partial blocks for scowiki (T239493) (duration: 01m 00s)
  • 19:09 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org
  • 19:09 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
  • 19:08 bblack: reimaging dns1002 + dns2002 - T239667
  • 19:07 thcipriani@deploy1001: Synchronized scap/plugins: scap: prep and clean git ops for /srv/patches T222240 (no-op sync) (duration: 01m 01s)
  • 17:52 ejegg: disabled PayPal orphan rectifier debug logging
  • 17:48 ejegg: adjusted timing of thank you mailer and donations QC to give 5 more sec / cycle to TY mails
  • 17:43 ejegg: updated fundraising CiviCRM from 4f3341455f to 26b788378e
  • 17:22 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:19 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:18 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:14 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:13 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@498c3d1]: repair bulk daemon swift listings (duration: 05m 49s)
  • 17:07 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@498c3d1]: repair bulk daemon swift listings
  • 16:52 bblack: reimaging dns3002 + dns5002
  • 16:30 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/MachineVision: Remove slow result randomization from the suggestions query (duration: 01m 03s)
  • 16:02 ejegg: reduced donations queue consumer 10 sec per cycle and increased TY mail sender 10 sec per cycle
  • 15:54 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 15:44 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 15:38 ejegg: updated fundraising CiviCRM from 5cf2d2713f to 4f3341455f
  • 15:34 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 15:20 elukey: executing sudo cumin -b6 -s 20 -p 95 'A:mw-api-eqiad' 'restart-php7.2-fpm' on cumin1001
  • 14:52 godog: swift eqiad-prod: final weight to ms-be105[7-9] - T237438
  • 14:24 ema: all cp-esams hosts switched to digicert-2019a certs T238494
  • 14:19 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 14:17 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 14:13 ema: cp-esams: re-enable puppet, switch to digicert-2019a certs https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/554291/ T238494
  • 14:06 ema: repool cp3050 with digicert-2019a T238494
  • 14:00 ema: cp3050: depool and switch to digicert-2019a T238494
  • 13:56 ema: cp-esams: disable puppet in preparation of digicert-2019a cert switch https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/554291/ T238494
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P9802 and previous config saved to /var/cache/conftool/dbconfig/20191203-133231-marostegui.json
  • 13:22 mobrovac@deploy1001: Finished deploy [restbase/deploy@92acf1e]: Revert mirroring html2html traffic to PHP - T239643 (duration: 10m 43s)
  • 13:11 mobrovac@deploy1001: Started deploy [restbase/deploy@92acf1e]: Revert mirroring html2html traffic to PHP - T239643
  • 12:42 mobrovac@deploy1001: Finished deploy [restbase/deploy@41bb230]: Log all html2html errors coming from Parsoid/PHP - T239643 (duration: 14m 41s)
  • 12:28 mobrovac@deploy1001: Started deploy [restbase/deploy@41bb230]: Log all html2html errors coming from Parsoid/PHP - T239643
  • 12:23 mobrovac@deploy1001: Finished deploy [restbase/deploy@b346ebf]: Mirror html2html traffic to Parsoid/PHP, take #2 (duration: 11m 17s)
  • 12:12 mobrovac@deploy1001: Started deploy [restbase/deploy@b346ebf]: Mirror html2html traffic to Parsoid/PHP, take #2
  • 12:12 mobrovac@deploy1001: Finished deploy [restbase/deploy@b346ebf]: Mirror html2html traffic to Parsoid/PHP - T229015 T239643 (duration: 13m 29s)
  • 12:09 Amir1: EU SWAT is done
  • 12:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set read new for term store for items for client wikis up to Q1000 (T225057) (duration: 01m 00s)
  • 11:58 mobrovac@deploy1001: Started deploy [restbase/deploy@b346ebf]: Mirror html2html traffic to Parsoid/PHP - T229015 T239643
  • 11:58 mobrovac@deploy1001: deploy aborted: Mirror html2html traffic to Parsoid/PHP - T229015 T239643 (duration: 00m 00s)
  • 11:58 mobrovac@deploy1001: Started deploy [restbase/deploy@b346ebf]: Mirror html2html traffic to Parsoid/PHP - T229015 T239643
  • 11:36 hashar: Updated operations-puppet-tests-stretch-docker to fix pip cache directory
  • 11:31 godog: refresh kibana fields for logstash-*
  • 11:00 hashar: Updated operations-puppet-tests-stretch-docker CI job to use tox 3.10.0 and support various python 3 versions
  • 10:37 ema: pool cp1083 with ATS backend T227432
  • 10:20 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:18 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:01 ema: depool cp1083 and reimage as text_ats T227432
  • 09:22 effie: Roll restart php-fpm mw[1240-1258,1261-1275,1319-1333].eqiad.wmnet
  • 09:05 godog: downtime new logstash hosts in codfw/eqiad until thurs
  • 09:02 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:02 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:00 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:48 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 08:45 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 08:45 effie: Restart php-fpm on mw[1330-1333].eqiad.wmnet
  • 08:45 elukey@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 08:45 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 08:35 ema: cp3050: set cache.max_open_read_retries=-1 and proxy.config.http.cache.max_open_write_retries=1 (default values) T238494
  • 08:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1062 from config T239188 (duration: 01m 02s)
  • 08:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1062 from config T239188 (duration: 01m 08s)
  • 08:20 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 08:19 akosiaris: apply calico rules for eventgate-logging-external. T236386
  • 08:18 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 08:14 volker-e@deploy1001: Finished deploy [design/style-guide@7978f0d]: Deploy design/style-guide: (duration: 00m 06s)
  • 08:14 volker-e@deploy1001: Started deploy [design/style-guide@7978f0d]: Deploy design/style-guide:
  • 07:39 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 06:29 marostegui: Deploy schema change on db1112 with replication (this will generate lag on s3 on labs)
  • 06:19 volker-e@deploy1001: Finished deploy [design/style-guide@8e08740]: Deploy design/style-guide: (duration: 00m 08s)
  • 06:19 volker-e@deploy1001: Started deploy [design/style-guide@8e08740]: Deploy design/style-guide:
  • 06:07 marostegui: Stop MySQL on db1062 for decommissioning T239188
  • 06:00 marostegui: Remove db2065 from tendril and zarcillo T239046
  • 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:50 ema: cp3050: ats-be restart with proxy.config.http.server_session_sharing.pool=thread T238494
  • 05:47 marostegui: Remove ar_comment triggers from s3 db1124:3313 - T234704
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P9798 and previous config saved to /var/cache/conftool/dbconfig/20191203-054528-marostegui.json
  • 04:19 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/EntryPoint.php: disable IE6 safety checks for T239666 (duration: 01m 00s)
  • 04:15 tstarling@deploy1001: Synchronized php-1.35.0-wmf.8/includes/Rest/EntryPoint.php: disable IE6 safety checks for T239666 (duration: 01m 01s)
  • 03:53 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@d00c6ad]: Fix: Apply language headers to zhwiki mobile-html responses (T239659) (duration: 05m 51s)
  • 03:47 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@d00c6ad]: Fix: Apply language headers to zhwiki mobile-html responses (T239659)
  • 02:54 mutante: mw1269 restarted nginx, php
  • 02:48 mutante: mw1320, mw1321 restarted php-fpm
  • 02:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T78711 Display 'twice a month' or 'once a month' on cached reports (duration: 01m 19s)
  • 02:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting testwiki => true for wmgUseCentralAuth, already implied by default (duration: 01m 24s)
  • 02:23 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T237698 Stop setting wmgUseDPL, unread (duration: 01m 11s)
  • 02:21 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T237698 Read wmgUseDynamicPageList not wmgUseDPL (duration: 01m 22s)
  • 02:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T237698 Set wmgUseDynamicPageList, less cryptic form of wmgUseDPL (duration: 01m 16s)
  • 02:16 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgTorLoadNodes, not read for a while (duration: 01m 14s)
  • 02:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgGEHelpPanelSearchEnabled, no longer used (duration: 01m 08s)
  • 02:04 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T239091 Enable Translate extension on sewikimedia, second try (duration: 01m 24s)
  • 01:58 jforrester@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/: T239209 Sanitize HTML on paste (duration: 01m 33s)
  • 01:55 jforrester@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/VisualEditor/: T239209 Sanitize HTML on paste (duration: 01m 24s)
  • 01:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 01:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 01:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2252.codfw.wmnet
  • 01:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
  • 01:33 mutante: mw2250 - E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.
  • 01:33 mutante: mw2252 rebooting
  • 01:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2254.codfw.wmnet
  • 01:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2253.codfw.wmnet
  • 01:22 mutante: mw2254 - rebooting (reimage script exited with segfault after reimage was done)
  • 01:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.5/includes/diff/DifferenceEngine.php: T236320 Don't calculate amount of inbetween revisions for MCR undo (duration: 00m 59s)
  • 01:15 jforrester@deploy1001: Synchronized dblists/wikidataclient.dblist: T239318 Add sewikimedia to wikidataclient (duration: 01m 03s)
  • 01:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T239091 Revert 'Enable Translate extension on sewikimedia' (duration: 01m 01s)
  • 01:00 James_F: mwscript sql.php --wiki=sewikimedia php-1.35.0-wmf.8/extensions/Translate/sql/translate_{…}.sql T239091
  • 00:56 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T239091 Enable Translate extension on sewikimedia (duration: 00m 57s)
  • 00:54 James_F: mwscript sql.php --wiki=sewikimedia php-1.35.0-wmf.5/extensions/Wikibase/client/sql/entity_usage.sql
  • 00:25 jforrester@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/Echo/includes/DiscussionParser.php: T239275 Fix type hint fatal from getUserLinks() (duration: 01m 16s)
  • 00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime

2019-12-02

  • 23:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet
  • 23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2249.codfw.wmnet
  • 23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2248.codfw.wmnet
  • 23:05 mutante: mw2248 - restart nginx (for some reason unit was running but not listening on 443 after reimage..now it does)
  • 23:05 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:02 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:46 ejegg: updated payments-wiki from 06a8c3cdff to f61c9f0692
  • 22:44 bblack: reimaging dns4002 to buster - T239667
  • 22:07 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/MachineVision: Update text for no personal uploads message (T238873) (duration: 01m 03s)
  • 22:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2247.codfw.wmnet
  • 21:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2246.codfw.wmnet
  • 21:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2230.codfw.wmnet
  • 21:25 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:23 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:22 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 20:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change', diff saved to https://phabricator.wikimedia.org/P9796 and previous config saved to /var/cache/conftool/dbconfig/20191202-205904-marostegui.json
  • 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2232.codfw.wmnet,service=nginx,dc=codfw
  • 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2232.codfw.wmnet,service=apache2,dc=codfw
  • 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2233.codfw.wmnet,dc=codfw,service=nginx,cluster=appserver
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2233.codfw.wmnet,dc=codfw,service=apache2,cluster=appserver
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2234.codfw.wmnet,service=nginx,cluster=appserver,dc=codfw
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2234.codfw.wmnet,service=apache2,cluster=appserver,dc=codfw
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2231.codfw.wmnet,service=nginx,dc=codfw
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2231.codfw.wmnet,service=apache2,dc=codfw
  • 20:36 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch Flow on all wikis to Parsoid/PHP - T229015 (duration: 00m 59s)
  • 20:35 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 20:26 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 20:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@92acf1e]: Switch everything to Parsoid/PHP - T229015 (duration: 14m 59s)
  • 20:12 joal@deploy1001: Finished deploy [analytics/refinery@9cd234a] (thin): Analytics deploy - Fixes for today deploy (2) (duration: 00m 05s)
  • 20:12 joal@deploy1001: Started deploy [analytics/refinery@9cd234a] (thin): Analytics deploy - Fixes for today deploy (2)
  • 20:08 joal@deploy1001: Finished deploy [analytics/refinery@9cd234a]: Analytics deploy - Fixes for today deploy (2) (duration: 08m 08s)
  • 20:06 mobrovac@deploy1001: Started deploy [restbase/deploy@92acf1e]: Switch everything to Parsoid/PHP - T229015
  • 20:05 reedy@deploy1001: Synchronized wmf-config/LabsServices.php: labslabslabs (duration: 01m 08s)
  • 20:05 mobrovac@deploy1001: Finished deploy [restbase/deploy@92acf1e] (dev-cluster): Switch everything to Parsoid/PHP (duration: 02m 48s)
  • 20:02 mobrovac@deploy1001: Started deploy [restbase/deploy@92acf1e] (dev-cluster): Switch everything to Parsoid/PHP
  • 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:59 joal@deploy1001: Started deploy [analytics/refinery@9cd234a]: Analytics deploy - Fixes for today deploy (2)
  • 19:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:56 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:55 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:55 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:51 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:50 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:50 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:50 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:50 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:48 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:37 mobrovac@deploy1001: Finished deploy [restbase/deploy@e69e2e5]: Switch everything but enwiki to Parsoid/PHP - T229015 (duration: 13m 48s)
  • 19:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:23 mobrovac@deploy1001: Started deploy [restbase/deploy@e69e2e5]: Switch everything but enwiki to Parsoid/PHP - T229015
  • 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:22 mobrovac@deploy1001: Finished deploy [restbase/deploy@e69e2e5] (dev-cluster): Switch everything but enwiki to Parsoid/PHP (duration: 06m 38s)
  • 19:16 mobrovac@deploy1001: Started deploy [restbase/deploy@e69e2e5] (dev-cluster): Switch everything but enwiki to Parsoid/PHP
  • 19:04 mobrovac@deploy1001: Finished deploy [restbase/deploy@6a24685]: Parsoid Proxy: Direct html2html traffic to JS; Stop honouring the variant header; Switch sr and zh wikis to PHP - T229015 (duration: 14m 11s)
  • 18:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:50 mobrovac@deploy1001: Started deploy [restbase/deploy@6a24685]: Parsoid Proxy: Direct html2html traffic to JS; Stop honouring the variant header; Switch sr and zh wikis to PHP - T229015
  • 18:39 joal@deploy1001: Finished deploy [analytics/refinery@980298b] (thin): Analytics deploy - Fixes for today deploy (duration: 00m 06s)
  • 18:39 joal@deploy1001: Started deploy [analytics/refinery@980298b] (thin): Analytics deploy - Fixes for today deploy
  • 18:38 joal@deploy1001: Finished deploy [analytics/refinery@980298b]: Analytics deploy - Fixes for today deploy (duration: 08m 21s)
  • 18:32 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:30 joal@deploy1001: Started deploy [analytics/refinery@980298b]: Analytics deploy - Fixes for today deploy
  • 18:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@97d17f6]: New blazegraph and WDQS build plus GUI changes (duration: 15m 42s)
  • 18:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:00 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@97d17f6]: New blazegraph and WDQS build plus GUI changes
  • 17:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@ff7862f]: Switch sr and zh wikipediae back to Parsoid/JS - T229015 (duration: 14m 06s)
  • 17:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:42 mobrovac@deploy1001: Started deploy [restbase/deploy@ff7862f]: Switch sr and zh wikipediae back to Parsoid/JS - T229015
  • 17:29 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@deafe56]: Followup on cirrusSearchElasticWrite partitioning T230495 (duration: 01m 14s)
  • 17:28 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@deafe56]: Followup on cirrusSearchElasticWrite partitioning T230495
  • 17:21 ssastry@deploy1001: Finished deploy [parsoid/deploy@743efb0]: Updating Parsoid to ca588b25 + fix broken langconv library / deploy (duration: 07m 48s)
  • 17:14 ssastry@deploy1001: Started deploy [parsoid/deploy@743efb0]: Updating Parsoid to ca588b25 + fix broken langconv library / deploy
  • 17:09 ejegg: disabled fundraising job omnimail_groupmember_load
  • 16:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:43 ejegg: updated fundraising internal dashboard from 8fc2726736 to 3a93d2aba4
  • 16:43 effie: restart all API cluster in eqiad
  • 16:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:42 hashar: Restarted CI Jenkins
  • 16:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@3516382]: Switch ru, sr and zh wikipediae to Parsoid/PHP - T229015 (duration: 13m 53s)
  • 16:41 ema: cp3050: ats-be restart with proxy.config.http.server_session_sharing.pool=global T238494
  • 16:32 ema: cp3053: repooling after firmware update T239041
  • 16:27 mobrovac@deploy1001: Started deploy [restbase/deploy@3516382]: Switch ru, sr and zh wikipediae to Parsoid/PHP - T229015
  • 16:19 effie: reimage mw1295.eqiad.wmnet mw1294.eqiad.wmnet mw1293.eqiad.wmnet
  • 16:11 robh: cp3053 depooling and rebooting for firmware update T239041
  • 16:10 robh: cp3035 depooling and rebooting for firmware update T239041
  • 15:38 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 15:38 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid VRS: Switch groups 0 and 1 to Parsoid/PHP - T229015 (duration: 00m 59s)
  • 15:35 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 15:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@d6d5a6e]: Parsoid Proxy: Do not use the fall-back for linting transforms - T239607 (duration: 14m 51s)
  • 15:26 effie: Rolling restart mw1345-1348
  • 15:15 mobrovac@deploy1001: Started deploy [restbase/deploy@d6d5a6e]: Parsoid Proxy: Do not use the fall-back for linting transforms - T239607
  • 14:46 ema: cp-ats: set server_session_sharing.match=2 everywhere (puppet re-enable and run) T238494
  • 14:31 ema: cp-ats: merge server_session_sharing.match=2 (https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/553490/) with puppet disabled, test on cp3050 T238494
  • 14:18 godog: set grafana theme back to light, was dark for some reason
  • 14:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P9794 and previous config saved to /var/cache/conftool/dbconfig/20191202-135643-marostegui.json
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change', diff saved to https://phabricator.wikimedia.org/P9793 and previous config saved to /var/cache/conftool/dbconfig/20191202-135543-marostegui.json
  • 13:47 ema: power-cycle cp3053 T239041
  • 13:44 hashar: Restarted CI Jenkins
  • 13:30 hashar: Restarted CI Jenkins
  • 13:14 mobrovac@deploy1001: Finished deploy [restbase/deploy@eedba38]: Parsoid Proxy: Fixes - T229015 (duration: 14m 49s)
  • 13:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:01 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:59 mobrovac@deploy1001: Started deploy [restbase/deploy@eedba38]: Parsoid Proxy: Fixes - T229015
  • 12:57 mobrovac@deploy1001: Finished deploy [restbase/deploy@eedba38] (dev-cluster): Parsoid Proxy: Fixes (duration: 02m 54s)
  • 12:54 mobrovac@deploy1001: Started deploy [restbase/deploy@eedba38] (dev-cluster): Parsoid Proxy: Fixes
  • 12:54 Urbanecm: EU SWAT done
  • 12:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: d27fe78: Enable partial blocks on eswiki (T239370) (duration: 01m 00s)
  • 12:45 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 445bdc3: Remove `move-rootuserpages` from user on svwiki (T238842) (duration: 01m 04s)
  • 12:43 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/bawiki*.png
  • 12:39 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 61a9563: Revert "Change bawiki logo to an anniversary one" (T237070) (duration: 01m 06s)
  • 12:37 effie: reimage mw1296.eqiad.wmnet
  • 12:37 effie: reimage mw1298.eqiad.wmnet
  • 12:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set read new for term store for items of wikidata up to Q1000 (T225057) (duration: 01m 00s)
  • 12:19 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/GrowthExperiments/: SWAT: Suggested edits: do not treat AQS lookup failure as error (T238178) (duration: 01m 02s)
  • 11:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:50 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw2229.codfw.wmnet
  • 11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 04s)
  • 11:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:03 moritzm: installing ruby2.1 security updates
  • 10:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:43 moritzm: installing python-psutil security updates
  • 10:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:42 effie: reimage mw1299.eqiad.wmnet
  • 10:18 effie: reimage mw1290.eqiad.wmnet
  • 10:18 effie: reimage mw1275.eqiad.wmnet
  • 10:15 moritzm: installing file/libmagic regression update for jessie
  • 10:08 filippo@cumin1001: conftool action : set/pooled=no; selector: name=mw2229.codfw.wmnet
  • 09:52 godog: swift eqiad-prod: more weight to ms-be105[7-9] - T237438
  • 09:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:41 joal@deploy1001: Finished deploy [analytics/refinery@8991301] (thin): Regular analytics deploy - late from last week (thin) (duration: 00m 08s)
  • 09:41 joal@deploy1001: Started deploy [analytics/refinery@8991301] (thin): Regular analytics deploy - late from last week (thin)
  • 09:40 joal@deploy1001: Finished deploy [analytics/refinery@8991301]: Regular analytics deploy - late from last week (duration: 18m 22s)
  • 09:23 effie: reimage mw1300.eqiad.wmnet
  • 09:23 effie: reimage mw1300.eqiad.wmne
  • 09:22 joal@deploy1001: Started deploy [analytics/refinery@8991301]: Regular analytics deploy - late from last week
  • 09:16 moritzm: installing libvpx security updates
  • 09:14 godog: extend graphite LVs on graphite1004 / graphite2003 by 200G
  • 08:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 effie: reimage mw1287.eqiad.wmnet mw1288.eqiad.wmnet mw1289.eqiad.wmnet
  • 08:08 effie: reimage mw1301.eqiad.wmnet
  • 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:18 andrewbogott: forcing a reboot of cloudstore1008 via mgmt console - it seems to have locked up
  • 06:43 Urbanecm: Clear account creation throttle for several IPs (T239465)
  • 06:38 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: New throttle rule for cawiki workshop (T239465) (duration: 01m 03s)
  • 06:00 marostegui: Compress s8 codfw master (lag might appear on codfw s8)
  • 06:00 marostegui: Compress s4 codfw master (lag might appear on codfw s4)
  • 05:56 marostegui: Deploy schema change on db1075
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P9791 and previous config saved to /var/cache/conftool/dbconfig/20191202-055546-marostegui.json
  • 05:53 marostegui: Compress db1099:3318 T235599
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for compression', diff saved to https://phabricator.wikimedia.org/P9790 and previous config saved to /var/cache/conftool/dbconfig/20191202-055245-marostegui.json

2019-12-01

  • 23:27 ladsgroup@deploy1001: Started restart [mobileapps/deploy@70154b4]: Rolling restart of mobileapps
  • 23:20 bblack: restarting AQS services in eqiad
  • 23:15 eileen: process-control config revision is 9750c318a0 - jobs disabled
  • 21:39 andrewbogott: restarted nova conductor and api on cloudcontrol1003 and 1004 to free up db connections (T239168)

2019-11-30

  • 15:47 Urbanecm: Reset email of SUL user Hayk.arabaget (T239462)
  • 07:40 vgutierrez: repooling cp3057 - T239502
  • 07:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3057.esams.wmnet
  • 07:30 vgutierrez: depool and powercycle cp3057 - T239502

2019-11-29

  • 22:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:36 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:12 effie: reimage mw1302.eqiad.wmnet
  • 20:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:19 effie: reimage mw1284.eqiad.wmnet
  • 19:19 effie: reimage mw1303.eqiad.wmnet mw1283.eqiad.wmnet
  • 17:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:22 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw2228.codfw.wmnet
  • 16:17 effie: reimage mw1274.eqiad.wmnet
  • 16:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:23 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:45 effie: reimage mw1282.eqiad.wmnet
  • 14:45 effie: reimage mw1282.eqiad.wmne
  • 14:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 effie: reimage mw1323.eqiad.wmnet mw1297.eqiad.wmnet mw1273.eqiad.wmnet
  • 14:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:25 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:14 filippo@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2228.codfw.wmnet
  • 14:13 godog: reimage mw2228 for partman tests
  • 14:02 effie: reimage mw1271.eqiad.wmnet mw1272.eqiad.wmnet mw1304.eqiad.wmnet
  • 13:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:33 jynus: reenable puppet on dbprov2001, backup1001
  • 13:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:48 jynus: disabling puppet also on backup1001 to test recoveries
  • 12:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:22 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 effie: reimage mw1305.eqiad.wmnet mw1265.eqiad.wmnet mw1270.eqiad.wmnet
  • 11:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:39 jynus: disabling puppet on dbprov2001 to test recoveries
  • 11:34 effie: reimage mw1268.eqiad.wmnet mw1280.eqiad.wmnet mw1281.eqiad.wmnet
  • 11:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:03 Lucas_WMDE: <effie> 10:58:17 log reimage mw1268.eqiad.wmnet mw1280.eqiad.wmnet mw1281.eqiad.wmne
  • 11:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:47 elukey@deploy1001: Finished deploy [analytics/refinery@97015e4] (thin): Deploy thin Analytics Refinery (no jars/git-fat-obj) to notebook and labstore hosts (duration: 00m 08s)
  • 10:47 elukey@deploy1001: Started deploy [analytics/refinery@97015e4] (thin): Deploy thin Analytics Refinery (no jars/git-fat-obj) to notebook and labstore hosts
  • 10:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:22 effie: reimage mw1306.eqiad.wmnet mw1264.eqiad.wmnet mw1279.eqiad.wmnet
  • 09:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:33 marostegui: Remove triggers from db2094:3313 - T234704
  • 09:33 marostegui: Stop replication on db2105 (s3 codfw) for schema change
  • 09:23 effie: reimage mw1263.eqiad.wmnet mw1307.eqiad.wmnet
  • 09:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:01 volans: temporarily disabling puppet on 'R:keyholder::agent' to merge gerrit:operations/puppet/+/553460 - T239386
  • 09:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:18 effie: reimage mw2223.codfw.wmnet mw2222.codfw.wmnet mw2221.codfw.wmnet mw2220.codfw.wmnet
  • 07:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:48 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:25 effie: reimage mw1312.eqiad.wmnet mw1308.eqiad.wmnet mw1261.eqiad.wmnet
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134 after schema change', diff saved to https://phabricator.wikimedia.org/P9781 and previous config saved to /var/cache/conftool/dbconfig/20191129-055845-marostegui.json
  • 05:08 krinkle@deploy1001: Synchronized php-1.35.0-wmf.5/includes/exception/MWExceptionHandler.php: 532f4aba96d85 (duration: 01m 03s)

2019-11-28

  • 23:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:21 effie: reimage mw1329.eqiad.wmnet
  • 23:01 effie: restart cp1087
  • 22:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:19 effie: reimage mw1309.eqiad.wmnet
  • 21:19 effie: reimage mw1323.eqiad.wmnet
  • 21:11 effie: reimage mw1316.eqiad.wmnet mw1315.eqiad.wmnet
  • 20:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:26 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:03 effie: reimage mw1313.eqiad.wmnet
  • 20:02 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:48 effie: reimage mw1331.eqiad.wmnet mw1330.eqiad.wmnet mw1310.eqiad.wmnet
  • 18:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:41 marostegui: Deploy schema change on db1134
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P9780 and previous config saved to /var/cache/conftool/dbconfig/20191128-183918-marostegui.json
  • 18:29 effie: reimage mw1319.eqiad.wmnet mw1318.eqiad.wmnet
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after schema change', diff saved to https://phabricator.wikimedia.org/P9779 and previous config saved to /var/cache/conftool/dbconfig/20191128-180517-marostegui.json
  • 17:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:19 effie: reimage mw1340.eqiad.wmnet mw1339.eqiad.wmnet
  • 17:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:32 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:18 phamhi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 phamhi@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:58 effie: reimage mw1311.eqiad.wmnet
  • 15:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:28 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 effie: reimage mw1333.eqiad.wmnet mw1332.eqiad.wmnet mw1331.eqiad.wmnet
  • 14:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:29 effie: reimage mw1343.eqiad.wmnet mw1342.eqiad.wmnet mw1341.eqiad.wmnet
  • 14:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:20 marostegui: Deploy schema change on s3 codfw on the master, lag will appear on s3 codfw (T234066)
  • 13:57 Amir1: start of mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size 5 (T237984)
  • 13:57 marostegui: Deploy schema change on s4 codfw master with replication - T234066
  • 13:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:37 marostegui: Deploy schema change on db1106 with replication (lag will appear on s1 on labs) - T234066 T233135
  • 13:37 marostegui: Recreate views for enwiki_p.protected_titles for all labsdb hosts - T233135
  • 13:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:33 phamhi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:33 phamhi@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:31 marostegui: Remove ar_comment triggers from db1124:3311 for enwiki.archive - T234704
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for schema change, temporarily pool db1080 as vslow,dump', diff saved to https://phabricator.wikimedia.org/P9778 and previous config saved to /var/cache/conftool/dbconfig/20191128-133013-marostegui.json
  • 13:28 volans: cleanup root's crontab entries on netmon hosts from netbox/postgres stuff - T238919
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P9777 and previous config saved to /var/cache/conftool/dbconfig/20191128-132647-marostegui.json
  • 13:21 volans: cumin 'netmon*' 'rm -v /var/spool/cron/crontabs/postgres' T238919
  • 13:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:15 effie: enable puppet on thumbor*
  • 13:15 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:51 effie: disable puppet on thumbor*
  • 12:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:59 effie: reimage mw1267.eqiad.wmnet mw1277.eqiad.wmnet
  • 11:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:36 effie: reimage mw1344.eqiad.wmnet mw1334.eqiad.wmnet mw1324.eqiad.wmnet
  • 11:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 effie: reimage mw2279 mw2278 mw2277 mw2276 mw2275
  • 10:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:39 marostegui: Compress labsdb1009
  • 09:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:51 godog: swift eqiad-prod: more weight to ms-be105[7-9] - T237438
  • 09:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:17 effie: reimage mw1266, mw1276
  • 09:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:56 marostegui: Compress labsdb1011
  • 08:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:22 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:19 marostegui: Remove m4 from tendril and zarcillo - T159170
  • 08:15 effie: reimage mw2280, mw2281, mw2282
  • 08:06 marostegui: Compress labsdb1012
  • 07:56 effie: reimage mw1345, mw1335, mw1325
  • 06:56 elukey: remove log files on an-tool1007 to free root partition space
  • 06:14 marostegui: Remove db1061 from tendril and zarcillo - T238624
  • 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:02 marostegui: Remove db2067 from tendril and zarcillo T233185
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for schema change', diff saved to https://phabricator.wikimedia.org/P9776 and previous config saved to /var/cache/conftool/dbconfig/20191128-055212-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119 after schema change', diff saved to https://phabricator.wikimedia.org/P9775 and previous config saved to /var/cache/conftool/dbconfig/20191128-055025-marostegui.json
  • 03:03 vgutierrez: restarting keyholder on acmechief[12]001
  • 01:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:59 mutante: mw2244 restart php-fpm and apache which somehow are returning 5xx after reimage
  • 00:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime

2019-11-27

  • 23:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:35 mutante: mw2215 scap pull
  • 21:30 mutante: mw2215 rebooting
  • 21:10 bblack: restarting acme-chief service on acmechief1001 (daemon appears to be stuck on a lock and nonfunctional for days...)
  • 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:14 cstone: payments-wiki revision changed from 2eb54fd6ef to 06a8c3cdff
  • 19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for schema change', diff saved to https://phabricator.wikimedia.org/P9773 and previous config saved to /var/cache/conftool/dbconfig/20191127-193528-marostegui.json
  • 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1080 after schema change', diff saved to https://phabricator.wikimedia.org/P9772 and previous config saved to /var/cache/conftool/dbconfig/20191127-193227-marostegui.json
  • 19:32 ebernhardson@deploy1001: Finished deploy [search/airflow@45b7790]: Allow airflow virtualenv to import system site packages to facilitate libmysqlclient (duration: 00m 45s)
  • 19:31 ebernhardson@deploy1001: Started deploy [search/airflow@45b7790]: Allow airflow virtualenv to import system site packages to facilitate libmysqlclient
  • 19:27 mutante: an-airflow1001 - apt-get install python3-mysqldb - start airflow-webserver
  • 19:24 ebernhardson@deploy1001: Finished deploy [search/airflow@f3bad9d]: revert adding mysqlclient python package (duration: 00m 42s)
  • 19:23 ebernhardson@deploy1001: Started deploy [search/airflow@f3bad9d]: revert adding mysqlclient python package
  • 19:08 ebernhardson@deploy1001: Finished deploy [search/airflow@57f4caa]: Install mysqlclient to airflow instance (duration: 00m 40s)
  • 19:08 ebernhardson@deploy1001: Started deploy [search/airflow@57f4caa]: Install mysqlclient to airflow instance
  • 19:00 mutante: an-airflow1001: cd /etc/ ; chown airflow airflow; systemctl start airflow-webserver to let airflow write unittests.cfg (it tries to write this on first start and did not have permissions to do so) T236180
  • 18:58 mutante: an-airflow1001: cd /etc/ ; chown airflow airflow; systemctl start airflow-webserver to let airflow write unittests.cfg
  • 18:57 eileen: process-control config revision is b95355c0c0 - repair omnirecipient job off
  • 16:57 andrewbogott: disabling puppet on cloudvirt* and cloudcontrol* while merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552894/
  • 16:50 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external
  • 16:32 cdanis@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: dd4c76d3d SpecialContributions: max concurrency 3 (instead of 10) T234450 (duration: 01m 17s)
  • 16:22 ejegg: shifted daily silverpop export start time one hour earlier
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for schema change', diff saved to https://phabricator.wikimedia.org/P9768 and previous config saved to /var/cache/conftool/dbconfig/20191127-161525-marostegui.json
  • 16:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 after schema change', diff saved to https://phabricator.wikimedia.org/P9767 and previous config saved to /var/cache/conftool/dbconfig/20191127-161450-marostegui.json
  • 16:06 ema: cp3050: set proxy.config.http.server_session_sharing.match to "ip" T238494
  • 15:57 _joe_: restarting pybal on lvs1015
  • 15:56 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:55 _joe_: restarting pybal on lvs1016
  • 15:52 jynus: disabling puppet on dbprov1001 to test bacula restore T238048
  • 15:47 papaul: testing redundancy power on scs-a1-codfw
  • 15:47 _joe_: restarting pybal on lvs2003
  • 15:44 _joe_: restarting pybal again on lvs2006
  • 15:42 jynus: migrate db entries of archive Media to backup1001 T238048
  • 15:37 marostegui: Logging retroactively for the record: drop user 'nova'@'%' from m5 - T239170
  • 15:30 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:29 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:29 marostegui: Add grants for dump (10.192.0.114,10.192.16.96) for nova_cell0_eqiad database on db1117:3325 and db2078:3325 - T239170
  • 15:27 marostegui: Add grants for dump (10.64.0.95,10.64.16.31) for nova_cell0_eqiad database on db1117:3325 and db2078:3325 - T239170
  • 15:25 _joe_: restarting lvs2006 for addition of eventgate-logging-external,blubberoid-https
  • 15:24 moritzm: installing freetype bugfix updates from Buster 10.2 point release
  • 15:21 oblivian@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=eventgate-logging-external
  • 15:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 moritzm: downgrading trapperkeeper-webserver-jetty9-clojure packages on puppetdb hosts to the version shipped in Buster 10.2
  • 15:06 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 ema: cp-ats: rolling ats-{tls,backend} restart to enable lua reload T233274
  • 15:02 moritzm: remove trapperkeeper-webserver-jetty9-clojure debs from apt.wikimedia.org/buster-wikimedia (these were needed to unbreak TLS on Puppetdb in Buster, but an update landed in Buster 10.2, which replaces our custom hotfix)
  • 14:56 marostegui: Add new grants for nova_cell0 database on m5 - T239170
  • 14:50 marostegui: Create nova_cell0 database on m5 master - T239170
  • 14:43 effie: reimage mw1346, mw1336, mw1326
  • 14:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:15 effie: reimage mw2285, mw2284, mw2283
  • 14:14 effie: reimage mw2285, mw2286, mw2283
  • 14:01 moritzm: temporarily stop cas on idp1001 for some failover tests
  • 14:00 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:57 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of testwikidatawiki to read from the new term store for items (T225057) (duration: 00m 56s)
  • 13:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:44 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:42 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:42 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:42 ema: cp1075: repool with tslua reloads enabled T233274
  • 13:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:41 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:28 ema: cp1075: ats-{tls,backend} restarted to apply tslua reload changes T233274
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for schema change', diff saved to https://phabricator.wikimedia.org/P9766 and previous config saved to /var/cache/conftool/dbconfig/20191127-132359-marostegui.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P9765 and previous config saved to /var/cache/conftool/dbconfig/20191127-132220-marostegui.json
  • 13:21 effie: reimage mw2288, mw2287, mw2286
  • 13:13 effie: reimage mw1348, mw1338, mw1328
  • 12:51 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:51 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:50 jiji@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:50 jiji@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=apache2,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet,service=nginx
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1347.eqiad.wmnet,service=nginx
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1337.eqiad.wmnet,service=nginx
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet,service=apache2
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1347.eqiad.wmnet,service=apache2
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1337.eqiad.wmnet,service=apache2
  • 12:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:18 apergos: reimaged dumpsdata1001 to buster and forgot to use the dang script but it is all ok anyhow :-P
  • 11:47 Amir1: deployed security patch for T237667
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet,service=nginx
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1347.eqiad.wmnet,service=nginx
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet,service=apache2
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1347.eqiad.wmnet,service=apache2
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet,service=nginx
  • 11:27 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet,service=apache2
  • 11:21 effie: reimage mw2289.codfw.wmnet
  • 11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:06 ema: cp1075: depool to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552955/ and test tslua reloads T233274
  • 11:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:04 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:43 effie: reimage mw1347,mw1337,mw1327 - T239054
  • 10:32 ariel@deploy1001: Finished deploy [dumps/dumps@e0b0e76]: skip comment lines in dblist files (duration: 00m 03s)
  • 10:32 ariel@deploy1001: Started deploy [dumps/dumps@e0b0e76]: skip comment lines in dblist files
  • 09:41 moritzm: installing symfony security updates
  • 09:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:29 moritzm: installing php-imagick security updates
  • 09:25 ema: cp3050: re-enable request coalescing after performance experiment T238494
  • 09:02 effie: reimage mw1317.eqiad.wmnet - T239054
  • 09:01 marostegui: Stop replication on 1124:3318 to reimport wikidatawiki.page table on labsdb1010 - T238399
  • 08:24 godog: silence codfw varnish traffic drop until dec 9th - T239039
  • 08:09 godog: swift eqiad-prod: more weight to ms-be105[7-9] - T237438
  • 07:58 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:53 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:51 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:49 elukey: roll restart of eventstreams on scb2* - T239220
  • 07:41 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:15 vgutierrez: repooling cp3063 - T239310
  • 07:04 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3063.esams.wmnet
  • 07:04 vgutierrez: depool & powercycle cp3063 - T239310
  • 07:03 marostegui: Compress tables on db1102:3314
  • 06:52 marostegui: Remove db2062 from tendril and zarcillo - T238726
  • 06:50 marostegui: Stop MySQL on db2062 - T238726
  • 06:25 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 06:05 marostegui: Promote db2135 to codfw m5 master T238183
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db2135 to the config T238183 (duration: 00m 59s)
  • 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add db2135 to the config T238183 (duration: 01m 11s)
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2125 T239042', diff saved to https://phabricator.wikimedia.org/P9759 and previous config saved to /var/cache/conftool/dbconfig/20191127-054809-marostegui.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9758 and previous config saved to /var/cache/conftool/dbconfig/20191127-054056-marostegui.json
  • 01:58 krinkle@deploy1001: Synchronized vendor: 4108ff4e2 (3/3) (duration: 01m 00s)
  • 01:56 krinkle@deploy1001: Synchronized wmf-config/: 4108ff4e2 (2/3) (duration: 00m 59s)
  • 01:55 krinkle@deploy1001: Synchronized lib/: 4108ff4e2 (1/3) (duration: 01m 01s)
  • 01:28 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 03s)
  • 00:05 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Show UploadWizard CTA on testcommonswiki (T234960) (duration: 01m 00s)

2019-11-26

  • 23:55 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WelcomeSurvey for 100% of new users on arwiki (duration: 01m 02s)
  • 23:25 eileen: process-control config revision is ad80b0136c
  • 20:33 jforrester@deploy1001: Synchronized dblists/: Update dblists, now autogenerated (no-op, just comment changes) T223602 (duration: 01m 01s)
  • 20:25 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@c282e86]: Followup on T230495 (duration: 00m 59s)
  • 20:24 ebernhardson@deploy1001: Finished deploy [search/airflow@c235ab5]: Rebuild environment for python 3.7.3 (duration: 00m 42s)
  • 20:24 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@c282e86]: Followup on T230495
  • 20:24 ebernhardson@deploy1001: Started deploy [search/airflow@c235ab5]: Rebuild environment for python 3.7.3
  • 20:06 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@2b713d6]: Partition CirrusSearchElasticaWrite jobs T230495 (duration: 01m 23s)
  • 20:05 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@2b713d6]: Partition CirrusSearchElasticaWrite jobs T230495
  • 19:59 Pchelolo: create partitioned topics for cirrusSearchElasticaWrite on kafka-main T239135
  • 19:57 Urbanecm: Reset email of TheklanBot (T239233)
  • 19:46 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.8
  • 19:39 brennen@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.8 and rebuild l10n cache (duration: 32m 52s)
  • 19:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P9753 and previous config saved to /var/cache/conftool/dbconfig/20191126-192724-marostegui.json
  • 19:22 shdubsh: restore codfw logstash to baseline - T215904
  • 19:09 shdubsh: stop logstash codfw, generate some consumer lag, and set batch size to 2000 - T215904
  • 19:07 ebernhardson@deploy1001: Finished deploy [search/airflow@6ab2cd1]: Align deploy groups in scap.cfg and checks.yaml (duration: 00m 29s)
  • 19:07 ebernhardson@deploy1001: Started deploy [search/airflow@6ab2cd1]: Align deploy groups in scap.cfg and checks.yaml
  • 19:06 brennen@deploy1001: Started scap: testwiki to php-1.35.0-wmf.8 and rebuild l10n cache
  • 19:04 brennen@deploy1001: Pruned MediaWiki: 1.35.0-wmf.2 (duration: 07m 08s)
  • 19:03 ebernhardson@deploy1001: Finished deploy [search/airflow@d9779a9]: redeploy current version (duration: 00m 05s)
  • 19:03 ebernhardson@deploy1001: Started deploy [search/airflow@d9779a9]: redeploy current version
  • 19:03 ebernhardson@deploy1001: Finished deploy [search/airflow@d9779a9]: redeploy current version (duration: 00m 02s)
  • 19:03 ebernhardson@deploy1001: Started deploy [search/airflow@d9779a9]: redeploy current version
  • 18:55 shdubsh: stop logstash codfw, generate some consumer lag - T215904
  • 18:44 shdubsh: temporarily update pipeline.batch.size to 1000 on logstash2004 - T215904
  • 18:33 shdubsh: stop logstash on logstash200[5-6] for metrics collection - T215904
  • 18:09 brennen: issues with branch.py branch cut; deleted stub wmf/1.35.0-wmf.8 branch and proceeding with standard process
  • 17:56 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Show UploadWizard CTA in beta (T234960) (duration: 00m 52s)
  • 17:31 brennen: cutting branch for 1.35.0-wmf.8
  • 17:26 paravoid: moving fiberring from cr3-esams:xe-0/0/2 to cr2-esams:xe-0/1/8
  • 17:25 ppchelko@deploy1001: Finished deploy [restbase/deploy@0b74625]: Switch group 0 and 1 to Parsoid-PHP T229015 (duration: 15m 38s)
  • 17:10 ppchelko@deploy1001: Started deploy [restbase/deploy@0b74625]: Switch group 0 and 1 to Parsoid-PHP T229015
  • 17:03 paravoid: above was for cr3-esams
  • 17:03 paravoid: cr2-esams: disable interface xe-0/0/2 (transit)
  • 16:36 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop Scribunto special-case for HHVM, never reached T235142 (duration: 00m 52s)
  • 16:32 jforrester@deploy1001: Synchronized docroot/noc/createTxtFileSymlinks.sh: Drop HHVMRequestInit symlink creation (duration: 00m 52s)
  • 16:31 James_F: No sane way to delete HHVMRequestInit.php with a simple sync-dir, so waiting for the full scap.
  • 16:30 jforrester@deploy1001: Synchronized docroot/noc/conf/: Drop HHVMRequestInit symlink (duration: 00m 52s)
  • 16:27 ssastry@deploy1001: Finished deploy [parsoid/deploy@ee63341]: Update Parsoid to 7b9b424a (duration: 08m 37s)
  • 16:19 ssastry@deploy1001: Started deploy [parsoid/deploy@ee63341]: Update Parsoid to 7b9b424a
  • 16:10 ssastry@deploy1001: Finished deploy [parsoid/deploy@ee63341]: Testing rollback fixes (T238685) (duration: 01m 07s)
  • 16:09 ssastry@deploy1001: Started deploy [parsoid/deploy@ee63341]: Testing rollback fixes (T238685)
  • 16:01 ema: cp3050: temporarily disable request coalescing to assess performance impact T238494
  • 15:15 ema: cp3050: repool after failed test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552862/ (reverted) T238494
  • 14:55 bblack: ignore previous message, restarts not necessary
  • 14:53 bblack: rolling through authdns daemon restarts (necessary to reconfigure ANY-address listener) on authdns1001, authdns2001, ganeti3003
  • 14:44 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Raise memory limit on parsoid servers 2/2 (duration: 00m 52s)
  • 14:42 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Raise memory limit on parsoid servers 1/2 (duration: 00m 51s)
  • 14:30 oblivian@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 14:05 ema: cp3050: depool to merge and test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552862/ T238494
  • 13:11 effie: enable puppet on mediawiki servers
  • 13:03 effie: Remove tmpreaper package from all mediawiki servers - T229792
  • 12:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Wikibase (beta-only): Update wmgWikibaseClientDataBridgeHrefRegExp (T238918) (duration: 00m 53s)
  • 12:07 XioNoX: power down mr1-esams for replacement - T238174
  • 11:36 elukey: reboot stat1007
  • 11:35 marostegui: Deploy schema change on db1139:3311
  • 11:35 effie: enable puppet on mw canary servers, and restart apaches
  • 10:50 hashar: Updated jenkins job operations-puppet-tests-stretch-docker to use latest Docker container
  • 10:30 godog: swift eqiad-prod: add ms-be105[7-9] - T237438
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9749 and previous config saved to /var/cache/conftool/dbconfig/20191126-102442-marostegui.json
  • 10:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:07 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:07 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:45 effie: Disable puppet on all mediawiki servers to test 489982
  • 09:26 marostegui: Deploy schema change on s8 primary master (db1109) - T234066 T233135 T237120
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into s8 vslow,dump', diff saved to https://phabricator.wikimedia.org/P9748 and previous config saved to /var/cache/conftool/dbconfig/20191126-092409-marostegui.json
  • 09:18 marostegui: Run maintain-views for wikidatawiki.protected_title view on labsdb hosts T233135
  • 07:53 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid: Switch Flow to Parsoid/PHP on mw.org -- T229015 (duration: 00m 52s)
  • 07:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@378f504]: Do not use duplicate filter definitions T234266 (duration: 14m 24s)
  • 07:29 mobrovac@deploy1001: Started deploy [restbase/deploy@378f504]: Do not use duplicate filter definitions T234266
  • 07:28 mobrovac@deploy1001: Finished deploy [restbase/deploy@378f504] (dev-cluster): Do not use duplicate filter definitions (duration: 07m 36s)
  • 07:21 mobrovac@deploy1001: Started deploy [restbase/deploy@378f504] (dev-cluster): Do not use duplicate filter definitions
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1061 from config - T238624', diff saved to https://phabricator.wikimedia.org/P9745 and previous config saved to /var/cache/conftool/dbconfig/20191126-071746-marostegui.json
  • 07:09 marostegui: Stop MySQL on db1061 - T238624
  • 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1061 from config T238624 (duration: 00m 52s)
  • 07:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1061 from config T238624 (duration: 00m 54s)
  • 06:51 marostegui: Run compare.py for db2125 - T239042
  • 06:44 marostegui: Remove triggers for ar_comment on db1124:3318 T234704
  • 06:43 marostegui: Deploy schema change on db1087 with replication, lag will be generated on s8 for labsdb hosts
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from vslow, and pool db1092 temporarily as vslow,dump for s8, for a schema change on db1087', diff saved to https://phabricator.wikimedia.org/P9744 and previous config saved to /var/cache/conftool/dbconfig/20191126-064200-marostegui.json
  • 06:34 XioNoX: Rename cr2-knams to cr3-knams - T237030
  • 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1086 on s7 master and remove read-only from s7 T238044', diff saved to https://phabricator.wikimedia.org/P9743 and previous config saved to /var/cache/conftool/dbconfig/20191126-060108-marostegui.json
  • 06:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s7 as read-only for maintenance T238044', diff saved to https://phabricator.wikimedia.org/P9742 and previous config saved to /var/cache/conftool/dbconfig/20191126-060023-marostegui.json
  • 06:00 marostegui: Starting s7 failover from db1062 to db1086 - T238044
  • 05:49 marostegui: Deploy schema change on dbstore1003:3311
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1086 as it will be the new s7 master - T238044', diff saved to https://phabricator.wikimedia.org/P9741 and previous config saved to /var/cache/conftool/dbconfig/20191126-051034-marostegui.json
  • 05:08 marostegui: Start pre-steps for s7 failover - T238044

2019-11-25

  • 23:39 cstone: payments-wiki revision changed from e4d51fe247 to 2eb54fd6ef
  • 23:14 Urbanecm: Evening SWAT done
  • 23:12 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 14s)
  • 23:10 urbanecm@deploy1001: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 01s)
  • 23:09 urbanecm@deploy1001: Synchronized dblists/: SWAT: aed2369: Add gewikimedia to special.dblist (T239173) (duration: 00m 52s)
  • 23:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: d71b0ab: kask-echoseen: Do not report dupes (T237143) (duration: 00m 53s)
  • 22:13 Jeff_Green: authdns update to deploy I21ddc1a3e
  • 22:04 eileen: civicrm revision changed from 852c4a36bd to 5cf2d2713f, config revision is c4ad2f5990
  • 20:37 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1298.eqiad.wmnet
  • 20:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 20:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1298.eqiad.wmnet
  • 20:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 20:07 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 20:05 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 20:04 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1298.eqiad.wmnet
  • 19:35 mutante: mw1298 - scap pull
  • 19:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 19:30 ema@cumin1001: conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet,service=nginx
  • 19:14 bblack: cp[245]*: wipe daemon.log and syslog and restart syslog, again
  • 19:13 cdanis: restarted grafana-server on grafana1002 T220838
  • 19:11 cdanis: copied snapshot of database from grafana1001 to grafana1002 T220838
  • 19:07 cdanis: stopping grafana-next.wikimedia.org (on grafana1002)
  • 19:06 cdanis: making grafana.wikimedia.org read-only (on grafana1001) ✔️ cdanis@grafana1001.eqiad.wmnet ~ 🕑☕ sudo chmod -w /var/lib/grafana/grafana.db
  • 18:56 Lucas_WMDE: Morning SWAT done
  • 18:55 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/TemplateData/: SWAT: Implement ParsoidFetchTemplateData hook for Parsoid/PHP (T238954) (duration: 00m 53s)
  • 18:54 bblack: cp[245]*: wipe daemon.log and syslog and restart syslog, again
  • 18:54 ema: cumin -b1 'A:cp-ats and A:esams' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:53 ema: cumin -b1 'A:cp-ats and A:eqsin' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:53 ema: cumin -b1 'A:cp-ats and A:ulsfo' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:52 ema: cumin -b1 'A:cp-ats and A:codfw' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:51 ema: cumin -b1 'A:cp-ats and A:eqiad' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:50 bblack: cp[245]*: wipe daemon.log and restart syslog, again
  • 18:48 mutante: mw1298 - pooling
  • 18:26 bblack: cp[245]*: disk space exhausted, rm /var/log/daemon.log + restart rsyslog
  • 18:17 bblack: cp4028: disk space exhausted, rm /var/log/daemon.log + restart rsyslog
  • 18:16 effie: Restart php-fpm on mw* and wtp* servers in eqiad and codfw - T236963
  • 18:07 effie: Upgrade php-wikidiff2 to 1.10.0 to all servers - T236963
  • 17:55 gehel: restart wdqs-updater on all wdqs servers
  • 17:55 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@4c5f503]: Revert New Blazegraph Build and WDQS Updates (duration: 10m 24s)
  • 17:50 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid: Switch private wiki clients (Flow, VE) to Parsoid/PHP -- T229015 (duration: 00m 53s)
  • 17:45 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@4c5f503]: Revert New Blazegraph Build and WDQS Updates
  • 17:36 marostegui: Upgrade kernel on db2125 T239042
  • 17:25 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@4c5f503]: New Blazegraph Build and WDQS Updates (duration: 12m 23s)
  • 17:19 XioNoX: power down cr2-knams - T237030
  • 17:14 arlolra@deploy1001: Finished deploy [parsoid/deploy@e7faa19]: Updating Parsoid to a6bfdfa (duration: 08m 58s)
  • 17:12 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@4c5f503]: New Blazegraph Build and WDQS Updates
  • 17:05 arlolra@deploy1001: Started deploy [parsoid/deploy@e7faa19]: Updating Parsoid to a6bfdfa
  • 16:48 jynus: upgrading and restarting dbprov* hosts
  • 15:49 ema: pool cp3064 with varnish-be T227432
  • 15:36 ema: cp3064 create filesystem on /dev/nvme0n1p1 (see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552547/) and reboot T238494
  • 15:22 ema: cp3064 manual reboot after wmf-auto-reimage error: 'Unable to run wmf-auto-reimage-host: Failed to reboot_host' T238494
  • 15:20 ema: cp-ats: rolling ats-{tls,backend} restart to enable lua reload T233274
  • 15:18 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:14 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:13 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:11 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:11 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 ema: cp1075: ats-tls-restart to enable lua reload T233274
  • 15:10 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 15:09 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:03 ema: cp1075: ats-backend-restart to enable lua reload T233274
  • 15:02 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3056.esams.wmnet
  • 15:00 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp3056.esams.wmnet,service=ats-be
  • 14:50 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 14:50 XioNoX: enable cr3-esams:et-1/0/0 - T236767
  • 14:45 ema: depool cp3064 and reimage with varnish-be T227432
  • 14:44 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 14:38 marostegui: Remove triggers from archive table on s1 codfw sanitarium T234704
  • 14:37 marostegui: Deploy schema change on s1 codfw (this will generate lag on codfw) - T234066 T233135
  • 14:23 moritzm: upgrading OpenJDK 11 on an-conf*
  • 14:04 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 13:27 elukey: set global read_only=1 on db1108's log database - T159170
  • 13:16 XioNoX: cleanup config on cr3-esams - T237031
  • 13:15 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 13:11 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 13:06 XioNoX: cleanup config on cr2-esams - T237031
  • 13:02 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 12:59 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 12:48 XioNoX: bundle esams-knams links on knams side - T237031
  • 12:42 XioNoX: bundle esams-knams links on esams side - T237031
  • 12:27 XioNoX: disable BGP to knams transits - T237031
  • 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'Increase main traffic weight for db1126', diff saved to https://phabricator.wikimedia.org/P9735 and previous config saved to /var/cache/conftool/dbconfig/20191125-114821-marostegui.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 after schema change', diff saved to https://phabricator.wikimedia.org/P9734 and previous config saved to /var/cache/conftool/dbconfig/20191125-114733-marostegui.json
  • 11:40 effie: cumin -b 2 -s 10 restart php on API servers
  • 11:31 effie: restart php-fpm on mw1314
  • 11:16 Urbanecm: EU SWAT done
  • 11:16 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/AbuseFilter/extension.json: SWAT: 29a16bd: Restrict viewing Special:Log/AbuseFilter, and remove from recent changes (T34959) (duration: 01m 04s)
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 4670d1d: Add throttle rule for WMCL Editathon 2019-12-07 (T238986) (duration: 00m 53s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 9394f1f: Allow enwikiversity interface admins to remove their own interface administratorship (T238967) (duration: 00m 57s)
  • 09:45 moritzm: installing cron updates from buster point release
  • 09:32 moritzm: installing systemd security/bugfix updates on buster
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 - schema change', diff saved to https://phabricator.wikimedia.org/P9732 and previous config saved to /var/cache/conftool/dbconfig/20191125-093157-marostegui.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104 after schema change', diff saved to https://phabricator.wikimedia.org/P9731 and previous config saved to /var/cache/conftool/dbconfig/20191125-093038-marostegui.json
  • 09:30 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@db43901]: T238822 (duration: 13m 08s)
  • 09:28 _joe_: building and publishing updated images for envoy
  • 09:17 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@db43901]: T238822
  • 09:13 moritzm: installing python2.7 updates on buster
  • 08:53 _joe_: rebuilding base docker images docker-registry.wikimedia.org/wikimedia-{jessie,stretch,buster}
  • 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:22 marostegui: Compress db2090
  • 07:04 marostegui: Upgrade db2134
  • 06:24 marostegui: Compress db2080
  • 06:23 marostegui: Compress db2082
  • 06:22 marostegui: Compress db2094:3318
  • 06:18 marostegui: racadm serveraction hardreset on db2125 T239042
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 - schema change', diff saved to https://phabricator.wikimedia.org/P9730 and previous config saved to /var/cache/conftool/dbconfig/20191125-061629-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9729 and previous config saved to /var/cache/conftool/dbconfig/20191125-061542-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9728 and previous config saved to /var/cache/conftool/dbconfig/20191125-060728-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9727 and previous config saved to /var/cache/conftool/dbconfig/20191125-060011-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - crashed T239042', diff saved to https://phabricator.wikimedia.org/P9726 and previous config saved to /var/cache/conftool/dbconfig/20191125-055813-marostegui.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9725 and previous config saved to /var/cache/conftool/dbconfig/20191125-055305-marostegui.json
  • 03:13 vgutierrez: repooling cp3053 - T239041
  • 03:00 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3053.esams.wmnet
  • 02:59 vgutierrez: depooling & power-cycling cp3053 - T239041
  • 00:10 eileen: also speed the repair process-control config revision is c4ad2f5990

2019-11-24

  • 20:54 eileen: process-control config revision is 371782a667
  • 15:41 ariel@deploy1001: Finished deploy [dumps/dumps@bfdea34]: can skip locks for misc dumps (duration: 00m 03s)
  • 15:41 ariel@deploy1001: Started deploy [dumps/dumps@bfdea34]: can skip locks for misc dumps
  • 15:01 apergos: rebooting dumpsdata1002 to clear up the other half of the nfs issues
  • 14:24 apergos: rebooting snapshot1008 to clear up some nfs + kernel issues

2019-11-23

  • 18:19 gehel: repool wdqs1007, caught up on lag - T238229
  • 14:23 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 55s)
  • 11:56 _joe_: oblivian@cumin1001:~$ sudo cumin -b2 -s60 A:mw-eqiad 'restart-php7.2-fpm'
  • 11:47 _joe_: restarting php7.2-fpm on mw1329
  • 09:49 XioNoX: downtime all ripe-atlas checks until Monday (most likely an upstream issue/maintenance)

2019-11-22

  • 21:55 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T238955 (duration: 00m 53s)
  • 18:02 shdubsh: restore prometheus services default settings - T238807
  • 17:52 _joe_: repooling restbase2018
  • 17:36 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:34 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 shdubsh: clean tombstones on prometheus1004 - T238807
  • 17:09 shdubsh: restart prometheus on prometheus1004 - T238807
  • 16:22 shdubsh: clean tombstones on prometheus1003 - T238807
  • 15:40 XioNoX: renumber AS17639 sessions in eqsin
  • 15:16 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase/repo/: Stop outputting anything in case of 304 responses in Special:EntityData (T238901) (duration: 00m 57s)
  • 14:49 _joe_: disabling puppet on restbase2018, testing envoy upgrade T238050
  • 14:48 _joe_: uploaded envoyproxy 1.12.1 to {buster,stretch} T237235
  • 13:11 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T238119 T238524 T237375 T238120)
  • 13:06 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase/lib/includes/Store/Sql/SqlEntityInfoBuilder.php: T238473 (duration: 00m 52s)
  • 12:34 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 60 RESYNC (duration: 00m 51s)
  • 12:32 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 60 (duration: 00m 53s)
  • 11:59 effie: reload php7 on canaries
  • 11:34 effie: Roll out wikidiff2 1.10.0-1 to canaries - T236963
  • 11:29 effie: upload wikidiff2 1.10.0-1 - T236963
  • 09:59 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 10s)
  • 09:56 ladsgroup@deploy1001: Synchronized langlist: T238105 (duration: 00m 51s)
  • 09:47 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 20s)
  • 09:44 ladsgroup@deploy1001: Synchronized langlist: T238104 T238104 (duration: 00m 52s)
  • 09:28 ema: pool cp1081 with ATS backend T227432
  • 09:27 gehel: depool wdqs1007 to allow it to catch up on lag - T238229
  • 09:23 reedy@deploy1001: Synchronized php-1.35.0-wmf.5/includes/specials/pagers/ContribsPager.php: Remove live hack of limit for T234450 (duration: 00m 54s)
  • 09:19 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: T234450 (duration: 00m 55s)
  • 09:07 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:04 gehel: remove blazegraph 2.1.5-wmf.11 from archiva, broken upload
  • 08:54 gehel: restarting blazegraph and updater on wdqs1007
  • 08:54 gehel: restarting blazegraph and updater on edqs1007
  • 08:49 ema: depool cp1081 and reimage as text_ats T227432
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Rebalance weights on s7 in preparation for s7 failover on Tuesday T238044', diff saved to https://phabricator.wikimedia.org/P9722 and previous config saved to /var/cache/conftool/dbconfig/20191122-063145-marostegui.json
  • 03:49 shdubsh: restart prometheus@ops on prometheus1003 T238807
  • 00:46 mutante: xhgui1001/xhgui2001 - rsyncing /srv/mongod from tungsten to /srv/tungsten/mongod/ on both new machines (T158837)
  • 00:37 mutante: tungsten - starting ferm service
  • 00:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move newcomer tasks JSON config from mw.org to local wikis (T237301) (duration: 00m 52s)
  • 00:18 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Make non-remote titles work in RemotePageConfigurationLoader (T237301) (duration: 00m 54s)

2019-11-21

  • 23:09 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove unused CirrusSearch config variable (duration: 00m 52s)
  • 22:11 Urbanecm: mwscript importImages.php --wiki=commonswiki --overwrite --user=Bürgerentscheid . (T238764)
  • 21:42 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/UploadWizard: Revert "Add Machine Vision CTA to final step (T234960)", take 2 (duration: 00m 41s)
  • 21:36 mholloway-shell@deploy1001: Scap failed!: 5/11 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 21:34 mholloway-shell@deploy1001: Scap failed!: 4/11 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 21:29 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/UploadWizard: Add Machine Vision CTA to final step (T234960) (duration: 00m 59s)
  • 21:16 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@70154b4]: Update mobileapps to c140e88 (duration: 06m 29s)
  • 21:09 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@70154b4]: Update mobileapps to c140e88
  • 20:51 mutante: puppetmaster1001 - revoking puppet certs for xhgui1001/xhgui2001
  • 20:49 mutante: ganeti1003 - switching boot order of xhgui1001 to network and reinstalling with stretch (T238098)
  • 20:16 mforns@deploy1001: Finished deploy [analytics/refinery@97015e4]: add new projects to webrequest whitelist (duration: 08m 29s)
  • 20:14 mutante: icinga1001 - systemctl reset-failed
  • 20:08 mforns@deploy1001: Started deploy [analytics/refinery@97015e4]: add new projects to webrequest whitelist
  • 19:01 andrewbogott: upgrading designate to 'ocata' on cloudservices1003 and 1004
  • 18:49 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:45 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:42 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:13 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch private wikis back to Parsoid/JS - T229015 (duration: 00m 52s)
  • 18:03 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:02 mobrovac@deploy1001: Synchronized wmf-config/ProductionServices.php: Use HTTPS for contacting Parsoid/PHP - T229015 (duration: 00m 53s)
  • 17:52 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Switch private wikis to Parsoid/PHP; file 4/4 -- T229015 (duration: 00m 53s)
  • 17:51 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch private wikis to Parsoid/PHP; file 3/4 -- T229015 (duration: 00m 51s)
  • 17:50 mobrovac@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch private wikis to Parsoid/PHP; file 2/4 -- T229015 (duration: 00m 53s)
  • 17:48 mobrovac@deploy1001: Synchronized wmf-config/LabsServices.php: Switch private wikis to Parsoid/PHP; file 1/4 -- T229015 (duration: 00m 53s)
  • 17:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@b987068]: Switch mw.org to Parsoid/PHP - T229015 (duration: 16m 43s)
  • 17:10 mobrovac@deploy1001: Started deploy [restbase/deploy@b987068]: Switch mw.org to Parsoid/PHP - T229015
  • 17:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@b987068] (dev-cluster): Switch mw.org to Parsoid/PHP (duration: 02m 38s)
  • 17:06 mobrovac@deploy1001: Started deploy [restbase/deploy@b987068] (dev-cluster): Switch mw.org to Parsoid/PHP
  • 16:54 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 16:48 sbassett@deploy1001: Finished scap: Deploying T238451 (ext:AbuseFilter), running scap sync for i18n issues. (duration: 16m 42s)
  • 16:31 sbassett@deploy1001: Started scap: Deploying T238451 (ext:AbuseFilter), running scap sync for i18n issues.
  • 15:54 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 15:42 mforns@deploy1001: Finished deploy [analytics/refinery@7f32472]: deploying analytics refinery (after refinery-source v0.0.107) (duration: 10m 50s)
  • 15:31 mforns@deploy1001: Started deploy [analytics/refinery@7f32472]: deploying analytics refinery (after refinery-source v0.0.107)
  • 15:30 ema: pool cp1079 with ATS backend T227432
  • 15:22 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:19 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:13 akosiaris: purge https://releases.wikimedia.org/charts/eventgate-0.0.13.tgz, https://releases.wikimedia.org/charts/ and https://releases.wikimedia.org/charts/index.yaml
  • 15:09 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:07 bblack: DONE testing deployment software changes on authdns cluster, back to normal
  • 15:07 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:49 ema: depool cp1079 and reimage as text_ats T227432
  • 14:47 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@db43901]: Agent filter changes (duration: 18m 33s)
  • 14:43 bblack: testing deployment software changes on authdns cluster, please hold dns changes for a few!
  • 14:41 thcipriani: restarting Jenkins for update
  • 14:28 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@db43901]: Agent filter changes
  • 13:59 ema: pool cp1077 with ATS backend T227432
  • 13:41 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:39 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:20 ema: depool cp1077 and reimage as text_ats T227432
  • 11:53 reedy@deploy1001: Finished scap: T234450 (duration: 19m 20s)
  • 11:42 effie: enable puppet on all mw hosts
  • 11:33 reedy@deploy1001: Started scap: T234450
  • 11:09 Urbanecm: EU SWAT done
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e4861ec: Set correct language for shywiktionary (T238105) (duration: 00m 52s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 68d2003: Restrict editing CNBanner namespace to autoconfirmed on metawiki (T238723) (duration: 00m 54s)
  • 11:05 effie: disable puppet on mw[1-2]*
  • 10:49 volans: restarting tcpircbot-logmsgbot on icinga1001, has failed to log some messages, no useful log on the host
  • 10:22 ema: pool cp2023 with Varnish backend T238817 T227432
  • 10:18 arturo: update buster-wikimedia thirdparty/kubeadm-k8s packages (newer version will be used to handle T238654)
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1090:331{2,7} after upgrade', diff saved to https://phabricator.wikimedia.org/P9714 and previous config saved to /var/cache/conftool/dbconfig/20191121-095401-marostegui.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1090:331{2,7} after upgrade', diff saved to https://phabricator.wikimedia.org/P9713 and previous config saved to /var/cache/conftool/dbconfig/20191121-093958-marostegui.json
  • 09:39 ema: depool cp2023 and reimage back as varnish-be T238817 T227432
  • 09:38 marostegui: Stop MySQL on db1067 - T238297
  • 09:27 marostegui: Upgrade db1090:3312, db1090:3317
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P9712 and previous config saved to /var/cache/conftool/dbconfig/20191121-092554-marostegui.json
  • 09:08 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9711 and previous config saved to /var/cache/conftool/dbconfig/20191121-090623-marostegui.json
  • 09:03 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 08:58 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9710 and previous config saved to /var/cache/conftool/dbconfig/20191121-085644-marostegui.json
  • 08:53 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9709 and previous config saved to /var/cache/conftool/dbconfig/20191121-084500-marostegui.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9708 and previous config saved to /var/cache/conftool/dbconfig/20191121-083322-marostegui.json
  • 08:21 marostegui: Upgrade db1079
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for upgrade', diff saved to https://phabricator.wikimedia.org/P9707 and previous config saved to /var/cache/conftool/dbconfig/20191121-082108-marostegui.json
  • 07:57 akosiaris: upgrade OTRS to 5.0.39 T225925
  • 07:56 marostegui: Promote db2133 to codfw m2 master - T238183
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9706 and previous config saved to /var/cache/conftool/dbconfig/20191121-072543-marostegui.json
  • 07:18 marostegui: Upgrade db1125 (sanitarium)
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9705 and previous config saved to /var/cache/conftool/dbconfig/20191121-071758-marostegui.json
  • 06:56 marostegui: Repool labsdb1009
  • 06:32 marostegui: Sanitize shywiktionary gcrwiki szywiki minwiktionary gewikimedia on db1124:3313 T238115 T238114 T237373 T238522 T236404
  • 06:30 marostegui: Sanitize shywiktionary gcrwiki szywiki minwiktionary gewikimedia on db2094:3313 T238115 T238114 T237373 T238522 T236404
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9704 and previous config saved to /var/cache/conftool/dbconfig/20191121-062412-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9703 and previous config saved to /var/cache/conftool/dbconfig/20191121-061711-marostegui.json
  • 06:16 marostegui: Compress db2081
  • 06:13 marostegui: Stop MySQL on db1107 T238113
  • 06:06 marostegui: Compress db2083
  • 05:57 marostegui: Depool labsdb1009 for upgrade
  • 05:56 marostegui: Upgrade db1086
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for upgrade', diff saved to https://phabricator.wikimedia.org/P9702 and previous config saved to /var/cache/conftool/dbconfig/20191121-055557-marostegui.json
  • 05:53 marostegui: Compress db2073
  • 00:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config does not seem to be applying on half the app servers, resyncing (duration: 00m 52s)
  • 00:24 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable suggested edits without opt-in (T227728) (duration: 00m 52s)
  • 00:18 catrope@deploy1001: Finished scap: GrowthExperiments and MobileFrontend changes SWAT (includes i18n) (duration: 15m 57s)
  • 00:02 catrope@deploy1001: Started scap: GrowthExperiments and MobileFrontend changes SWAT (includes i18n)

2019-11-20

  • 23:14 Amir1: finished creating five wikis, total duration 134 minutes
  • 23:14 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 24s)
  • 23:11 ladsgroup@deploy1001: Synchronized langlist: T238105 (duration: 00m 50s)
  • 23:10 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T238105 (duration: 00m 52s)
  • 23:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T238105 (duration: 00m 51s)
  • 23:08 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T238105 (duration: 00m 51s)
  • 23:05 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T238105
  • 22:59 ladsgroup@deploy1001: Synchronized dblists: T238105 (duration: 00m 53s)
  • 22:49 ladsgroup@deploy1001: Synchronized langlist: T238104 (duration: 00m 51s)
  • 22:48 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T238104 (duration: 00m 52s)
  • 22:47 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T238104 (duration: 00m 52s)
  • 22:43 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T238104 (duration: 00m 51s)
  • 22:41 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T238104
  • 22:36 ladsgroup@deploy1001: Synchronized dblists: T238104 (duration: 00m 52s)
  • 22:22 ladsgroup@deploy1001: Synchronized langlist: T237369 (duration: 00m 53s)
  • 22:21 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T237369 (duration: 00m 52s)
  • 22:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T237369 (duration: 00m 51s)
  • 22:17 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T237369 (duration: 00m 51s)
  • 22:15 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T237369
  • 22:11 ladsgroup@deploy1001: Synchronized dblists: T237369 (duration: 00m 52s)
  • 22:00 Urbanecm: Wiki creation continues
  • 21:56 ladsgroup@deploy1001: Synchronized langlist: T236861 (duration: 00m 52s)
  • 21:55 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T236861 (duration: 00m 51s)
  • 21:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T236861 (duration: 00m 52s)
  • 21:52 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T236861 (duration: 00m 51s)
  • 21:49 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T236861
  • 21:44 ladsgroup@deploy1001: Synchronized dblists: T236861 (duration: 00m 52s)
  • 21:38 Urbanecm: mwscript createAndPromote.php --wiki=gewikimedia --sysop --bureaucrat Mehman97 <password redacted> (T236389)
  • 21:35 gehel: repool wdqs1004 - T238229
  • 21:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: new wiki gewikimedia (T236389) (duration: 00m 52s)
  • 21:29 urbanecm@deploy1001: Synchronized static/images/project-logos/: new wiki gewikimedia (T236389) (duration: 00m 53s)
  • 21:28 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: new wiki gewikimedia (T236389) (duration: 00m 52s)
  • 21:27 ejegg: Fundraising CiviCRM updated from 2802bdd649 to 852c4a36bd
  • 21:23 mutante: notebook1003 - systemctl start nagios-nrpe-server (second time today already, T212824)
  • 21:20 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: new wiki gewikimedia (T236389)
  • 21:16 urbanecm@deploy1001: Synchronized dblists: new wiki gewikimedia (T236389) (duration: 00m 52s)
  • 21:01 ssastry@deploy1001: Finished deploy [parsoid/deploy@7665624]: Dummy Parsoid deploy to test T238748 fix (duration: 07m 20s)
  • 20:53 ssastry@deploy1001: Started deploy [parsoid/deploy@7665624]: Dummy Parsoid deploy to test T238748 fix
  • 20:37 ssastry@deploy1001: Finished deploy [parsoid/deploy@d5646b7]: Updating Parsoid to 2e79460d (duration: 09m 14s)
  • 20:27 ssastry@deploy1001: Started deploy [parsoid/deploy@d5646b7]: Updating Parsoid to 2e79460d
  • 20:27 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 20:23 mutante: notebook1003 - sudo systemctl nagios-nrpe-server (as usual ....)
  • 20:19 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 19:31 ejegg: updated fundraising internal dashboard from 69fdbec60d to 8fc2726736
  • 19:04 mutante: xhgui1001 - initial puppet run, signed puppet cert on puppetmaster1001
  • 18:56 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: RESYNC T221774 - wgWikidataOrgQueryServiceMaxLagFactor 120 (duration: 00m 50s)
  • 18:51 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 120 (duration: 00m 54s)
  • 18:42 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 170 (duration: 00m 53s)
  • 18:31 mutante: ganeti - introducing and installing buster on new VMs xhgui1001/xhgui2001 - for replacing tungsten (jessie) T238098
  • 18:17 mobrovac: morning SWAT done
  • 18:17 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.5/includes/libs/virtualrest/ParsoidVirtualRESTService.php: Parsoid VRS: Add the Host header - T229015 T229078 T229074 (duration: 00m 52s)
  • 18:13 shdubsh: restart mtail on fermium
  • 17:40 ema: pool cp2023 with ATS backend T227432
  • 17:24 mobrovac@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 17:21 mobrovac@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 17:19 mobrovac@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
  • 17:18 andrewbogott: upgrading pdns to version 4 on cloudservices1003
  • 17:06 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:04 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:03 andrewbogott: upgrading pdns to version 4 on cloudvirt1004 T210715
  • 16:58 andrewbogott: disabling puppet on cloudvirt1003 and 1004 for T210715
  • 16:55 moritzm: installing rpcbind bugfix updates from buster 10.2 point release
  • 16:43 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:23 ema: depool cp2023 and reimage as text_ats T227432
  • 16:14 ema: pool cp2019 with ATS backend T227432
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314 after compression', diff saved to https://phabricator.wikimedia.org/P9695 and previous config saved to /var/cache/conftool/dbconfig/20191120-160813-marostegui.json
  • 16:03 gehel: depool wdqs1004 to allow catching up on lag - T238229
  • 15:42 mobrovac@deploy1001: Synchronized wmf-config/LabsServices.php: [BETA-ONLY] Switch Flow to use Parsoid/PHP - T229078 (duration: 00m 52s)
  • 15:40 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:38 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: RESYNC T221774 - wgWikidataOrgQueryServiceMaxLagFactor 180 gerrit:552069 (duration: 00m 52s)
  • 15:19 ema: depool cp2019 and reimage as text_ats T227432
  • 15:08 gehel: reset LVS weight for wdqs public eqiad to 10
  • 15:05 effie: Enable puppet on mw*
  • 14:52 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 180 gerrit:552069 (duration: 00m 52s)
  • 14:50 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (use altered lag, not raw lag) gerrit:552072 (duration: 00m 53s)
  • 14:49 ema: pool cp2016 with ATS backend T227432
  • 14:47 effie: disable puppet on all mw* servers
  • 14:27 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:24 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:06 ema: depool cp2016 and reimage as text_ats T227432
  • 13:32 godog: updated puppet compiler facts on compiler100* hosts
  • 12:43 ema: pool cp2013 with ATS backend T227432
  • 12:27 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:25 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:08 ema: depool cp2013 and reimage as text_ats T227432
  • 11:59 ema: pool cp2012 with ATS backend T227432
  • 11:55 Urbanecm: EU SWAT done
  • 11:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 2b13fbe: [rowiki] Enable deleterevision for patrollers (T234051) (duration: 00m 52s)
  • 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 51ecd71: Partial cleanup of InitializeSettings (T231178) (duration: 00m 52s)
  • 11:42 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:40 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: f847380: Set namespace alias for Index: (NS 102/103) for elwikisource (T237253) (duration: 00m 54s)
  • 11:36 urbanecm@deploy1001: Finished scap: SWAT: 44ec4e4: e1baf0e: 3c02aa7: Namespace changes (duration: 06m 15s)
  • 11:30 urbanecm@deploy1001: Started scap: SWAT: 44ec4e4: e1baf0e: 3c02aa7: Namespace changes
  • 11:27 ema: cp2010: ats-backend-restart to clear backend restart alert
  • 11:21 ema: depool cp2012 and reimage as text_ats T227432
  • 11:15 ema: pool cp2010 with ATS backend T227432
  • 10:54 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:36 mobrovac@deploy1001: Finished deploy [restbase/deploy@daa7808]: Revert switching test2.wp to Parsoid/JS - T238716 (duration: 13m 56s)
  • 10:34 ema: depool cp2010 and reimage as text_ats T227432
  • 10:30 marostegui: Upgrade db1116
  • 10:22 mobrovac@deploy1001: Started deploy [restbase/deploy@daa7808]: Revert switching test2.wp to Parsoid/JS - T238716
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P9694 and previous config saved to /var/cache/conftool/dbconfig/20191120-101727-marostegui.json
  • 10:14 marostegui: Compress db2095:3314
  • 10:07 mobrovac@deploy1001: Finished deploy [restbase/deploy@c677063]: Switch test2.wp back to Parsoid/JS temporarily - T238716 (duration: 14m 54s)
  • 09:56 marostegui: Compress db2106
  • 09:52 mobrovac@deploy1001: Started deploy [restbase/deploy@c677063]: Switch test2.wp back to Parsoid/JS temporarily - T238716
  • 09:48 marostegui: Compress dbstore1005:3318
  • 09:47 marostegui: Compress dbstore1004:3314
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9693 and previous config saved to /var/cache/conftool/dbconfig/20191120-093308-marostegui.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9692 and previous config saved to /var/cache/conftool/dbconfig/20191120-092337-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9691 and previous config saved to /var/cache/conftool/dbconfig/20191120-090739-marostegui.json
  • 08:55 marostegui: Upgrade db1094
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for upgrade', diff saved to https://phabricator.wikimedia.org/P9690 and previous config saved to /var/cache/conftool/dbconfig/20191120-085448-marostegui.json
  • 08:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:43 marostegui: Promote db2132 as m1-codfw master - T238183
  • 07:19 marostegui: Upgrade db2062
  • 07:19 marostegui: Upgrade db2078
  • 07:14 marostegui: Deploy schema change on s3 (testwikidatawiki) directly on s3 primary master T237120
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1136', diff saved to https://phabricator.wikimedia.org/P9688 and previous config saved to /var/cache/conftool/dbconfig/20191120-070511-marostegui.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1136', diff saved to https://phabricator.wikimedia.org/P9687 and previous config saved to /var/cache/conftool/dbconfig/20191120-065718-marostegui.json
  • 06:44 marostegui: Upgrade db2118 (s7 codfw master)
  • 06:41 marostegui: Repool labsdb1011
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1136 into s7 api', diff saved to https://phabricator.wikimedia.org/P9686 and previous config saved to /var/cache/conftool/dbconfig/20191120-064022-marostegui.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136 after upgrade', diff saved to https://phabricator.wikimedia.org/P9685 and previous config saved to /var/cache/conftool/dbconfig/20191120-063628-marostegui.json
  • 06:28 marostegui: Upgrade db1136
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for upgrade', diff saved to https://phabricator.wikimedia.org/P9684 and previous config saved to /var/cache/conftool/dbconfig/20191120-062749-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after upgrade', diff saved to https://phabricator.wikimedia.org/P9683 and previous config saved to /var/cache/conftool/dbconfig/20191120-062029-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for compression', diff saved to https://phabricator.wikimedia.org/P9682 and previous config saved to /var/cache/conftool/dbconfig/20191120-061938-marostegui.json
  • 05:58 marostegui: Stop MySQL on db1101:3317, db1101:3318 for upgrade and schema change
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 and db1101:3318 for upgrade and schema change', diff saved to https://phabricator.wikimedia.org/P9681 and previous config saved to /var/cache/conftool/dbconfig/20191120-055732-marostegui.json
  • 05:55 marostegui: Depool labsdb1011 for upgrade
  • 05:54 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1105:3311 db1097:3314 db1098:3316 db1098:3317 after compression', diff saved to https://phabricator.wikimedia.org/P9680 and previous config saved to /var/cache/conftool/dbconfig/20191120-055426-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092 after schema change', diff saved to https://phabricator.wikimedia.org/P9679 and previous config saved to /var/cache/conftool/dbconfig/20191120-054840-marostegui.json
  • 03:16 tgr: T208369 ran mwscript extensions/GrowthExperiments/maintenance/deleteOldSurveys.php kowiki --cutoff 350
  • 02:57 vgutierrez: restarting pybal on lvs2002
  • 02:54 vgutierrez: restarting pybal on lvs2005
  • 02:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
  • 02:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 00:10 mutante: phab2001 - restart ssh-phab service after repooling it after buster reinstall, it wasn't listening on the IPv6 IP, causing LVS/pybal alerts
  • 00:06 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Pass token as editing_session_id for suggested edits (T238249) (duration: 00m 53s)
  • 00:02 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/: EditAttemptStep: Allow overriding session ID (T238249) (duration: 00m 52s)
  • 00:00 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/WikiEditor/: EditAttemptStep: Allow overriding session ID (T238249) (duration: 00m 54s)

2019-11-19

  • 23:58 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MobileFrontend/: EditAttemptStep: Allow overriding session ID (T238249) (duration: 00m 53s)
  • 23:55 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/WikimediaEvents/: EditAttemptStep: Allow other extensions to trigger oversampling (T238249) (duration: 00m 53s)
  • 23:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
  • 21:45 XioNoX: rebooting pfw3-codfw:node1 for upgrade - T235150
  • 21:14 XioNoX: rebooting pfw3-codfw for upgrade - T235150
  • 20:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 20:17 gehel: completed reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - T212826
  • 20:14 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:10 XioNoX: homer push on mgmt routers
  • 20:09 mutante: phab1003 after merging gerrit:551910 puppet now also stopped the actual aphlict service and removed the systemd unit file. had to manually run 'systemctl reset-failed' though to clean systemd status and avoid icinga alert (T238593)
  • 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 19:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:18 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 19:18 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@6e6bd42]: Prevent expensive content transforms from blocking the event loop (T229286) (duration: 06m 49s)
  • 19:01 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@6e6bd42]: Prevent expensive content transforms from blocking the event loop (T229286)
  • 19:00 elukey: regenerate TLS cert for yarn.wikimedia.org (containing SANs for all analytics UIs) to add datasets.w.o SAN (site was failing due to ATS not being able to contact thorium)
  • 18:59 rlazarus: restarted php7.2-fpm on wtp2001, wtp2002
  • 18:56 rlazarus: restarted php7.2-fpm on wtp1025, wtp1026
  • 18:35 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/: Unbreak instrumentation of init events (duration: 00m 53s)
  • 18:34 ssastry@deploy1001: Finished deploy [parsoid/deploy@6e7cffd]: Updating Parsoid to 1a1105a7 (duration: 02m 04s)
  • 18:32 ssastry@deploy1001: Started deploy [parsoid/deploy@6e7cffd]: Updating Parsoid to 1a1105a7
  • 18:30 mutante: icinga config - manually added team-dcops, started icinga
  • 18:20 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag, hook) gerrit:551858 (duration: 00m 53s)
  • 18:12 RoanKattouw: That was eowiktionary, not eowikisource
  • 18:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure default search namespaces for eowikisource (T237792) (duration: 00m 52s)
  • 17:43 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag, maint script) gerrit:551857 (duration: 00m 52s)
  • 17:39 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:11 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag) gerrit:551855 gerrit:551856 (duration: 00m 54s)
  • 17:02 volker-e@deploy1001: Finished deploy [design/style-guide@d73818a]: Deploy design/style-guide: (duration: 00m 07s)
  • 17:02 volker-e@deploy1001: Started deploy [design/style-guide@d73818a]: Deploy design/style-guide:
  • 16:58 ema: pool cp2007 with ATS backend T227432
  • 16:30 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:28 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:25 moritzm: installing glib2.0 security updates
  • 16:21 mutante: phab1003 - puppet restarts aphlict service even with "phabricator_aphlict_enabled: false" in Hiera. But it does properly remove the proxy config lines from apache. so service is running but not used. (T238593)
  • 16:17 mutante: phab1003 - systemctl stop aphlict (proxy config in apache is disabled as well as disabled in ATS) (T238593)
  • 16:15 gehel: reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - T212826
  • 16:14 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:10 ema: depool cp2007 and reimage as text_ats T227432
  • 16:09 ema: pool cp2006 with ATS backend T227432
  • 15:59 mobrovac@deploy1001: Finished deploy [restbase/deploy@564b2c6]: New Parsoid/PHP config structure (duration: 02m 11s)
  • 15:57 mobrovac@deploy1001: Started deploy [restbase/deploy@564b2c6]: New Parsoid/PHP config structure
  • 15:37 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:34 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@5e7f759]: Switch test.wp and test2.wp to Parsoid/PHP - T229015 (duration: 14m 22s)
  • 15:15 ema: depool cp2006 and reimage as text_ats T227432
  • 15:13 mobrovac@deploy1001: Started deploy [restbase/deploy@5e7f759]: Switch test.wp and test2.wp to Parsoid/PHP - T229015
  • 15:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@5e7f759] (dev-cluster): Switch test.wp and test2.wp to Parsoid/PHP (duration: 02m 58s)
  • 15:07 ema: pool cp2004 with ATS backend T227432
  • 15:06 mobrovac@deploy1001: Started deploy [restbase/deploy@5e7f759] (dev-cluster): Switch test.wp and test2.wp to Parsoid/PHP
  • 14:38 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:34 gehel: restarting blazegraph with additional logging on wdqs1004 - T231411
  • 14:18 ema: depool cp2004 and reimage as text_ats T227432
  • 14:13 ema: pool cp2001 with ATS backend T227432
  • 13:57 marostegui: Deploy schema change on metawiki directly on s7 master T238370
  • 13:57 marostegui: Deploy schema change on mediawikiwiki directly on s7 master T238370
  • 13:55 marostegui: Deploy schema change on mediawikiwiki directly on s3 master T238370
  • 13:50 marostegui: Deploy schema change on foundationwiki directly on s3 master - T238370
  • 13:46 marostegui: Deploy schema change on labswiki (wikitech) - T238370
  • 13:39 marostegui: Deploy schema change on db1092
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for schema change', diff saved to https://phabricator.wikimedia.org/P9673 and previous config saved to /var/cache/conftool/dbconfig/20191119-133850-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9672 and previous config saved to /var/cache/conftool/dbconfig/20191119-133704-marostegui.json
  • 13:34 ema@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:33 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:14 ema: depool cp2001 and reimage as text_ats T227432
  • 12:42 jbond42: add libapache2-mod-auth-cas 1.2-1 to stretch-wikimedia repo
  • 12:28 effie: enable puppet on P:mediawiki::php and *.eqiad.wmnet
  • 12:22 effie: enable puppet on P:mediawiki::php and *.codfw.wmnet
  • 12:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1067 from config T238297 (duration: 00m 52s)
  • 12:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1067 from config T238297 (duration: 00m 52s)
  • 11:41 gehel: depooling wdqs1004 - T231411
  • 11:37 gehel: restarting wdqs blazegraph on wdqs1004 - T231411
  • 11:29 marostegui: Upgrade dbstore1003 (3311,3315,3317)
  • 11:16 gehel: restarting wdqs updater on wdqs1004 - T231411
  • 10:36 marostegui: Compress and upgrade db1098:3316
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for upgrade and compression', diff saved to https://phabricator.wikimedia.org/P9671 and previous config saved to /var/cache/conftool/dbconfig/20191119-103540-marostegui.json
  • 10:34 marostegui: Compress and upgrade db1098:3317
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for upgrade and compression', diff saved to https://phabricator.wikimedia.org/P9670 and previous config saved to /var/cache/conftool/dbconfig/20191119-103426-marostegui.json
  • 10:29 marostegui: Upgrade db2077
  • 10:24 marostegui: Upgrade db2120 db2121 db2122
  • 10:10 marostegui: Upgrade MySQL on db2086 db2087 db2100
  • 10:06 godog: repool centrallog2001
  • 09:40 effie: disable puppet on P:mediawiki::php - T229792
  • 09:21 moritzm: installing ncurses security updates
  • 09:20 moritzm: rolling restart of nginx on acmechief/puppetdb to pick up libxslt security updates
  • 09:08 moritzm: installing libxslt security updates
  • 09:08 marostegui: Deploy schema change on db1101:3318
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P9669 and previous config saved to /var/cache/conftool/dbconfig/20191119-090823-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9668 and previous config saved to /var/cache/conftool/dbconfig/20191119-090745-marostegui.json
  • 09:05 marostegui: Repool labsdb1010
  • 07:33 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Enable math links in Beta - T208758 (duration: 00m 53s)
  • 06:45 marostegui: Stop MySQL on db2061 T238526
  • 06:44 marostegui: Remove db2061 from tendril and zarcillo T238526
  • 06:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2061 from config T238526 (duration: 00m 52s)
  • 06:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2061 from config T238526 (duration: 00m 53s)
  • 06:26 vgutierrez: Move cp1089 from nginx to ats-tls - T231627
  • 06:20 marostegui: Depool labsdb1010 for upgrade
  • 06:02 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1131 to s6 master and remove read-only from s6 T235469', diff saved to https://phabricator.wikimedia.org/P9667 and previous config saved to /var/cache/conftool/dbconfig/20191119-060203-marostegui.json
  • 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Set s6 as read-only for maintenance T235469', diff saved to https://phabricator.wikimedia.org/P9666 and previous config saved to /var/cache/conftool/dbconfig/20191119-060122-marostegui.json
  • 06:01 marostegui: Starting s6 failover from db1061 to db1131 - T235469
  • 05:37 eileen: process control - I reverted the above to check some stuff first
  • 05:36 vgutierrez: Move cp1087 from nginx to ats-tls - T231627
  • 05:26 marostegui: Deploy schema change on db1099:3318
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P9665 and previous config saved to /var/cache/conftool/dbconfig/20191119-052632-marostegui.json
  • 05:25 marostegui: Compress db1097:3314
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314 for compression', diff saved to https://phabricator.wikimedia.org/P9664 and previous config saved to /var/cache/conftool/dbconfig/20191119-052412-marostegui.json
  • 05:17 vgutierrez: Move cp1085 from nginx to ats-tls - T231627
  • 05:14 marostegui: Compress tables on db1105:3311
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for compression', diff saved to https://phabricator.wikimedia.org/P9663 and previous config saved to /var/cache/conftool/dbconfig/20191119-051344-marostegui.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after compression', diff saved to https://phabricator.wikimedia.org/P9662 and previous config saved to /var/cache/conftool/dbconfig/20191119-051259-marostegui.json
  • 05:12 eileen: process-control config revision is 9fbfc79988 - change gap on repair job to 16 hours to reflect the with-daylight-savings ones
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T235469 ', diff saved to https://phabricator.wikimedia.org/P9661 and previous config saved to /var/cache/conftool/dbconfig/20191119-050748-marostegui.json
  • 05:02 marostegui: Start pre-switchover steps T235469
  • 04:47 vgutierrez: Move cp2023 from nginx to ats-tls - T231627
  • 04:17 vgutierrez: Move cp2019 from nginx to ats-tls - T231627
  • 03:53 vgutierrez: Move cp2016 from nginx to ats-tls - T231627
  • 03:51 tgr: T208369 ran mwscript extensions/GrowthExperiments/maintenance/deleteOldSurveys.php cswiki --cutoff 350
  • 03:37 vgutierrez: Move cp2013 from nginx to ats-tls - T231627
  • 01:12 ejegg: re-enabled fundraising CiviCRM contact de-duplication jobs
  • 01:05 ejegg: disabled fundraising CiviCRM contact de-duplication jobs
  • 00:54 ejegg: updated civicrm from 1f454aa69a to 2802bdd649
  • 00:39 mutante: phab2001 - rsyncing /srv/repos data from phab1003 (T190568)
  • 00:30 mutante: rebooting phab2001

2019-11-18

  • 23:52 catrope@deploy1001: Finished scap: Update GrowthExperiments to master in wmf.5 (includes i18n) (duration: 19m 57s)
  • 23:37 mutante: phab2001 - restart ssh-phab service after reimaging (some race condition binding to the IP before getting it on the interface after fresh install), reschedule pybal checks (T190568)
  • 23:32 catrope@deploy1001: Started scap: Update GrowthExperiments to master in wmf.5 (includes i18n)
  • 22:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 22:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001.codfw.wmnet
  • 22:39 eileen: civicrm revision changed from c05c302e54 to 1f454aa69a, config revision is 67685c12f5
  • 22:31 mutante: phab2001 - reinstalling with buster (T190568)
  • 21:59 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:57 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:57 arlolra: Upgraded Parsoid to 2245b8f (T237886, T237103, T236864, T237569, T236930, T237463, T236867, T234266)
  • 21:56 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 21:47 arlolra@deploy1001: Finished deploy [parsoid/deploy@c6a457f]: Updating Parsoid to 2245b8f (duration: 08m 22s)
  • 21:39 arlolra@deploy1001: Started deploy [parsoid/deploy@c6a457f]: Updating Parsoid to 2245b8f
  • 20:59 mutante: phab1003 - re-enabling puppet after merging gerrit:551271 - making sure aphlict stays disabled incl. the apache config ProxyPass lines using mod_proxy_wstunnel (T238593)
  • 20:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 after some compression', diff saved to https://phabricator.wikimedia.org/P9659 and previous config saved to /var/cache/conftool/dbconfig/20191118-202259-marostegui.json
  • 19:03 ejegg: updated payments-wiki from 30579d34d8 to 3f99ebecc7
  • 18:21 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@582d394]: New WDQS build with merging updater (duration: 13m 27s)
  • 18:07 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@582d394]: New WDQS build with merging updater
  • 17:44 cdanis: rebooting grafana1002 (currently test host not used in prod)
  • 17:08 marostegui: Deploy schema change on db1116:3318
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for compression', diff saved to https://phabricator.wikimedia.org/P9658 and previous config saved to /var/cache/conftool/dbconfig/20191118-165410-marostegui.json
  • 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after compression', diff saved to https://phabricator.wikimedia.org/P9656 and previous config saved to /var/cache/conftool/dbconfig/20191118-164923-marostegui.json
  • 16:40 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕦 sudo -E reprepro --restrict grafana update buster-wikimedia
  • 16:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:06 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:13 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set MCR migration stage to NEW on remaining wikis for T198312 (duration: 00m 53s)
  • 14:48 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b288c]: Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523 (duration: 13m 58s)
  • 14:34 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b288c]: Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523
  • 14:34 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523 (duration: 02m 30s)
  • 14:31 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523
  • 14:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary (duration: 02m 45s)
  • 14:28 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary
  • 14:27 arturo: imported openstack ocata deb packages into stretch-wikimedia/thirdparty/openstack-ocata-stretch (T238338)
  • 14:22 marostegui: Deploy schema change on dbstore1005:3318
  • 13:10 ema: cp-ats: rolling ats-{tls,backend} restart to apply log_buffer_size config changes T237608
  • 12:51 Urbanecm: Run mwscript recountCategories.php --wiki=cswiki --mode={subcats,pages,files} (T228585)
  • 12:48 Urbanecm: Run mwscript recountCategories.php --wiki=dewiki --mode=files (T238500)
  • 12:48 Urbanecm: Run mwscript recountCategories.php --wiki=dewiki --mode=pages (T238500)
  • 12:47 Urbanecm: Run mwscript recountCategories.php --wiki=dewiki --mode=subcats (T238500)
  • 11:32 awight: EU SWAT complete
  • 11:28 awight@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Cite: SWAT: Track pageviews only on content page views, not edits (T214493) (duration: 00m 51s)
  • 11:26 awight@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Popups: SWAT: Don't record Popups actions on non-content pages (T214493) (duration: 00m 51s)
  • 11:04 moritzm: installing postgresql-common security updates
  • 10:56 moritzm: installing python-werkzeug security updates
  • 10:56 marostegui: Deploy schema change on db2078 (codfw master for wikidatawiki), this will create lag on s8 codfw - T237120
  • 10:53 moritzm: installing gdb updates from buster point release
  • 10:49 moritzm: installing python-cryptography bugfix updates from buster point release
  • 10:45 moritzm: updated buster netinst image for 10.2 T238519
  • 10:16 marostegui: Upgrade MySQL on labsdb1012
  • 09:33 godog: remove wezen from service, pending reimage
  • 09:11 marostegui: Remove ar_comment from triggers on db2094:3318 - T234704
  • 09:11 marostegui: Deploy schema change on s8 codfw, this will generate lag on s8 codfw - T233135 T234066
  • 09:03 marostegui: Restart MySQL on db1124 and db1125 to apply new replication filters T238370
  • 07:17 marostegui: Upgrade and restart mysql on sanitarium hosts on codfw to pick up new replication filters: db2094 and db2095 - T238370
  • 07:09 marostegui: Stop MySQL on db2070 to clone db2135 - T238183
  • 06:52 vgutierrez: Move cp1083 from nginx to ats-tls - T231627
  • 06:32 vgutierrez: Move cp1081 from nginx to ats-tls - T231627
  • 06:30 marostegui: Restart tendril mysql - T231769
  • 06:12 vgutierrez: Move cp2012 from nginx to ats-tls - T231627
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for compression', diff saved to https://phabricator.wikimedia.org/P9652 and previous config saved to /var/cache/conftool/dbconfig/20191118-060508-marostegui.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 for compression', diff saved to https://phabricator.wikimedia.org/P9651 and previous config saved to /var/cache/conftool/dbconfig/20191118-060207-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2072, db2088:3311, db2087:3316, db2086:3317 after maintenances and schema changes', diff saved to https://phabricator.wikimedia.org/P9650 and previous config saved to /var/cache/conftool/dbconfig/20191118-060114-marostegui.json
  • 05:53 marostegui: Deploy schema change on s5 primary master db1100 - T233135 T234066
  • 03:40 vgutierrez: Move cp2007 from nginx to ats-tls - T231627
  • 00:44 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/Handler/PageHistoryCountHandler.php: fix extremely slow query T238378 (duration: 00m 59s)

2019-11-16

  • 20:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:25 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:17 effie: restart rsyslog on mw2221
  • 09:43 elukey: systemctl restart hadoop-* on analytics1077 after oom killer

2019-11-15

  • 22:14 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:12 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:54 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:52 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:31 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:29 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 _joe_: disabling proxying to ws on phabricator1003
  • 20:04 XioNoX: push pfw policies to pfw3-eqiad - T238368
  • 20:02 XioNoX: push pfw policies to pfw3-codfw - T238368
  • 19:07 XioNoX: remove vlan 1 trunking between msw1-codfw and mr1-codfw, will cause a quick connectivity issue - T228112
  • 18:07 XioNoX: homer push on management switches
  • 17:30 mutante: phabricator - started phd service
  • 17:11 XioNoX: homer push to management routers (https://gerrit.wikimedia.org/r/550576)
  • 16:43 hashar: Restored zuul-merger / CI for operations/puppet.git
  • 16:29 hashar: CI slowed down due to a huge spike of internal jobs. Being flushed as of now # T140297
  • 16:25 bblack: repool cp2001
  • 16:08 bblack: depool cp2001 for experiments
  • 16:02 moritzm: rebooting rpki1001 to rectify microcode loading
  • 16:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:51 ejegg: updated Fundraising CiviCRM from ae9b3819cd to c05c302e54
  • 15:36 ejegg: reduced batch size of CiviCRM contact deduplication jobs
  • 15:11 ema: pool cp3064 with ATS backend T227432
  • 15:07 ema: reboot cp3064 after reimage
  • 14:51 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:49 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:25 ema: depool cp3064 and reimage as text_ats T227432
  • 14:17 godog: SIGHUP prometheus@ops on prometheus1004
  • 14:13 bblack: lvs1013 - pybal restart for new config
  • 14:13 bblack: lvs2001 - pybal restart for new config
  • 14:13 bblack: lvs5001 - pybal restart for new config
  • 14:13 bblack: lvs4005 - pybal restart for new config
  • 14:12 bblack: lvs3005 - pybal restart for new config
  • 14:11 bblack: lvs5003 - pybal restart for new config
  • 14:11 bblack: lvs4007 - pybal restart for new config
  • 14:11 bblack: lvs3007 - pybal restart for new config
  • 14:10 bblack: lvs2004 - pybal restart for new config
  • 14:09 bblack: lvs1016 - pybal restart for new config
  • 13:28 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (duration: 00m 03s)
  • 13:28 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts
  • 13:06 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (expecting failure) (duration: 00m 04s)
  • 13:06 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (expecting failure)
  • 11:43 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (duration: 00m 09s)
  • 11:43 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts
  • 11:27 moritzm: reboot ganeti4001-4003 to rectify microcode application
  • 11:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315 into vslow,dump after schema change', diff saved to https://phabricator.wikimedia.org/P9645 and previous config saved to /var/cache/conftool/dbconfig/20191115-112520-marostegui.json
  • 11:19 marostegui: Reboot dbproxy2002
  • 11:15 marostegui: Reboot dbproxy2004
  • 11:12 marostegui: Reboot dbproxy2001
  • 10:45 marostegui: Run maintain-views for s5 on labsdb1011 T233135
  • 10:38 moritzm: installing ghostscript security updates
  • 10:37 mobrovac: restbase - truncated parsoidphp data tables - T229015
  • 10:36 ema: pool cp3062 with ATS backend T227432
  • 10:24 godog: roll-restart logstash to apply configuration change
  • 10:19 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:15 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 ema: depool cp3062 and reimage as text_ats T227432
  • 09:47 vgutierrez: Use a synthetic warning for 1% of TLSv1/TLSv1.1 pageviews - T238038
  • 09:18 vgutierrez: Move cp1079 from nginx to ats-tls - T231627
  • 09:13 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 09:02 vgutierrez: Move cp1077 from nginx to ats-tls - T231627
  • 08:42 vgutierrez: Move cp2006 from nginx to ats-tls - T231627
  • 08:30 vgutierrez: Move cp2004 from nginx to ats-tls - T231627
  • 06:41 marostegui: Stop MySQL on db2065 to clone db2134 (this will trigger an haproxy irc alert) - T238183
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change and temporary pool db1082 into vslow,dump', diff saved to https://phabricator.wikimedia.org/P9643 and previous config saved to /var/cache/conftool/dbconfig/20191115-060807-marostegui.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3311 for compression', diff saved to https://phabricator.wikimedia.org/P9642 and previous config saved to /var/cache/conftool/dbconfig/20191115-060425-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 db1082 after schema changes', diff saved to https://phabricator.wikimedia.org/P9641 and previous config saved to /var/cache/conftool/dbconfig/20191115-060300-marostegui.json
  • 05:57 marostegui: Run maintain-views for s5 on labsdb1009, labsdb1010, labsdb1012 (pending labsdb1011 as it is still running the schema change) T233135
  • 05:07 vgutierrez: Move cp3064 from nginx to ats-tls - T231627
  • 04:38 volker-e@deploy1001: Finished deploy [design/style-guide@2ad7b1a]: Deploy design/style-guide: (duration: 00m 07s)
  • 04:38 volker-e@deploy1001: Started deploy [design/style-guide@2ad7b1a]: Deploy design/style-guide:
  • 04:17 vgutierrez: Move cp3062 from nginx to ats-tls - T231627
  • 04:00 vgutierrez: Move cp3060 from nginx to ats-tls - T231627
  • 01:35 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/Handler/CompareHandler.php: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 53s)
  • 01:33 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/coreRoutes.json: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 52s)
  • 01:32 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/parser/Parser.php: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 54s)

2019-11-14

  • 23:03 mutante: restarting gerrit to increase defaultThreadPoolSize to 2
  • 22:29 eileen: civicrm revision changed from a3714003ff to ae9b3819cd, config revision is 6adc66a20b
  • 21:32 ssastry@deploy1001: Finished deploy [parsoid/deploy@150f9af]: Updating Parsoid to 74203415 (duration: 08m 21s)
  • 21:24 ssastry@deploy1001: Started deploy [parsoid/deploy@150f9af]: Updating Parsoid to 74203415
  • 21:14 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:06 cdanis@cumin2001: dbctl commit (dc=all): 'remove now-defunct wikitech section T233236', diff saved to https://phabricator.wikimedia.org/P9639 and previous config saved to /var/cache/conftool/dbconfig/20191114-200649-cdanis.json
  • 20:04 gehel: reloading data on wdqs1004 from wdqs1007 to catch up on lag faster - T238229
  • 19:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:33 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:31 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:49 catrope@deploy1001: Synchronized wmf-config/: Use s10/s11 dblists for wikitechs (for real this time) (T233236) (duration: 00m 52s)
  • 18:37 catrope@deploy1001: Synchronized dblists/: Use s10/s11 dblists for wikitechs (T233236) (duration: 00m 51s)
  • 18:35 catrope@deploy1001: Synchronized dblists/: Add s10/s11 dblists for wikitechs (T233236) (duration: 00m 52s)
  • 18:34 mutante: scandium - restart php7.2-fpm
  • 18:31 mutante: phabricator (phab1003, prod server) - upgrade PHP version to 7.2.24 (T237239)
  • 18:17 cdanis@cumin2001: dbctl commit (dc=all): 'alias wikitech section to new s10 section T233236', diff saved to https://phabricator.wikimedia.org/P9638 and previous config saved to /var/cache/conftool/dbconfig/20191114-181732-cdanis.json
  • 17:46 robh: running dell epsa tool on cp3056 per T236497
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 17:22 ejegg: updated payments-wiki from bd907656fb to 30579d34d8
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 16:09 mutante: phab2001 - upgrading PHP version to 7.2.24 (T237239)
  • 16:06 mutante: scandium - upgrading PHP version to 7.2.24 (fyi, @subbu T228069) (T237239)
  • 16:04 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase: Put a layer of APC cache on top of reading wb_terms in SqlEntityInfoBuilder (T231011 T229407 T236681), Try II (duration: 00m 56s)
  • 14:54 ema: pool cp3060 with ATS backend T227432
  • 14:53 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fix bug when looking up entity for an unknown ID (duration: 00m 53s)
  • 14:48 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set MCR migration stage to NEW on group1 for T198312 (duration: 00m 53s)
  • 14:27 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:24 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:01 ema: depool cp3060 and reimage as text_ats T227432
  • 13:37 ladsgroup@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 13:35 gehel: depool wdqs1004 to allow catching up on lag - T238229
  • 13:06 bblack: removing digicert-2019 files from cache nodes - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/550829/
  • 12:24 mobrovac@deploy1001: Finished deploy [restbase/deploy@58cf5ae]: Fix /metrics/mediarequests/top/ indentation (duration: 14m 52s)
  • 12:09 mobrovac@deploy1001: Started deploy [restbase/deploy@58cf5ae]: Fix /metrics/mediarequests/top/ indentation
  • 11:58 mobrovac@deploy1001: Finished deploy [restbase/deploy@58cf5ae] (dev-cluster): Fix /metrics/mediarequests/top/ indentation (duration: 02m 50s)
  • 11:55 mobrovac@deploy1001: Started deploy [restbase/deploy@58cf5ae] (dev-cluster): Fix /metrics/mediarequests/top/ indentation
  • 11:26 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 10:48 vgutierrez: Rolling restart of ats-tls/ats-backend to upgrade to 8.0.5-1wm11 - T238307
  • 10:44 vgutierrez: uploaded trafficserver-8.0.5-1wm11 to apt.wikimedia.org (stretch) - T238307
  • 10:43 ema: pool cp3058 with ATS backend T227432
  • 10:25 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:23 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:20 godog: netbox1001 bandaid/symlink /srv/deployment/netbox/deploy/src/netbox/project-static to 'static'
  • 10:06 gehel: copying journal from wdqs1007 to wdqs1005 - T238232
  • 10:05 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 10:03 Urbanecm: Run deleteEqualMessages.php --delete for cswiki and viwiki
  • 09:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:57 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:55 gehel: depool wdqs (public) eqiad - high lag - T238229
  • 09:34 ema: depool cp3058 and reimage as text_ats T227432
  • 09:31 marostegui: Compare wikidatawiki.pagelinks between labsdb1011 and labsdb1010 - T233986
  • 09:25 moritzm: installing ghostscript updates on thumbor1001
  • 09:24 marostegui: Stop MySQL on db2067 to clone db2133 - T238183
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Full weight to db1089 on special groups for s1 T223151', diff saved to https://phabricator.wikimedia.org/P9635 and previous config saved to /var/cache/conftool/dbconfig/20191114-092006-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 marostegui: Compare wikidatawiki.pagelinks between db1124:3318 and labsdb1010 - T233986
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:42 marostegui: Remove ar_comment from triggers on db1124:3315 - T234704
  • 08:41 marostegui: Deploy schema change with replication on db1082, this will generate lag on s5 labs - T233135 T234066
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for schema change', diff saved to https://phabricator.wikimedia.org/P9634 and previous config saved to /var/cache/conftool/dbconfig/20191114-084043-marostegui.json
  • 08:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P9633 and previous config saved to /var/cache/conftool/dbconfig/20191114-083729-marostegui.json
  • 08:03 eileen: process-control config revision is 6adc66a20b re-enable backfill
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Pool a non partitioned slave db1089 on special groups for s1 T223151', diff saved to https://phabricator.wikimedia.org/P9632 and previous config saved to /var/cache/conftool/dbconfig/20191114-080038-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 T235599', diff saved to https://phabricator.wikimedia.org/P9631 and previous config saved to /var/cache/conftool/dbconfig/20191114-075449-marostegui.json
  • 07:41 eileen: process-control config revision is b7c2cf7227 - disabled backfill again - some error?
  • 07:29 eileen: process-control config revision is 909108622d re-enable omnirecipient date repair job
  • 07:25 eileen: process-control config revision is d3ebeddcc1 (I re-enabled the old backfill job)
  • 07:12 moritzm: installing intel-microcode updates
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1067', diff saved to https://phabricator.wikimedia.org/P9630 and previous config saved to /var/cache/conftool/dbconfig/20191114-065309-marostegui.json
  • 06:16 marostegui: Stop replication on db1067
  • 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1083 to s1 master and remove read-only from s1 T234800', diff saved to https://phabricator.wikimedia.org/P9629 and previous config saved to /var/cache/conftool/dbconfig/20191114-060138-marostegui.json
  • 06:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s1 as read-only for maintenance T234800', diff saved to https://phabricator.wikimedia.org/P9628 and previous config saved to /var/cache/conftool/dbconfig/20191114-060026-marostegui.json
  • 06:00 marostegui: Starting s1 failover from db1067 to db1083 - T234800
  • 05:51 jynus: stopping db1114 replication
  • 05:34 marostegui: Compress db2089:3316 - T235599
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P9627 and previous config saved to /var/cache/conftool/dbconfig/20191114-052400-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P9626 and previous config saved to /var/cache/conftool/dbconfig/20191114-052303-marostegui.json
  • 05:13 marostegui: Move replicas from db1067 to db1083 T234800
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1083 with weight 0 T234800', diff saved to https://phabricator.wikimedia.org/P9625 and previous config saved to /var/cache/conftool/dbconfig/20191114-050940-marostegui.json
  • 05:08 vgutierrez: Repooling cp1077 - T238289
  • 05:07 marostegui: Start pre-failover steps T234800
  • 05:01 kart_: Updated cxserver to 2019-11-13-111130-production tag (T237379, T235748, T236906)
  • 04:56 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 04:51 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 04:49 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 03:49 vgutierrez: power cycling cp1077 - T238289
  • 03:49 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
  • 03:49 vgutierrez: depooling cp1077 - T238289
  • 00:41 ebernhardson: T237849 Start CirrusSearch forceSearchIndex.php commonswiki 2019-10-20T00:00:00 - 2019-11-14T01:00:00 pushing into jobqueue
  • 00:40 crusnov@deploy1001: Finished deploy [netbox/deploy@56df4a5]: deploy netbox for script update (duration: 00m 49s)
  • 00:39 crusnov@deploy1001: Started deploy [netbox/deploy@56df4a5]: deploy netbox for script update
  • 00:39 crusnov@deploy1001: Finished deploy [netbox/deploy@56df4a5]: deploy netbox for script update (duration: 00m 44s)
  • 00:38 crusnov@deploy1001: Started deploy [netbox/deploy@56df4a5]: deploy netbox for script update
  • 00:36 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/CirrusSearch/includes/BuildDocument/BuildDocument.php: T237849: Restore CirrusSearchBuildDocumentParse hook (duration: 00m 54s)

2019-11-13

  • 23:00 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:58 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:25 catrope@deploy1001: Finished scap: For some reason that limited i18n sync didn't work, trying a full scap (duration: 18m 33s)
  • 22:07 catrope@deploy1001: Started scap: For some reason that limited i18n sync didn't work, trying a full scap
  • 22:04 catrope@deploy1001: scap sync-l10n completed (1.35.0-wmf.5) (duration: 02m 54s)
  • 22:00 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Update to master (b937dce) (duration: 00m 54s)
  • 20:17 XioNoX: delete unused asw2-esams:ae1
  • 19:37 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update WD item blacklist (again) (duration: 00m 52s)
  • 18:49 Jeff_Green: authdns-update to remove host alnilam
  • 17:49 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update WD item blacklist (duration: 00m 53s)
  • 16:41 gehel: depool wdqs1005 - T238232
  • 16:36 gehel: restart blazegraph on wdqs1005
  • 16:21 ema: pool cp3054 with ATS backend T227432
  • 16:21 gehel: draining elastic1017-1031 to prepare for decommission - T230746
  • 16:02 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:00 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P9621 and previous config saved to /var/cache/conftool/dbconfig/20191113-155134-marostegui.json
  • 15:39 moritzm: powercycle cloudbackup2002
  • 15:35 ema: depool cp3054 and reimage as text_ats T227432
  • 15:32 moritzm: rebooting cloudbackup2002
  • 15:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:29 jynus: shutdown db2072 T237905
  • 15:29 gehel: configuration of new elasticsearch servers completed, all working and pooled - T230746
  • 14:55 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P9620 and previous config saved to /var/cache/conftool/dbconfig/20191113-145541-jynus.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P9619 and previous config saved to /var/cache/conftool/dbconfig/20191113-134938-marostegui.json
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089 after upgrade', diff saved to https://phabricator.wikimedia.org/P9618 and previous config saved to /var/cache/conftool/dbconfig/20191113-134625-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089 after upgrade', diff saved to https://phabricator.wikimedia.org/P9617 and previous config saved to /var/cache/conftool/dbconfig/20191113-133410-marostegui.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for upgrade', diff saved to https://phabricator.wikimedia.org/P9616 and previous config saved to /var/cache/conftool/dbconfig/20191113-132216-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P9615 and previous config saved to /var/cache/conftool/dbconfig/20191113-131530-marostegui.json
  • 11:56 effie: Upgrade to php 7.2.24-1 mediawiki eqiad hosts and restart php-fpm - T237239
  • 11:55 ema: cp-ats: rolling trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:46 moritzm: rebooting cloudcontrol2001-dev for microcode debugging
  • 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:38 moritzm: rebooting labtestpuppetmaster2001 for microcode debugging
  • 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:27 ema: cp-ats-ulsfo: rolling trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:27 moritzm: rebooting cloudcontrol2003-dev for some microcode debugging
  • 11:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:24 ema: cp4022: trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9614 and previous config saved to /var/cache/conftool/dbconfig/20191113-110802-marostegui.json
  • 11:05 Urbanecm: EU SWAT done
  • 11:05 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ffwiki* (T238191)
  • 11:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 0a90ef9: Update localized logos for the Fula Wikipedia (T238191) (duration: 00m 54s)
  • 10:53 vgutierrez: Testing ats-tls-restart on cp5007 - T237425
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9613 and previous config saved to /var/cache/conftool/dbconfig/20191113-104326-marostegui.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9612 and previous config saved to /var/cache/conftool/dbconfig/20191113-103225-marostegui.json
  • 10:27 gehel: start configuration of new elasticsearch servers - T230746
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9610 and previous config saved to /var/cache/conftool/dbconfig/20191113-102054-marostegui.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9609 and previous config saved to /var/cache/conftool/dbconfig/20191113-101127-marostegui.json
  • 09:51 jynus: upgraded wmf-mariadb101-client on cumin hosts
  • 09:50 mobrovac@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 09:43 mobrovac@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 09:41 mobrovac@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
  • 09:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@1f2c7d8]: Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki - T229015 T238117 T238116 T237374 (duration: 11m 19s)
  • 09:10 mobrovac@deploy1001: Started deploy [restbase/deploy@1f2c7d8]: Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki - T229015 T238117 T238116 T237374
  • 09:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@1f2c7d8] (dev-cluster): Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki (duration: 02m 35s)
  • 09:06 mobrovac@deploy1001: Started deploy [restbase/deploy@1f2c7d8] (dev-cluster): Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki
  • 08:25 marostegui: Stop MySQL on db2062 to copy its data to db2132 T238183
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:09 marostegui: Fix replication on labsdb1010 - T233986
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P9607 and previous config saved to /var/cache/conftool/dbconfig/20191113-070339-marostegui.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3317 for compression', diff saved to https://phabricator.wikimedia.org/P9606 and previous config saved to /var/cache/conftool/dbconfig/20191113-070055-marostegui.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2087:3317 after compression', diff saved to https://phabricator.wikimedia.org/P9605 and previous config saved to /var/cache/conftool/dbconfig/20191113-065952-marostegui.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P9604 and previous config saved to /var/cache/conftool/dbconfig/20191113-065823-marostegui.json
  • 06:25 volker-e@deploy1001: Finished deploy [design/style-guide@edce4cc]: Deploy design/style-guide: (duration: 00m 08s)
  • 06:25 volker-e@deploy1001: Started deploy [design/style-guide@edce4cc]: Deploy design/style-guide:
  • 01:35 eileen: civicrm revision changed from 3c15db25bb to a3714003ff, config revision is d678dbcaa5

2019-11-12

  • 23:57 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fix: Do not return after inserting a single suggestion (duration: 00m 52s)
  • 23:51 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/src/mediawiki.interface.helpers.styles.less: Remove extraneous semicolons (T233649), part 2 (duration: 00m 52s)
  • 23:49 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/includes/changes/ChangesList.php: Remove extraneous semicolons (T233649), part 1 (duration: 00m 53s)
  • 23:49 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:45 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:22 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:20 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:37 bblack: repool cp1076 (experiments concluded)
  • 22:35 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: enabling REST API (duration: 00m 52s)
  • 22:34 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: enabling REST API (duration: 00m 52s)
  • 22:32 eileen: civicrm revision changed from bfa53ee611 to 3c15db25bb, config revision is d678dbcaa5
  • 21:54 bblack: depooling cp1076 for some local experimentation
  • 20:18 herron: reprepro copy buster-wikimedia stretch-wikimedia prometheus-elasticsearch-exporter
  • 20:11 otto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:11 otto@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:46 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P7007 --new-data-type external-id (T234221)
  • 19:45 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P4839 --new-data-type external-id (T234221)
  • 19:43 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Sync a previously undeployed change to InitialiseSettings-labs.php that someone forgot to deploy (as a no-op) in production (duration: 00m 52s)
  • 19:41 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set MCR migration stage to NEW on group0 for T198312 (duration: 00m 52s)
  • 19:19 arlolra: Updated Parsoid to 6a0a708 (T215000, T235295, T235656, T235217, T235295, T236846, T237556, T235231)
  • 19:03 arlolra@deploy1001: Finished deploy [parsoid/deploy@f516018]: Updating Parsoid to 6a0a708 (duration: 10m 09s)
  • 18:58 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Final fixes and tweaks for testing (duration: 00m 53s)
  • 18:53 arlolra@deploy1001: Started deploy [parsoid/deploy@f516018]: Updating Parsoid to 6a0a708
  • 18:39 ejegg: re-enabled Omnimail and contact de-duplication jobs
  • 18:20 Urbanecm: Morning SWAT done
  • 18:18 Urbanecm: Deploy security patch for T237887
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 130ef87: Add right "abusefilter-log-private" to usergroup "rollbacker" at ptwiki (T237830) (duration: 00m 53s)
  • 18:08 XioNoX: push pfw change to add recdns anycast IP
  • 17:33 XioNoX: update fasw-c-eqiad to match current standard (ntp/users/rootpw/lldp)
  • 17:22 XioNoX: update fasw-c-codfw to match current standard (ntp/users/rootpw/lldp)
  • 17:03 ema: pool cp3052 with ATS backend T238085
  • 17:03 ema: pool cp3052 with ATS backend T227432
  • 16:53 bblack: cpNNNN (all cache nodes) - cumin manual removal of globalsign-2018 remnants (key, cert, ocsp config, ocsp output)
  • 16:42 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:28 XioNoX: setup bgp session from cr2-codfw to multihop RIS collector - T106056
  • 16:21 XioNoX: reboot scs-c1-eqiad.mgmt.eqiad.wmnet - T238036
  • 16:09 ema: depool cp3052 and observe performance impact T238085 before reimaging as text_ats T227432
  • 15:49 marostegui: Deploy schema change on db1102:3315 T233135 T234066
  • 15:45 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fixes and tweaks for initial rollout (duration: 00m 53s)
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 for a schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9600 and previous config saved to /var/cache/conftool/dbconfig/20191112-154127-marostegui.json
  • 15:24 otto@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=schema
  • 14:46 bblack: cpNNNN (all caches): remove stale outputs from transient ocsp failures ( /var/cache/ocsp/update-ocsp-*.tmp )
  • 14:41 ema: cp4022: trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 14:38 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4021.ulsfo.wmnet,service=nginx
  • 14:35 ema: cp4021: ats-tls-restart to see if https://gerrit.wikimedia.org/r/550475 fixed the script
  • 14:16 Jeff_Green: authdns-update to deploy fundraising-read.wmnet service cname adjustment
  • 14:01 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set all of wikidata for write both for term store" (duration: 00m 52s)
  • 12:57 godog: refresh kibana field list
  • 12:46 gehel: repool wdqs1004
  • 12:37 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size 100 (T237984)
  • 12:19 onimisionipe: restarting blazegraph on wdqs1005
  • 12:11 effie: Reimage mwdebug1002 - T214734
  • 11:47 Amir1: EU SWAT is done
  • 11:47 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase term store error reduction, Do not catch DBError in ReplicaMasterAwareRecordIdsAcquirer. (T236466) (duration: 00m 56s)
  • 11:44 effie: Upgrade wtp* to 7.2.24-1 with elegance and restart php-fpm - T237239
  • 11:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of wikidata for write both for term store (T225055) (duration: 00m 52s)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SECURITY: Don't allow Wikimedia sysops to see who had 2FA disabled (duration: 00m 53s)
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9599 and previous config saved to /var/cache/conftool/dbconfig/20191112-104400-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9598 and previous config saved to /var/cache/conftool/dbconfig/20191112-103641-marostegui.json
  • 10:35 onimisionipe: resetting cronfile on wdqs hosts
  • 10:33 marostegui: Drop labtestwiki database from m5 master db1133 - T236010
  • 10:30 marostegui: Deploy schema change on dbstore1003:3315
  • 10:07 ema: repool cp3065, nothing interesting in kern.log and SEL T238032
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9596 and previous config saved to /var/cache/conftool/dbconfig/20191112-095221-marostegui.json
  • 09:42 marostegui: Remove privileges for labtestwiki on m5 - T236010
  • 09:27 gehel: restarting blazegraph on wdqs1004
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083', diff saved to https://phabricator.wikimedia.org/P9595 and previous config saved to /var/cache/conftool/dbconfig/20191112-091706-marostegui.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for mariadb upgrade to 10.1.39 - T234800', diff saved to https://phabricator.wikimedia.org/P9594 and previous config saved to /var/cache/conftool/dbconfig/20191112-091158-marostegui.json
  • 09:11 marostegui: Upgrade mariadb to 10.1.39 on db1083 (candidate master for s1)
  • 08:56 moritzm: restarting archiva to pick up Java security updates
  • 08:44 volker-e@deploy1001: Finished deploy [design/style-guide@3de6820]: Deploy design/style-guide: (duration: 00m 06s)
  • 08:44 volker-e@deploy1001: Started deploy [design/style-guide@3de6820]: Deploy design/style-guide:
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9593 and previous config saved to /var/cache/conftool/dbconfig/20191112-083720-marostegui.json
  • 08:37 gehel: depool wdqs1004 to investigate update lag
  • 08:35 moritzm: installing poppler security updates
  • 08:24 volker-e@deploy1001: Finished deploy [design/style-guide@b926b95]: Deploy design/style-guide: (duration: 00m 07s)
  • 08:24 volker-e@deploy1001: Started deploy [design/style-guide@b926b95]: Deploy design/style-guide:
  • 08:15 moritzm: installing curl security updates
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9592 and previous config saved to /var/cache/conftool/dbconfig/20191112-081322-marostegui.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9591 and previous config saved to /var/cache/conftool/dbconfig/20191112-074006-marostegui.json
  • 07:36 elukey: remove /etc/logrotate.d/wdqs_autodeployment_log from wdqs1009 (not in puppet anymore and causing cronspam)
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9590 and previous config saved to /var/cache/conftool/dbconfig/20191112-072823-marostegui.json
  • 07:10 marostegui: Upgrade kernel on db1083 (s1 candidate master)
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for kernel upgrade - T234800', diff saved to https://phabricator.wikimedia.org/P9589 and previous config saved to /var/cache/conftool/dbconfig/20191112-070436-marostegui.json
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:44 marostegui: Change triggers on s5 db2094 - T234704
  • 06:40 marostegui: Deploy schema change on s5 codfw with replication, this will generate lag on s5 codfw T233135 T234066
  • 06:21 marostegui: Compress db2087:3316, db2087:3317 T235599
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316, db2087:3317 for compression - T235599', diff saved to https://phabricator.wikimedia.org/P9588 and previous config saved to /var/cache/conftool/dbconfig/20191112-061959-marostegui.json
  • 03:41 vgutierrez: restart wdqs-blazegraph on wdqs1004

2019-11-11

  • 22:51 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3065.esams.wmnet
  • 22:49 ema: power-cycle cp3065, currently down
  • 19:36 XioNoX: disable ALGs on mr1-esams
  • 18:20 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT (duration: 00m 57s)
  • 18:19 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT
  • 18:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT (duration: 15m 14s)
  • 18:01 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT
  • 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 17:44 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:41 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:44 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 15:42 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 15:30 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 14:26 ema: pool cp3050 with ATS backend T227432
  • 13:50 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:48 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:25 ema: depool cp3050 and reimage as text_ats T227432
  • 12:59 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 12:46 effie: Upgrade to 7.2.24-1 mwdebug[2001-2002].codfw.wmnet,mwmaint2001.codfw.wmnet,deploy2001.codfw.wmnet - T237239
  • 12:31 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: Deploy updates on wdqs1010 (duration: 00m 28s)
  • 12:30 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: Deploy updates on wdqs1010
  • 12:28 effie: Upgrade mw2* to 7.2.24-1 with elegance and restart php-fpm - T237239
  • 12:21 effie: Upgrade mw2* to 7.2.24-1 with elegance and restart php-fpm - T231881
  • 11:55 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:52 hoo: Updated the Wikidata property suggester with data from the 2019-11-04 JSON dump and applied the T132839 workarounds
  • 10:48 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 10:47 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:45 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 10:32 vgutierrez: restarting ats-tls on cp1088
  • 10:21 jynus: upgrade mariadb on db2102
  • 10:16 ema: repool cp4027 after successful X-Wikimedia-Debug testing P9585 T237687
  • 10:12 jynus: manually run full backup of labtestpuppetmaster2001 T235819
  • 09:41 ema: test x-wikimedia-debug-routing.lua on cp4027 (depooled) T237687
  • 09:09 volker-e@deploy1001: Finished deploy [design/style-guide@0ea65f2]: Deploy design/style-guide: (duration: 00m 07s)
  • 09:09 volker-e@deploy1001: Started deploy [design/style-guide@0ea65f2]: Deploy design/style-guide:
  • 08:28 marostegui: Stop MySQL on db2048 before decommissioning - T237913
  • 08:28 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2048 from config T237913 (duration: 00m 51s)
  • 08:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2048 from config T237913 (duration: 00m 54s)
  • 08:21 marostegui: Remove db2048 from tendril and zarcillo T237913
  • 06:56 elukey: delete /etc/logrotate.d/wdqs-reload-categories from wdqs* as attempt to reduce cronspam
  • 06:44 marostegui: Delete globalblocks table from napwikisource T230055
  • 05:27 vgutierrez: Switch from nginx to ats-tls on cp3058 - T231627

2019-11-09

  • 20:25 reedy@deploy1001: Synchronized langlist-labs: T237823 (duration: 00m 54s)
  • 02:39 volker-e@deploy1001: Finished deploy [design/style-guide@d2bfc09]: Deploy design/style-guide: (duration: 00m 07s)
  • 02:39 volker-e@deploy1001: Started deploy [design/style-guide@d2bfc09]: Deploy design/style-guide:
  • 01:07 volker-e@deploy1001: Finished deploy [design/style-guide@ef82b69]: Deploy design/style-guide: (duration: 00m 07s)
  • 01:07 volker-e@deploy1001: Started deploy [design/style-guide@ef82b69]: Deploy design/style-guide:
  • 01:06 volker-e@deploy1001: Finished deploy [design/style-guide@97fb3ee]: Deploy design/style-guide: (duration: 00m 09s)
  • 01:06 volker-e@deploy1001: Started deploy [design/style-guide@97fb3ee]: Deploy design/style-guide:

2019-11-08

  • 20:26 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Delay annotation request jobs by 5 mins for testing (duration: 00m 52s)
  • 16:54 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:52 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:19 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "MachineVision: Enable testers-only mode on testcommonswiki for debugging" (duration: 00m 54s)
  • 15:57 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118, db1106 at 100%', diff saved to https://phabricator.wikimedia.org/P9582 and previous config saved to /var/cache/conftool/dbconfig/20191108-155700-jynus.json
  • 15:37 herron: beginning rolling service restarts on logstash hosts for java security updates
  • 15:13 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Enable testers-only mode on testcommonswiki for debugging (duration: 00m 52s)
  • 14:56 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:55 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:50 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to https://phabricator.wikimedia.org/P9581 and previous config saved to /var/cache/conftool/dbconfig/20191108-145028-jynus.json
  • 14:42 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:40 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 jynus: stop and upgrade percona-server on test host db1114
  • 13:27 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:12 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 20%', diff saved to https://phabricator.wikimedia.org/P9580 and previous config saved to /var/cache/conftool/dbconfig/20191108-131257-jynus.json
  • 13:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ee2027c: Change the language of Votewiki back to English (en) (T230614) (duration: 00m 54s)
  • 12:34 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 12:14 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 10%', diff saved to https://phabricator.wikimedia.org/P9578 and previous config saved to /var/cache/conftool/dbconfig/20191108-121444-jynus.json
  • 12:02 jynus: update and restart db1118
  • 12:01 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1118 fully', diff saved to https://phabricator.wikimedia.org/P9577 and previous config saved to /var/cache/conftool/dbconfig/20191108-120138-jynus.json
  • 11:55 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 20%', diff saved to https://phabricator.wikimedia.org/P9576 and previous config saved to /var/cache/conftool/dbconfig/20191108-115553-jynus.json
  • 11:27 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to https://phabricator.wikimedia.org/P9575 and previous config saved to /var/cache/conftool/dbconfig/20191108-112733-jynus.json
  • 11:25 jynus@cumin1001: dbctl commit (dc=all): 'repool db2130', diff saved to https://phabricator.wikimedia.org/P9574 and previous config saved to /var/cache/conftool/dbconfig/20191108-112503-jynus.json
  • 11:12 jynus: update and restart db2130
  • 11:11 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2116, depool db2130', diff saved to https://phabricator.wikimedia.org/P9573 and previous config saved to /var/cache/conftool/dbconfig/20191108-111125-jynus.json
  • 10:58 Amir1: running rebuildItemTerms on 8028 items (T234329)
  • 10:51 jynus: update and restart db2116
  • 10:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2103, depool db2116', diff saved to https://phabricator.wikimedia.org/P9572 and previous config saved to /var/cache/conftool/dbconfig/20191108-105013-jynus.json
  • 10:38 jynus: update and restart db2103
  • 10:34 jeh: enable IPMI `racadm set iDRAC.IPMILan.Enable 1` on cloudcephmon[1-3] T228102
  • 10:33 jeh: enable IPMI `racadm set iDRAC.IPMILan.Enable 1` on cloudcephosd[1-3] T224188
  • 10:32 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2092, depool db2103', diff saved to https://phabricator.wikimedia.org/P9571 and previous config saved to /var/cache/conftool/dbconfig/20191108-103218-jynus.json
  • 10:19 jynus: update and restart db2092
  • 10:18 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2071, depool db2092', diff saved to https://phabricator.wikimedia.org/P9570 and previous config saved to /var/cache/conftool/dbconfig/20191108-101759-jynus.json
  • 10:09 elukey: restart jvm-based hadoop daemons on an-master100[1,2] to pick up the new openjdk version
  • 10:06 jynus: update and restart db2071
  • 10:03 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P9569 and previous config saved to /var/cache/conftool/dbconfig/20191108-100310-jynus.json
  • 10:01 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2072', diff saved to https://phabricator.wikimedia.org/P9568 and previous config saved to /var/cache/conftool/dbconfig/20191108-100128-jynus.json
  • 09:50 moritzm: uploaded openjdk 8u232-b09-1~deb10u1 to component/jdk8 for apt.wikimedia.org/buster-wikimedia
  • 09:41 jynus: update and restart db2072
  • 09:41 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P9567 and previous config saved to /var/cache/conftool/dbconfig/20191108-094100-jynus.json
  • 09:39 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1106 at 50%', diff saved to https://phabricator.wikimedia.org/P9566 and previous config saved to /var/cache/conftool/dbconfig/20191108-093958-jynus.json
  • 09:35 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 09:29 jynus: update and restart db2094
  • 09:27 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1106 at 10%', diff saved to https://phabricator.wikimedia.org/P9565 and previous config saved to /var/cache/conftool/dbconfig/20191108-092735-jynus.json
  • 09:10 jynus: update and restart db1106
  • 09:08 moritzm: installing Java security updates on kafka-jumbo
  • 09:07 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 fully', diff saved to https://phabricator.wikimedia.org/P9564 and previous config saved to /var/cache/conftool/dbconfig/20191108-090746-jynus.json
  • 09:05 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 09:04 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 at 10%', diff saved to https://phabricator.wikimedia.org/P9563 and previous config saved to /var/cache/conftool/dbconfig/20191108-090451-jynus.json
  • 09:00 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 at 50%', diff saved to https://phabricator.wikimedia.org/P9562 and previous config saved to /var/cache/conftool/dbconfig/20191108-090012-jynus.json
  • 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:52 jynus: stop and upgrade db1124 (may create temporary lag on wikireplicas)
  • 08:31 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 08:23 elukey: restart kafka on kafka-jumbo1001 to test the new openjdk
  • 08:07 moritzm: installing fribidi security updates on Buster
  • 03:03 vgutierrez: Switch from nginx to ats-tls on cp3054 - T231627
  • 02:42 vgutierrez: Switch from nginx to ats-tls on cp3052 - T231627
  • 01:23 reedy@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GlobalBlocking/: Prevent some extra db queries (duration: 00m 53s)
  • 01:14 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/: Use internationalized semicolon separators (T233649) (duration: 00m 53s)
  • 01:09 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: deploying one more time, hopefully without killing elastic (duration: 03m 04s)
  • 01:06 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: deploying one more time, hopefully without killing elastic
  • 00:44 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Logging.js: Fix homepage instrumentation (T237600) (duration: 00m 52s)
  • 00:40 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/includes: Sync DiffEngine changes that were needed to unbreak CI (duration: 00m 55s)
  • 00:34 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/: Semicolon should appear after log entries (T237500) (duration: 00m 53s)
  • 00:26 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix remote API configs for GrowthExperiments (duration: 00m 51s)
  • 00:19 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable suggested edits as hidden preference on arwiki, cswiki, kowiki, viwiki (T236968) (duration: 00m 53s)

2019-11-07

  • 23:49 foks: removing one file for legal compliance
  • 23:47 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: revert phatality again (duration: 03m 04s)
  • 23:44 shdubsh: start elasticsearch on logstash1008
  • 23:44 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: revert phatality again
  • 23:41 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: one more time (duration: 03m 00s)
  • 23:38 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: one more time
  • 23:31 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: trying again with a longer scap timeout (duration: 03m 02s)
  • 23:28 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: trying again with a longer scap timeout
  • 23:23 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: revert to previous phatality plugin version (duration: 02m 55s)
  • 23:20 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: revert to previous phatality plugin version
  • 23:09 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: (no justification provided) (duration: 00m 06s)
  • 23:09 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: (no justification provided)
  • 23:04 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: (no justification provided) (duration: 06m 48s)
  • 23:00 XenoRyet: updated payments-wiki from aac3d93f70 to bd907656fb
  • 22:57 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: (no justification provided)
  • 22:53 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Update responsive Illustrations styles changes (duration: 00m 05s)
  • 22:53 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Update responsive Illustrations styles changes
  • 22:32 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Remove annotation job delay (duration: 00m 53s)
  • 22:03 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Update to latest master with components overview additions (duration: 00m 06s)
  • 22:03 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Update to latest master with components overview additions
  • 21:54 andrewbogott: rebuilding labtestpuppetmaster2001 w/Stretch
  • 21:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet
  • 21:28 mutante: boron apt-get clean (saved 9G on /) (T237649)
  • 20:42 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.5 refs T233853
  • 20:24 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.ArticleTarget.js: Fix error handling (duration: 01m 00s)
  • 20:21 herron: performing rolling reboots of kafka-main hosts for security updates
  • 20:17 onimisionipe: cluster restart for cloudelastic to pick up JVM upgrade
  • 20:08 eileen: civicrm revision changed from f1ce5c86f7 to bfa53ee611, config revision is 72d2692743
  • 19:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Enqueue annotation job on upload complete (duration: 05m 19s)
  • 18:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Disable retrying annotation requests (duration: 05m 17s)
  • 18:25 ebernhardson: restart mjolnir-kafka-bulk-daemon and mjolnir-kafka-msearch-daemon across `cirrus` dsh group
  • 18:20 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@afd41d7]: bulk_daemon: Adjust glent configuration (duration: 05m 49s)
  • 18:14 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@afd41d7]: bulk_daemon: Adjust glent configuration
  • 17:44 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 17:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 17:38 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 17:37 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 17:30 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 17:25 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Drop currently unsupported external dependencies (T227349) (duration: 05m 19s)
  • 17:10 XioNoX: Homer push - forwarding-options - to all cr
  • 17:09 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 17:08 XioNoX: add sampling stanza (disabled) to cr2-esams
  • 17:00 mutante: wtp2020 - 2 hours downtime - shut down (T205712) - go ahead @papaul
  • 17:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 16:58 mutante: wtp2020 - depooled for T205712
  • 16:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp2020.codfw.wmnet
  • 16:42 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: some alphasorted config (duration: 01m 00s)
  • 16:34 XioNoX: Homer push on cr2-knams: Sampling (disabled), enhanced-hash-key, ospf interfaces re-ordering (noop), policy-statement BGP_from_LVS (unused), lo0 term allow_vmhost
  • 16:32 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 100%', diff saved to https://phabricator.wikimedia.org/P9553 and previous config saved to /var/cache/conftool/dbconfig/20191107-163235-jynus.json
  • 16:20 XioNoX: add BGP sessions to AS64050 in eqiad
  • 16:15 XioNoX: add BGP sessions to AS57695 in esams and eqiad
  • 16:12 XioNoX: clear v4 BGP sessions to AS7713 in eqsin (hit max prefix limit)
  • 16:02 mutante: mw2225 restart cron (T236799)
  • 15:58 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta logging (duration: 01m 00s)
  • 15:41 XioNoX: remove BGP to AS3491 on eqiad (left the IX)
  • 15:40 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 14:53 jbond42: rebuilding compiler1001
  • 13:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 50%', diff saved to https://phabricator.wikimedia.org/P9551 and previous config saved to /var/cache/conftool/dbconfig/20191107-135018-jynus.json
  • 12:47 Urbanecm: EU SWAT done
  • 12:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 8e71601: a36ed85: GrowthExperiments: Configure testwiki for suggested edits testing + follow up patch (T237634) (duration: 00m 59s)
  • 12:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 19034af: GrowthExperiments: Configure intro links for suggested edits (T235723) (duration: 01m 00s)
  • 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 2be3f86: [cirrus] remove cross_cluster_single_shard_search quirk (duration: 01m 02s)
  • 12:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5253dec: Give commonswiki filemovers `suppressredirect` rights (T236348) (duration: 01m 03s)
  • 11:57 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1016 fully (duration: 01m 01s)
  • 11:54 jbond42: update puppet_version used by CI 545289
  • 11:50 jbond42: rebuilding compiler1002
  • 11:36 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 10%', diff saved to https://phabricator.wikimedia.org/P9550 and previous config saved to /var/cache/conftool/dbconfig/20191107-113611-jynus.json
  • 11:16 jynus: stop and upgrade db1080
  • 10:58 moritzm: installing Java security updates on kafka-main/logstash
  • 10:50 moritzm: installing Java security updates on wdqs/maps
  • 10:46 jynus@cumin1001: dbctl commit (dc=all): 'Fully depool db1080', diff saved to https://phabricator.wikimedia.org/P9549 and previous config saved to /var/cache/conftool/dbconfig/20191107-104618-jynus.json
  • 10:28 moritzm: upgrading mw1277-1279 servers to PHP 7.2.24 T237239
  • 10:27 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1080 weight', diff saved to https://phabricator.wikimedia.org/P9548 and previous config saved to /var/cache/conftool/dbconfig/20191107-102747-jynus.json
  • 09:41 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1016 with low weight (duration: 01m 02s)
  • 09:30 jynus: stop and upgrade es1016
  • 09:18 moritzm: installing Java security updates on aqs/druid/Hadoop
  • 09:12 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1016 (duration: 01m 04s)
  • 09:03 jynus: stop and upgrade es2012, es2014
  • 08:48 jynus: stop and upgrade es2011
  • 08:30 jynus: upgrade and restart db2093
  • 00:21 XioNoX: enable interface damping on primary eqsin-codfw link - T236878
  • 00:09 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/549227 (duration: 01m 00s)
  • 00:00 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads (duration: 04m 29s)

2019-11-06

  • 23:56 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads
  • 23:55 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads (duration: 14m 56s)
  • 23:40 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads
  • 22:36 mdholloway: MachineVision: Imported Freebase to Wikidata ID mappings on commonswiki (T227349)
  • 22:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet
  • 22:29 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MachineVision on commonswiki (T227349) (duration: 01m 00s)
  • 22:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Delay annotation jobs on commonswiki only (duration: 01m 01s)
  • 22:17 mdholloway: created MachineVision extension tables on commonswiki
  • 22:13 XioNoX: push standard forwarding-options to cr3/4-ulsfo
  • 22:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 22:04 mholloway-shell@deploy1001: Synchronized private/PrivateSettings.php: Configure Google Cloud Vision API credentials (2/2) (T236426) (duration: 00m 59s)
  • 22:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1247.eqiad.wmnet
  • 22:03 mholloway-shell@deploy1001: Synchronized private/GoogleCloudVision.php: Configure Google Cloud Vision API credentials (1/2) (T236426) (duration: 00m 59s)
  • 21:57 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Allow specifying API credentials as an associative array (T236426) (duration: 01m 01s)
  • 21:53 thcipriani: checkout /srv/mediawiki-staging/php-1.35.0-wmf.5/maintenance/Maintenance.php looks like a local change for debugging left behind
  • 21:47 arlolra: Updated Parsoid to 1d283ed (T237104, T227209, T236865)
  • 21:35 arlolra@deploy1001: Finished deploy [parsoid/deploy@7e86f83]: Updating Parsoid to 1d283ed (duration: 10m 22s)
  • 21:24 arlolra@deploy1001: Started deploy [parsoid/deploy@7e86f83]: Updating Parsoid to 1d283ed
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1247.eqiad.wmnet
  • 21:14 XioNoX: push standard forwarding-options to cr3-esams
  • 21:12 milimetric@deploy1001: Finished deploy [analytics/refinery@dc85f9d]: Hdfs Cleaner and TLS columns (duration: 10m 52s)
  • 21:01 milimetric@deploy1001: Started deploy [analytics/refinery@dc85f9d]: Hdfs Cleaner and TLS columns
  • 20:36 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/OpenStackManager/: sync openstackmanager to deploy https://gerrit.wikimedia.org/r/#/q/I5b08f0069941052acdd9f05a62aac5b2cf9ecdd5 (duration: 01m 00s)
  • 20:34 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.5 refs T233853 (duration: 01m 00s)
  • 20:33 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.5 refs T233853
  • 19:05 mutante: mw1225 - re-enabling puppet (no reason given, nothing in SAL or Phab but disabled)
  • 18:43 mutante: LDAP - add dwisehaupt to wmf group (T235676)
  • 18:34 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix typo (T222117) (duration: 01m 00s)
  • 18:28 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Instrument logging to ClosedWikiProvider (T222117) (duration: 01m 01s)
  • 17:22 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1126 weight, too much backlog', diff saved to https://phabricator.wikimedia.org/P9542 and previous config saved to /var/cache/conftool/dbconfig/20191106-172235-jynus.json
  • 17:21 ejegg: turned off donation queue consumer for financial_trxn record fix
  • 17:17 ejegg: updated Fundraising CiviCRM from 1c3be265ae to f1ce5c86f7
  • 17:15 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1019 fully (duration: 00m 59s)
  • 17:11 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable WebAuthn extension if wmgUseWebAuthn is set (false in all of production) T227242 (duration: 01m 00s)
  • 17:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wmgUseWebAuthn false in all of production T227242 (duration: 01m 01s)
  • 17:08 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 fully', diff saved to https://phabricator.wikimedia.org/P9541 and previous config saved to /var/cache/conftool/dbconfig/20191106-170852-jynus.json
  • 16:11 mdholloway: MachineVision: Imported Freebase to Wikidata ID mappings on testcommonswiki (T227349)
  • 15:58 mdholloway: created MachineVision tables on testcommonswiki (T227349)
  • 15:52 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure MachineVision and enable on testcommonswiki (T227349) (duration: 01m 00s)
  • 15:47 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: MachineVision: Use an HTTP proxy in production (T236843) (duration: 01m 01s)
  • 15:42 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Do not restrict to testing users on Beta (duration: 01m 00s)
  • 15:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Fix Beta config with updated service name (duration: 01m 02s)
  • 14:45 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1019 with low weight (duration: 00m 59s)
  • 14:41 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable streaks and revert counts (T234955, T234956) (duration: 01m 00s)
  • 14:27 jynus: upgrade and restart es1019
  • 14:23 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1019 (duration: 01m 00s)
  • 14:07 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 at 50%', diff saved to https://phabricator.wikimedia.org/P9539 and previous config saved to /var/cache/conftool/dbconfig/20191106-140702-jynus.json
  • 12:38 Urbanecm: EU SWAT done
  • 12:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: a239b14: Allow certain users to create account at closed wikis (T222117; 2/2) (duration: 01m 00s)
  • 12:36 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: a239b14: Allow certain users to create account at closed wikis (T222117; 1/2) (duration: 00m 59s)
  • 12:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3e9ede0: Add 104 (Cookbook) to $wgContentNamespaces for bnwikibooks (T236840) (duration: 01m 00s)
  • 12:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5875c45: [cirrus] Disable instant indexing on wikidata (duration: 01m 15s)
  • 11:57 jynus: upgrade and restart db2048
  • 11:35 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 at 10%', diff saved to https://phabricator.wikimedia.org/P9537 and previous config saved to /var/cache/conftool/dbconfig/20191106-113510-jynus.json
  • 11:14 jynus: stopping db1074 for maintenance (will create temporary s2 lag on wikireplicas)
  • 11:06 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P9536 and previous config saved to /var/cache/conftool/dbconfig/20191106-110603-jynus.json
  • 09:46 moritzm: upgrading mw1262-mw1265,mw1276 servers to PHP 7.2.24 T237239
  • 09:33 jynus: stop and upgrade labsdb1011 T236015
  • 09:25 jynus: depooling labsdb1011 for wikireplica service T236015
  • 09:10 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: T233213 (duration: 11m 38s)
  • 08:58 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: T233213
  • 08:51 jynus: upgrading wmf-mariadb101-client on cumin hosts
  • 08:51 moritzm: upgrading remaining mwdebug* servers to PHP 7.2.24 T237239
  • 08:33 jynus: upgrading db2102 mariadb (test-s1)
  • 07:48 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: T233213 (duration: 11m 38s)
  • 07:37 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: T233213
  • 02:59 vgutierrez: Switch from nginx to ats-tls on cp5012 - T231627
  • 00:07 mdholloway: created table wikimedia_editor_tasks_edit_streak on x1/wikishared (T234956)

2019-11-05

  • 23:32 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.5 refs T233853
  • 23:25 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.5 refs T233853 (duration: 24m 13s)
  • 23:01 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:51 twentyafterfour@deploy1001: scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_2905573311"/* "/srv/mediawiki-staging/php-1.35.0-wmf.5/cache/l10n"' returned non-zero exit status 1 (duration: 01m 26s)
  • 22:50 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:39 twentyafterfour@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_2076118383" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 01m 26s)
  • 22:38 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:17 twentyafterfour: scap failed with error: A copy of your installation's LocalSettings.php must exist and be readable in the source directory. Use --conf to specify it. refs T233853
  • 22:09 twentyafterfour@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_840646293" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 04m 54s)
  • 22:04 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:03 XioNoX: remove 127.0.0.1/32 and ::1/128 from cr2-esams:lo0.0
  • 21:58 XioNoX: remove 127.0.0.1/32 and ::1/128 from cr3-esams:lo0.0
  • 20:45 mutante: shutting down cobalt (formerly gerrit server)
  • 20:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:33 XioNoX: push fw policies to pfw3-eqiad - T236201
  • 20:23 XioNoX: push fw policies to pfw3-codfw - T236201
  • 20:17 joal@deploy1001: Finished deploy [analytics/refinery@ea631bd]: Analytics deploy for spark upgrade - forgotten patch (duration: 08m 21s)
  • 20:09 joal@deploy1001: Started deploy [analytics/refinery@ea631bd]: Analytics deploy for spark upgrade - forgotten patch
  • 20:08 joal@deploy1001: Finished deploy [analytics/refinery@8013a86]: Analytics deploy for spark upgrade (duration: 08m 49s)
  • 20:00 joal@deploy1001: Started deploy [analytics/refinery@8013a86]: Analytics deploy for spark upgrade
  • 18:40 twentyafterfour: MediaWiki train: start branching wmf/1.35.0-wmf.5
  • 18:30 XioNoX: fix typo on cr1-eqsin:lo0.0 v6 IP
  • 18:27 ejegg: updated payments-wiki from 0de9d96208 to aac3d93f70
  • 17:21 jynus: restarting etherpad
  • 16:56 arturo: deleted stretch-wikimedia/thirdparty/kubeadm-k8s and created buster-wikimedia/thirdparty/kubeadm-k8s
  • 16:24 papaul: Replacing disk on db2120
  • 15:37 jynus: deploying schema change on x1 T234955
  • 15:20 ema: cp4027: upgrade trafficserver to 8.0.5-1wm10
  • 14:37 jynus: reducing consistency temporarily on db1114 so it can catch up replication
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:57 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:59 ema: pool cp5012 with ATS backend T227432
  • 10:45 vgutierrez: restarting atsmtail@backend on cp5006
  • 09:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:34 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:24 ema: wb2-phab stopped saying things a while ago. Restarted
  • 09:18 jynus: restart dbprov100[12] T236924
  • 09:11 jynus: restart dbprov2001 T236924
  • 08:12 vgutierrez: uploaded fifo-log-demux 0.6 to apt.wikimedia.org (stretch)
  • 08:02 jynus: redact mnwwiki on db1124 and db2094 T235743
  • 04:30 vgutierrez: Switch from nginx to ats-tls on cp5011 - T231627
  • 04:13 vgutierrez: Switch from nginx to ats-tls on cp5010 - T231627
  • 03:51 vgutierrez: pooling cp3057 - T237348
  • 03:46 mutante: wdqs1004 restarting wdqs-blazegraph
  • 03:01 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3057.esams.wmnet
  • 02:59 vgutierrez: depool cp3057 - T237348
  • 00:15 mutante: gerrit - restarting service to re-enable jgit gc (T217497)
  • 00:13 mutante: gerrit2001 - restart gerrit (replica)

2019-11-04

  • 23:18 milimetric@deploy1001: Finished deploy [analytics/refinery@99f1535]: Fix for geoeditors jobs (duration: 07m 20s)
  • 23:11 milimetric@deploy1001: Started deploy [analytics/refinery@99f1535]: Fix for geoeditors jobs
  • 23:05 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:03 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:08 bd808: The Wikimedia SAL Twitter feed is now @wikimedia_sal (https://twitter.com/wikimedia_sal) T237322
  • 20:51 bd808: Testing twitter feed following account confirmation
  • 19:23 Urbanecm: Morning SWAT done
  • 19:17 mutante: cobalt - stopping services, removing apache2
  • 19:17 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6a4b966: Add throttle rule for bard college editathon (T236955) (duration: 00m 54s)
  • 19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 9204768: Enable DNS blacklist for es.wikinews (T237151) (duration: 00m 53s)
  • 19:05 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: 0fc3909: Allow FlaggedRevs autoreview permission to be assigned globally (duration: 00m 54s)
  • 18:30 andrew@deploy1001: Finished deploy [horizon/deploy@1ac26da]: add new user-selected puppet edit mode (duration: 03m 27s)
  • 18:26 andrew@deploy1001: Started deploy [horizon/deploy@1ac26da]: add new user-selected puppet edit mode
  • 18:24 ppchelko@deploy1001: Finished deploy [restbase/deploy@20c710d]: Bump Parsoid-PHP mirroring to 100% T235902 (duration: 14m 30s)
  • 18:17 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: Event logging via Event Gate and Absolute classpath for munge and runUpdate scripts (duration: 12m 07s)
  • 18:09 ppchelko@deploy1001: Started deploy [restbase/deploy@20c710d]: Bump Parsoid-PHP mirroring to 100% T235902
  • 18:05 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: Event logging via Event Gate and Absolute classpath for munge and runUpdate scripts
  • 17:41 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Update for YAML-reading (offline) (duration: 00m 52s)
  • 17:39 jforrester@deploy1001: Synchronized wmf-config/config/: Sync out YAML config files (duration: 00m 56s)
  • 15:43 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable revert counts on beta (T234955) (duration: 00m 53s)
  • 15:36 jynus: running failing check_private_data report on labsdb1009 T235743
  • 15:33 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T236466) (duration: 00m 59s)
  • 15:01 joal@deploy1001: Started restart [analytics/aqs/deploy@59a97fa]: (no justification provided)
  • 14:36 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:36 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:53 ema: upload trafficserver 8.0.5-1wm10 to stretch-wikimedia
  • 13:49 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:47 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:38 elukey: update bacula terms on analytics-in{4,6} filters on cr{1,2}-eqiad - T237016
  • 13:28 jbond42: update production puppetmasters to use new puppetdb servers
  • 13:20 Amir1: Creating Mon Wikipedia is done T235739
  • 13:19 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 39s)
  • 13:16 ladsgroup@deploy1001: Synchronized langlist: T235739 (duration: 00m 52s)
  • 13:15 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T235739 (duration: 00m 53s)
  • 13:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T235739 (duration: 00m 53s)
  • 13:13 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T235739 (duration: 00m 52s)
  • 13:12 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T235739
  • 13:07 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 53s)
  • 13:06 ema: depool cp5012 and reimage as text_ats T227432
  • 12:21 Urbanecm: EU SWAT done
  • 12:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7c1c64c: Add localized Minerva wordmark for Sindhi Wikipedia (T200870; 2/2) (duration: 00m 52s)
  • 12:12 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki* (T236905)
  • 12:11 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: 7c1c64c: Add localized Minerva wordmark for Sindhi Wikipedia (T200870; 1/2) (duration: 00m 53s)
  • 12:08 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: a6d64b1: Update logo for zh-classical Wikipedia (T236905) (duration: 00m 53s)
  • 12:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c92a13c: Enable partial blocks on kowiki (T236752) (duration: 00m 54s)
  • 12:00 moritzm: upgrading mw1261 to PHP 7.2.24 (T237239)
  • 11:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 52s)
  • 11:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 03s)
  • 11:08 moritzm: uploaded PHP 7.2.24 to apt.wikimedia.org stretch-wikimedia/component/php72 (T237239)
  • 04:53 vgutierrez: Switch from nginx to ats-tls on cp5009 - T231627
  • 04:39 vgutierrez: Switch from nginx to ats-tls on cp5008 - T231627

2019-11-03

  • 03:54 andrew@deploy1001: Finished deploy [horizon/deploy@0c024d4]: one more prefix fix (duration: 03m 35s)
  • 03:50 andrew@deploy1001: Started deploy [horizon/deploy@0c024d4]: one more prefix fix
  • 03:10 andrew@deploy1001: Finished deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (second try) (duration: 00m 25s)
  • 03:10 andrew@deploy1001: Started deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (second try)
  • 03:09 andrew@deploy1001: Finished deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (duration: 06m 01s)
  • 03:03 andrew@deploy1001: Started deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation

2019-11-02

  • 00:58 mutante: gerrit-replica - created missing /var/lib/gerrit2/review_site/tmp and restarted service - service back up on buster (T176774)
  • 00:34 mutante: gerrit-replica - fixing permissions of files in /srv/gerrit and restarting
  • 00:27 mutante: gerrit2001 - copy mysql-connector-java.jar into /usr/share/java/ and link it into /var/lib/gerrit2/review_site/lib (T176774)
  • 00:05 mutante: rsyncing gerrit plugin dir from gerrit1001 to gerrit2001 (T176774)

2019-11-01

  • 23:45 mutante: rsyncing gerrit git data from gerrit1001 to gerrit2001 (using --delete too!) T176774
  • 22:00 mutante: gerrit - repo sync between gerrit and gerrit-replica in progress .. if you can't clone from replica you can use main gerrit and replica will come back
  • 21:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/UploadWizard/resources/mw.UploadWizardUploadInterface.js: T237126 Fixing DOM in upload interface of UploadWizard (duration: 00m 56s)
  • 21:06 mutante: scp /usr/share/java/mysql-connector-java.jar from gerrit1001 to gerrit2001 (T176774)
  • 20:46 cdanis: add to bot_blocked_nets the IPs of several EC2 instances sending expensive requests to ORES T237134
  • 19:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:37 mutante: gerrit2001 - reinstalling with buster
  • 19:03 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Add wikimedia deployment (scap) configuration (duration: 00m 11s)
  • 19:03 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Add wikimedia deployment (scap) configuration
  • 16:39 XioNoX: push Add BGP_from_LVS policy and term vmhost to loopback4 filter to CRs
  • 16:37 ema: pool cp5011 with ATS backend T227432
  • 16:16 XioNoX: asw2-a-eqiad# run request system license add terminal
  • 15:39 moritzm: installing libonig security updates
  • 15:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:28 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:25 moritzm: installing libpcap security updates
  • 15:11 moritzm: installing python-ecdsa security updates
  • 14:34 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:34 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 ema: depool cp5011 and reimage as text_ats T227432
  • 14:02 moritzm: rebooting kafka-main1004 for microcode tests
  • 14:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:56 moritzm: upgrading mwdebug2002 to PHP 7.2.24 for some smoke tests with the new build
  • 12:18 ema: pool cp5010 with ATS backend T227432
  • 11:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:56 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:21 ema: depool cp5010 and reimage as text_ats T227432
  • 11:08 effie: enable puppet mediawiki and prometheus servers
  • 10:54 effie: remove prometheus-hhvm-exporter package from mw* servers - T229792
  • 10:37 moritzm: installing clamav security updates on mendelevium
  • 10:33 effie: Disable puppet on mediawiki and prometheus servers to remove hhvm exporters - T229792
  • 09:28 moritzm: installing file security updates on jessie
  • 09:21 effie: depool mw1317
  • 09:19 moritzm: installing golang-1.11 security updates
  • 08:57 moritzm: installing ruby-loofah security updates
  • 08:17 moritzm: installing libarchive security updates
  • 01:58 volker-e@deploy1001: Finished deploy [design/style-guide@4d8d085]: deploying design/style-guide with mobile layout improvements (duration: 00m 05s)
  • 01:58 volker-e@deploy1001: Started deploy [design/style-guide@4d8d085]: deploying design/style-guide with mobile layout improvements
  • 01:21 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/resources/src/mediawiki.widgets/mw.widgets.UsersMultiselectWidget.js: T236460 mw.widgets.UsersMultiselectWidget: Fix property name (duration: 00m 54s)

2019-10-31

  • 23:33 Urbanecm: Evening SWAT done
  • 23:27 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/CentralNotice/extension.json: SWAT: dcd3ec3: Fix error in CentralNoticeImpression schema (T236627) (duration: 00m 51s)
  • 23:24 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/VisualEditor/: SWAT: 3686b82: Revert "Parse relative hrefs on image nodes like on regular links" (T237040) (duration: 00m 53s)
  • 23:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 02bf4b8: Re-enable mobile editor A/B testing (T236337) (duration: 00m 52s)
  • 23:13 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/bawiki* (T237035)
  • 23:11 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 54ee973: Change bawiki logo to an anniversary one (T237035) (duration: 00m 53s)
  • 23:04 eileen: civicrm revision changed from d2045c6b98 to 1183915bde, config revision is 1a709a61aa
  • 23:00 mutante: replacing deployment keys for apache2secmod ; re-arming keyholder on deployment server
  • 22:51 XioNoX: Homer push to cr1/2-eqiad
  • 22:17 XioNoX: Homer push to cr1/2-codfw
  • 22:14 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: testing deploy_design (duration: 00m 06s)
  • 22:14 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 22:12 mutante: vega sudo find /srv/deployment/design/ -uid 498 -exec chown deploy-design:deploy-design {} \;
  • 22:12 twentyafterfour@deploy1001: deploy aborted: testing deploy_design (duration: 05m 07s)
  • 22:12 mutante: bromine sudo find /srv/deployment/design/ -uid 498 -exec chown deploy-design:deploy-design {} \;
  • 22:07 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 22:05 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: testing deploy_design (duration: 01m 30s)
  • 22:04 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 21:59 mutante: deploy1001 - recreating deploy_design deployment key as ED25519 and with the correct comment (the comment matters and must match path to the file for keyholder) (T235677)
  • 21:49 mutante: deploy1001 keyholder restart, keyholder arm ...
  • 21:46 mutante: deploy1001 - move apach2modsec deployment key out of keyholder dir, keyholder arm to reload all other deployment keys including the new one for design (T235677)
  • 21:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@9cac9ac]: Bump Parsoid-PHP traffic mirroring to 50% T235902 (duration: 13m 44s)
  • 21:25 robh: setting up ps1-b8-eqiad per T227543. it will reboot twice in the next 15 minutes, and then should start to clear up in icinga
  • 21:18 ppchelko@deploy1001: Started deploy [restbase/deploy@9cac9ac]: Bump Parsoid-PHP traffic mirroring to 50% T235902
  • 20:35 XioNoX: Homer push to all cr2-eqdfw - new NTP servers, remove border-in4 term unused-ips, add (unused) BGP_Wikimedia_pops, re-order ospf interfaces
  • 20:27 shdubsh: restarting logstash on logstash1008 to test level->severity filter selector
  • 20:12 XioNoX: Homer push to all msw* - new NTP servers - T237011
  • 20:07 XioNoX: Homer push to all asw* - new NTP servers - T237011
  • 19:49 XioNoX: Homer push to eqsin
  • 19:49 mutante: rsyncing home dirs from previous gerrit server cobalt to gerrit1001
  • 19:36 fdans@deploy1001: Finished deploy [analytics/refinery@af91ce6]: deploying refinery, second attempt (duration: 06m 53s)
  • 19:31 XioNoX: Homer push to ulsfo
  • 19:29 fdans@deploy1001: Started deploy [analytics/refinery@af91ce6]: deploying refinery, second attempt
  • 19:08 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.4
  • 18:22 Urbanecm: Morning SWAT done
  • 18:21 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/CentralNotice: SWAT: 3e5b33f: Update CentralNoticeImpression scheme for campaign fallback (T236627) (duration: 00m 55s)
  • 18:20 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/CentralNotice: SWAT: 963e963: Update CentralNoticeImpression scheme for campaign fallback (T236627) (duration: 01m 01s)
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fe08fbb: Undeploy reader surveys in English, Polish, and Russian (T232525) (duration: 01m 02s)
  • 18:01 fdans@deploy1001: Finished deploy [analytics/refinery@8ca04df]: deploying refinery (duration: 01m 09s)
  • 18:00 fdans@deploy1001: Started deploy [analytics/refinery@8ca04df]: deploying refinery
  • 16:23 bd808: Our @wikimediatech Twitter account is soft blocked pending phone number verification. bd808 trying to figure out a good way to do that verification for a bot account.
  • 16:14 jynus: restart dbprov2002 after upgrade T236924
  • 16:09 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119, db1113 at 100%', diff saved to https://phabricator.wikimedia.org/P9513 and previous config saved to /var/cache/conftool/dbconfig/20191031-160925-jynus.json
  • 15:28 jgleeson: Updated paymentswiki from e28bc54e85 to 0de9d96208
  • 14:56 Urbanecm: Password reset for SUL user `Darth AK`
  • 14:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119 at 10%', diff saved to https://phabricator.wikimedia.org/P9512 and previous config saved to /var/cache/conftool/dbconfig/20191031-145010-jynus.json
  • 14:28 jynus: reloading ferm on db1119
  • 14:24 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P9511 and previous config saved to /var/cache/conftool/dbconfig/20191031-142455-jynus.json
  • 13:40 effie: upload xdebug 2.7.0-1+wmf2 to component/php72 - T234418
  • 13:21 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool pc1008 T227543 (duration: 01m 02s)
  • 13:16 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119, db1113 at 10% T227543', diff saved to https://phabricator.wikimedia.org/P9509 and previous config saved to /var/cache/conftool/dbconfig/20191031-131606-jynus.json
  • 11:48 jynus: setting pc1008 as a replica of active pc1010
  • 11:43 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depooling pc1008 T227543 (duration: 01m 01s)
  • 11:37 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1119, db1113 T227543', diff saved to https://phabricator.wikimedia.org/P9507 and previous config saved to /var/cache/conftool/dbconfig/20191031-113659-jynus.json
  • 11:24 Urbanecm: EU SWAT done
  • 11:23 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/ProofreadPage/: SWAT: e0d5ce9: Add page navigation tabs in correct order skin-side and remove js requirement for Vector tab icons (T231250); ed17da2: Makes sure that Vector default background does not override the navigation arrows (T236969) (duration: 01m 02s)
  • 11:07 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 547086|Enable ContentTranslation out of Beta in Albanian WP (T236064) (duration: 01m 02s)
  • 11:03 ema: cp5008: restart ats-be to clear "backend process restarted" alert
  • 11:00 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:59 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:59 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:59 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:59 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:54 godog: bounce logstash on logstash2004
  • 10:39 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:38 ema: pool cp5009 with ATS backend T227432
  • 10:37 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:35 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 10:30 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:29 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:19 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:18 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:13 godog: bounce logstash on logstash2004
  • 10:07 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:05 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 09:46 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:43 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:37 godog: temporarily stop logstash on logstash2005 to test performance with two ingesters only - T215904
  • 09:23 godog: temporarily stop logstash on logstash2006 to test performance with two ingesters only - T215904
  • 09:10 ema: depool cp5009 and reimage as text_ats T227432
  • 08:25 ariel@deploy1001: Finished deploy [dumps/dumps@f2b6d78]: couple of fixup scripts, bug fix for incr dumps index.html generation (duration: 00m 03s)
  • 08:25 ariel@deploy1001: Started deploy [dumps/dumps@f2b6d78]: couple of fixup scripts, bug fix for incr dumps index.html generation
  • 06:37 elukey: upgrade cergen to 0.2.5 on puppetmaster1001
  • 03:44 vgutierrez: switch from nginx to ats-tls on cp4032 - T231627
  • 03:09 vgutierrez: switch from nginx to ats-tls on cp4031 - T231627
  • 02:51 vgutierrez: switch from nginx to ats-tls on cp4030 - T231627
  • 01:41 eileen: civicrm revision changed from 0547c84f73 to d2045c6b98, config revision is 1a709a61aa (looks like patch was still hung in gerrit last time)
  • 01:34 eileen: civicrm revision is 0547c84f73, config revision is 1a709a61aa - that should stop those failmails
  • 00:40 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/WikiLove/resources/ext.wikiLove.icon.vector.css: T236958 Fix Vector icon after upstream change (duration: 01m 02s)
  • 00:38 eileen: civicrm revision changed from a55c2d2787 to 0547c84f73, config revision is 1a709a61aa

2019-10-30

  • 23:21 ejegg: updated fundraising python tools from ffc7bf764b to a93eec292d
  • 23:08 XioNoX: power cycle cr3-esams re1 - T236598
  • 22:29 mutante: scandium - live hack /srv/mediawiki/wmf-config/InitialiseSettings.php - set wmgMemoryLimit to 850 (*1024 *1024), restart php7.2-fpm (T236833)
  • 22:22 andrew@deploy1001: Finished deploy [horizon/deploy@2d551d8]: Rolling out a currently-turned-off puppet edit mode (duration: 03m 15s)
  • 22:19 andrew@deploy1001: Started deploy [horizon/deploy@2d551d8]: Rolling out a currently-turned-off puppet edit mode
  • 22:09 ppchelko@deploy1001: Finished deploy [restbase/deploy@fa934c8]: Bump parsoid mirroring to 25% and fix 412: T235902, T236837 (duration: 13m 54s)
  • 21:55 ppchelko@deploy1001: Started deploy [restbase/deploy@fa934c8]: Bump parsoid mirroring to 25% and fix 412: T235902, T236837
  • 21:31 ppchelko@deploy1001: Finished deploy [restbase/deploy@88cf547]: Parsoid mirroring followups: T236837, T236838 (duration: 14m 04s)
  • 21:17 ppchelko@deploy1001: Started deploy [restbase/deploy@88cf547]: Parsoid mirroring followups: T236837, T236838
  • 20:47 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 03s)
  • 20:47 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:46 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 04s)
  • 20:46 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:42 arlolra: Updated Parsoid to 5ac1623 (T235656, T233818, T234549, T227209, T236112)
  • 20:29 otto@deploy1001: Synchronized wmf-config/LabsServices.php: Syncing LabsServices.php change for beta eventgate instance replacement (duration: 01m 01s)
  • 20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@a69ec92]: Updating Parsoid to 5ac1623 (duration: 09m 10s)
  • 20:25 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 18s)
  • 20:24 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@a69ec92]: Updating Parsoid to 5ac1623
  • 20:17 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: WikimediaEditorTasks: Enable edit streaks on beta (duration: 01m 03s)
  • 20:11 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 03s)
  • 20:11 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:10 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 51s)
  • 20:09 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:07 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 00m 07s)
  • 20:07 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:06 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 23s)
  • 20:06 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:04 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 05s)
  • 20:03 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 19:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 19:06 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.4 (duration: 01m 00s)
  • 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.4
  • 19:05 mutante: moscovium - stop and remove rsync server, purge rsync package T180641
  • 18:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T222851 Migrate to Kask for Echo seen-time storage (duration: 01m 01s)
  • 17:43 elukey: upload cergen 0.2.5-1+deb10u1 to buster-wikimedia component/cergen
  • 17:41 elukey: run reprepro clearvanished on install1002 to clean leftovers of buster-wikimedia|thirdparty/elastic7
  • 17:37 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 04s)
  • 17:37 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 17:29 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Revert 16:05 UTC T236928 (duration: 01m 05s)
  • 17:26 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Revert 16:02 UTC T236928 (duration: 01m 04s)
  • 16:59 jynus: killed rebuildItemTerms on mwmaint1002
  • 16:05 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T234948) (duration: 01m 04s)
  • 16:02 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T236466) (duration: 01m 05s)
  • 15:48 godog: roll restart logstash after https://gerrit.wikimedia.org/r/c/operations/puppet/+/544217
  • 15:46 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Wikibase deadlock reduction, Shorten out when there is nothing to clean up (T236466) (duration: 01m 06s)
  • 15:41 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Shorten out when there is nothing to clean up (T236466) (duration: 01m 05s)
  • 15:36 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 15:29 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 15:23 gehel: shutting down elastic1039 to be ready for disk swap - T236601
  • 15:10 effie: enable-puppet in mw* hosts
  • 15:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 14:50 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T210174 Load Wikisource extension when wmgUseWikisource is true (duration: 01m 01s)
  • 14:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T236502 Define wmgUseWikisource as default-false (duration: 01m 22s)
  • 14:40 ema: pool cp5008 with ATS backend T227432
  • 14:32 effie: disable puppet on all mw* hosts
  • 14:20 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 14:19 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:15 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:04 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 13:39 andrew@deploy1001: Finished deploy [horizon/deploy@53028ab]: Rolling out improvements to the puppet git archiver (duration: 03m 38s)
  • 13:36 andrew@deploy1001: Started deploy [horizon/deploy@53028ab]: Rolling out improvements to the puppet git archiver
  • 12:59 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=cp5008.eqsin.wmnet
  • 12:58 moritzm: rolling restart of slapd to pick up LDAP schema change
  • 12:57 cdanis@cumin1001: conftool action : set/pooled=no; selector: name=cp5008.eqsin.wmnet
  • 12:50 arturo: updating package versions in install1002 for thirdparty/kubeadm-k8s stretch-wikimedia (T236824)
  • 12:23 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:22 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:49 moritzm: temporarily disabling puppet on LDAP servers for a schema change
  • 11:42 ema: depool cp5008 and reimage as text_ats T227432
  • 11:37 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 11:31 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Increase rate limits for newbie non-ip users on Commons (duration: 01m 01s)
  • 11:13 Urbanecm: EU SWAT done
  • 11:12 Urbanecm: Synchronized wmf-config/InitialiseSettings.php: SWAT: 61cb77c: Re-apply: MCR: Set testwiki to use the new MCR-only schema (T198558) (duration: 00m 59s)
  • 10:07 jynus: restarting bacula-dir, bacula-sd on backup1001 T236406
  • 09:46 vgutierrez: Switch from nginx to ats-tls on cp4029 - T231627
  • 09:34 vgutierrez: Switch from nginx to ats-tls on cp4028 - T231627
  • 09:25 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 08:51 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 08:45 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 08:25 moritzm: installing php7.0 security updates
  • 07:58 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:57 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 05:58 vgutierrez: Rolling restart of ats-tls to get rid of leaked sockets and benefit from the lower inactivity timeout - T236458
  • 04:24 vgutierrez: restarting ats-tls on cp4027 with half open disabled - T236458
  • 03:09 vgutierrez: Rolling restart of prometheus-exporter-trafficserver-tls - T236458
  • 02:40 vgutierrez: restarting ats-tls on cp3050 with half open disabled - T236458
  • 00:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php

2019-10-29

  • 23:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 23:09 mutante: ganeti1003 - gnt-instance remove ununpentium.wikimedia.org (T236748)
  • 23:05 Urbanecm: Evening SWAT done
  • 23:05 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/atjwiki* (T236777)
  • 23:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: f7b9972: Revert "Milestone lobo for atjwiki" (T236777) (duration: 01m 01s)
  • 22:26 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 22:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 22:17 mutante: ununpentium - shutdown Ganeti VM - running decom script, schedule icinga downtime (T236748)
  • 22:14 mutante: rsynced data dump and config from ununpentium to moscovium in /srv/ before shutting down the old server (T180641)
  • 20:43 papaul: rebooting cp3056 for HW check
  • 20:19 Trey314159: reindexing Slovak wikis on elastic@eqiad and elastic@codfw complete (T235654)
  • 19:42 andrew@deploy1001: Finished deploy [horizon/deploy@dbe892e]: (no justification provided) (duration: 03m 59s)
  • 19:38 andrew@deploy1001: Started deploy [horizon/deploy@dbe892e]: (no justification provided)
  • 19:32 jynus: restarting bacula-fd on install1002 T236406
  • 19:31 andrew@deploy1001: Finished deploy [horizon/deploy@bab5d37]: (no justification provided) (duration: 01m 35s)
  • 19:30 andrew@deploy1001: Started deploy [horizon/deploy@bab5d37]: (no justification provided)
  • 19:25 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.4
  • 19:14 brennen@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.4 and rebuild l10n cache (duration: 21m 11s)
  • 18:54 jynus@cumin1001: dbctl commit (dc=all): 'Revert state to before overload+maintenance', diff saved to https://phabricator.wikimedia.org/P9501 and previous config saved to /var/cache/conftool/dbconfig/20191029-185438-jynus.json
  • 18:53 brennen@deploy1001: Started scap: testwiki to php-1.35.0-wmf.4 and rebuild l10n cache
  • 18:53 Trey314159: reindexing Slovak wikis on elastic@eqiad and elastic@codfw (T235654)
  • 18:50 brennen@deploy1001: Pruned MediaWiki: 1.35.0-wmf.1 (duration: 08m 09s)
  • 18:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@cf80130]: Mirror 10% of /page/html/ traffic to Parsoid/PHP T235902 (duration: 14m 13s)
  • 18:07 ppchelko@deploy1001: Started deploy [restbase/deploy@cf80130]: Mirror 10% of /page/html/ traffic to Parsoid/PHP T235902
  • 17:42 brennen: cutting branch for 1.35.0-wmf.4
  • 17:38 mutante: phab1001 - upgrading php7.3 packages
  • 17:34 mutante: phab2001 - upgrading PHP packages
  • 17:06 jynus@cumin1001: dbctl commit (dc=all): 'repool db1099 both instances fully to increase redundancy', diff saved to https://phabricator.wikimedia.org/P9499 and previous config saved to /var/cache/conftool/dbconfig/20191029-170648-jynus.json
  • 16:56 jynus@cumin1001: dbctl commit (dc=all): 'depool fully db1105:3311, stability/lag issues', diff saved to https://phabricator.wikimedia.org/P9498 and previous config saved to /var/cache/conftool/dbconfig/20191029-165633-jynus.json
  • 16:52 ssastry@deploy1001: Finished deploy [parsoid/deploy@aa59ce3]: Update parsoid to 089bf28d (duration: 09m 35s)
  • 16:46 jynus@cumin1001: dbctl commit (dc=all): 'pool db1106 into s1 rcs', diff saved to https://phabricator.wikimedia.org/P9497 and previous config saved to /var/cache/conftool/dbconfig/20191029-164640-jynus.json
  • 16:43 ssastry@deploy1001: Started deploy [parsoid/deploy@aa59ce3]: Update parsoid to 089bf28d
  • 16:39 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 16:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2002.codfw.wmnet,service=parsoid-php
  • 16:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2001.codfw.wmnet,service=parsoid-php
  • 16:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 16:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1026.eqiad.wmnet,service=parsoid-php
  • 16:28 ssastry@deploy1001: Finished deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d (duration: 06m 11s)
  • 16:22 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 16:22 ssastry@deploy1001: Started deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d
  • 16:20 mutante: reloading nginx on wtp*
  • 15:57 bstorm_: restarted ferm on labstore1006 -- it failed an external DNS lookup due to brief issues apparently on the other end
  • 15:25 vgutierrez: restarting ats-tls on cp5007 with a default inactivity timeout of 5 minutes and half open disabled - T236458
  • 15:04 eevans@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 15:01 eevans@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 14:58 eevans@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'echostore' for release 'staging' .
  • 14:45 robh: setting up ps1-b2-eqiad, librenms will output a couple reboots from it T227538
  • 14:32 Krinkle: krinkle@webperf1001.eqiad Restart navtiming, coal and statsv services
  • 14:29 elukey: upgrade python-kafka on webperf[12]001 - T234808
  • 14:27 Krinkle: krinkle@webperf2001 Restart navtiming, coal and statsv services
  • 12:32 hashar: Restarting Zuul / Jenkins
  • 12:31 hashar: Stopping Zuul / Jenkins for upgrade
  • 12:29 akosiaris: delete all production00 volumes on backup1001
  • 11:48 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 11:37 Urbanecm: EU SWAT done
  • 11:34 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: faeb8f1: Allow AbuseFilter to issue blocks on es.wikinews (T236730) (duration: 00m 53s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fc9920e: Rename Author talk namespace at thwikisource (T236640) (duration: 00m 56s)
  • 11:19 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 11:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 10:51 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:51 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:46 jakob@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 10:39 jakob@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 10:33 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 10:29 moritzm: installing php5 security updates
  • 10:23 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 10:21 jynus: running import on m1-master, m1 replicas will lag for a while T236406
  • 10:20 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:19 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:15 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:07 XioNoX: disable cr3-esams:et-1/0/0 (flapping)
  • 09:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:55 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:55 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:49 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:29 gehel: plugin upgrade on relforge - T236123
  • 09:27 godog: reimage elastic 7 hw with Buster
  • 09:27 vgutierrez: restart ats-tls on cp5007 disabling TCP SO_LINGER - T236458
  • 08:43 jynus: shutting down db1099 T227538
  • 08:35 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1099', diff saved to https://phabricator.wikimedia.org/P9492 and previous config saved to /var/cache/conftool/dbconfig/20191029-083547-jynus.json
  • 08:15 XioNoX: push term allow_vmhost to cr3-esams loopback4 filter - T236598
  • 08:06 vgutierrez: restarting ats-tls on cp5007 with TCP FASTOPEN disabled - T236458
  • 07:40 moritzm: installing php7.3 security updates
  • 07:06 elukey: roll restart java daemons on analytics1042, druid1003 and aqs1004 to pick up new openjdk upgrades
  • 07:01 _joe_: restart memcached on mc1024-1036, 1 hour apart, via cumin (T235188)
  • 06:26 _joe_: restart memcached on mc1023 T23518
  • 03:35 vgutierrez: restarting varnish-frontend on cp5008

2019-10-28

  • 23:23 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy Echo kask migration to officewiki for testing, part 3 (T222851) (duration: 00m 52s)
  • 23:20 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploy Echo kask migration to officewiki for testing, part 2 (T222851) (duration: 00m 52s)
  • 23:19 catrope@deploy1001: Synchronized wmf-config/ProductionServices.php: Deploy Echo kask migration to officewiki for testing, part 1 (T222851) (duration: 00m 54s)
  • 23:18 mutante: re-enabling puppet on moscovium (RT)
  • 22:02 ejegg: re-enabled basic fundraising jobs (Queue consumers, audit processors, TY mailer)
  • 20:56 cdanis: restart memcached on mc1022 T235188
  • 20:37 Jeff_Green: authdns update to switch fundraising db service hostname
  • 20:19 ejegg: disabled all fundraising scheduled jobs
  • 19:50 rlazarus: restarted memcached on mc1021 (T235188)
  • 19:41 ssastry@deploy1001: Finished deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d (duration: 02m 42s)
  • 19:38 ssastry@deploy1001: Started deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d
  • 18:53 moritzm: updating PHP on people1001
  • 18:52 Urbanecm: Morning SWAT done
  • 18:42 urbanecm@deploy1001: Synchronized wmf-config/logging.php: SWAT: 1a09e2a: Direct Parsoid/PHP logs to a parsoid-php log "type" (T235899) (duration: 00m 52s)
  • 18:41 rlazarus: restarted memcached on mc1020 T235188
  • 18:32 mutante: moscovium - rename all files in /etc/request-tracker4/RT_SiteConfig.d to have a .pm extension - this fixed RT - login works again - puppet patch coming up (T180641)
  • 18:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 30111f3: Enable mapframe at kawiki (T229726) (duration: 00m 53s)
  • 18:28 mutante: moscovium - deleting /etc/request-tracker4/RT_SiteConfig.d/ 50-debconf.pm and 51-dbconfig-common.pm which duplicate the same files without .pm extension with wrong values, probably due to some package change (T180641)
  • 18:27 jgleeson: updated paymentswiki from 7bb9f5257e to e28bc54e85
  • 18:26 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: c48271d: Revert "Config changes for Echo kask migration" (T222851) (duration: 00m 53s)
  • 18:24 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/VisualEditor/includes/ApiVisualEditor.php: SWAT: b19ad5f: Revert "Revert "ApiVisualEditor: Return etag with content for preloaded content""; 4f3b724: ApiVisualEditor: Fix preload handling further (T233320) (duration: 00m 53s)
  • 18:15 Urbanecm: Run mwscript namespaceDupes.php --wiki=thwikisource --fix (T236640)
  • 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ea927dd: Rename author NS at thwikisource (T236640) (duration: 00m 53s)
  • 18:07 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: ddaa534: Config changes for Echo kask migration (T222851) (duration: 00m 55s)
  • 17:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 17:12 bblack: mr1-eqiad: fix bast3004 access for eqiad mgmt network - T236686
  • 17:11 _joe_: starting rolling restart of memcached servers in eqiad, beginning with mc1019 T235188
  • 17:11 bblack: mr1-codfw: fix bast3004 access for codfw mgmt network - T236686
  • 17:10 bblack: mr1-ulsfo: fix bast3004 access for ulsfo mgmt network - T236686
  • 16:57 bblack: mr1-eqsin: fix bast3004 access for eqsin mgmt network - T236686
  • 16:56 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:55 bblack: mr1-esams: fix bast3004 access for esams mgmt network - T236686
  • 16:36 jbond42: restart puppetdb on puppetdb1001 to remove queue
  • 13:50 ema: pool cp5007 with ATS backend T227432
  • 13:30 godog: roll restart logstash in codfw/eqiad to apply new config
  • 13:23 effie: enable puppet on mw1*, depool and repool to reload apache - T229792
  • 13:13 effie: enable puppet on mw[1261-1265].eqiad.wmnet (mw canaries), depool and repool to reload apache - T229792
  • 13:07 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:07 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:05 effie: enable puppet on mw2* servers, depool and repool to reload apache - T229792
  • 13:01 jynus: stop db1114 for testing
  • 12:30 ema: depool cp5007 and reimage as text_ats T227432
  • 12:22 effie: depool mw2150
  • 11:56 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: testing deployment of phabricator to phab1001 (duration: 00m 05s)
  • 11:56 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: testing deployment of phabricator to phab1001
  • 11:34 Urbanecm: EU SWAT done
  • 11:33 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: SWAT: 8caf681: Dont log missing ETags when creating a new page, thats normal (T233320) (duration: 00m 54s)
  • 11:33 effie: Disable puppet on mw* for 545652 - T229792
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: dd2f06c: Add Translate channel for the Translate extension (T221119) (duration: 00m 53s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ff17666: Adjust wgUploadNavigationUrl for azwiki to point to commons UpWiz (T236307) (duration: 00m 53s)
  • 11:05 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: 7e26ef4: Revert "Restrict uploads on azwiki" (T236307) (duration: 00m 53s)
  • 11:02 moritzm: installing OpenJDK security updates on elastic*
  • 10:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 54s)
  • 08:48 godog: bump udp_localhost kafka-logging topics to 6 partitions and roll-restart logstash and rsyslog - T215904
  • 08:26 volans: manually cleanup changes reverted in https://gerrit.wikimedia.org/r/546407 on icinga[12]001 - T222074
  • 08:25 moritzm: installing file/libmagic security updates
  • 08:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@447981b]: Parsoid: Shim content-language and vary headers only for the PHP variant - T230791 (duration: 13m 42s)
  • 08:15 godog: swift eqiad-prod: final weight to ms-be105[1-6] - T232367
  • 08:02 mobrovac@deploy1001: Started deploy [restbase/deploy@447981b]: Parsoid: Shim content-language and vary headers only for the PHP variant - T230791
  • 07:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@c500d7a]: Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org - T230791 T235744 T236389 (duration: 13m 44s)
  • 07:40 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f1ad6d]: Move codebase to Python3 - second attempt (duration: 00m 05s)
  • 07:40 elukey@deploy1001: Started deploy [eventlogging/analytics@0f1ad6d]: Move codebase to Python3 - second attempt
  • 07:37 elukey: upload archiva 2.2.4-1 to wikimedia-stretch (fix to avoid overriding archiva.xml upon install)
  • 07:27 mobrovac@deploy1001: Started deploy [restbase/deploy@c500d7a]: Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org - T230791 T235744 T236389
  • 07:25 mobrovac@deploy1001: Finished deploy [restbase/deploy@c500d7a] (dev-cluster): Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org (duration: 02m 37s)
  • 07:22 mobrovac@deploy1001: Started deploy [restbase/deploy@c500d7a] (dev-cluster): Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org

2019-10-26

  • 11:30 XioNoX: restart cr3-esams
  • 11:01 XioNoX: re0.cr3-esams> request chassis routing-engine master switch

2019-10-25

  • 22:55 mutante: moscovium rm /dev/shm/envoy_shared_memory_0 to revive envoy which failed to run after changing ports and reinstalling it (T180641)
  • 22:42 mutante: moscovium - manually deleting envoy listener on 1443 and letting puppet recreate config because it's not removed if you change the port (T180641)
  • 21:55 mutante: running puppet on ulsfo cp-ats servers to pick up config change for RT backend
  • 20:42 twentyafterfour@deploy1001: Finished deploy [design/style-guide@c69242e]: deploying design/style-guide for demonstration purposes (duration: 00m 06s)
  • 20:41 twentyafterfour@deploy1001: Started deploy [design/style-guide@c69242e]: deploying design/style-guide for demonstration purposes
  • 20:04 twentyafterfour@deploy1001: Finished deploy [design/style-guide@c69242e]: test deploy design/style-guide (duration: 00m 10s)
  • 20:04 twentyafterfour@deploy1001: Started deploy [design/style-guide@c69242e]: test deploy design/style-guide
  • 17:49 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:47 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:26 bblack: lvs3005 - reimaging to fix partman issue, high-traffic1 (text) to lvs3007 for the duration
  • 16:43 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:19 bblack: lvs3006 - reimaging to fix partman issue, high-traffic2 (upload/maps) to lvs3007 for the duration
  • 16:19 crusnov@deploy1001: Finished deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox1001) T223292 (duration: 13m 31s)
  • 16:05 crusnov@deploy1001: Started deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox1001) T223292
  • 16:04 crusnov@deploy1001: Finished deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox2001) T223292 (duration: 00m 43s)
  • 16:04 crusnov@deploy1001: Started deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox2001) T223292
  • 15:35 robh: ps1-oe14-esams ip info set, rebooting (won't affect servers) via T184066
  • 15:03 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 15:01 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:00 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:41 bblack: cr[23]-esams: re-route ns2 IP to ganeti3003
  • 14:36 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:32 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) -T223292 (duration: 00m 44s)
  • 14:31 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) -T223292
  • 14:30 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) T223292 (duration: 00m 05s)
  • 14:30 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) T223292
  • 14:28 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts T223292 (duration: 01m 02s)
  • 14:27 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts T223292
  • 14:17 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:15 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:10 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:10 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:09 bblack: reboot ganeti3003
  • 13:57 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 ema: pool cp4032 with ATS backend T227432
  • 13:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 effie: depool mw1334 and pool back
  • 13:30 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:28 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:07 ema@cumin1001: conftool action : set/weight=100; selector: name=cp4032.ulsfo.wmnet,service=ats-be
  • 13:05 ema: depool cp4032 and reimage as text_ats T227432
  • 12:34 jynus: introducing new freshness check for bacula T234900
  • 12:11 ema: pool cp4031 with ATS backend T227432
  • 10:20 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:18 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:01 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 09:59 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4031.ulsfo.wmnet,service=ats-be
  • 09:56 ema: depool cp4031 and reimage as text_ats T227432
  • 09:39 ema: pool cp4030 with ATS backend T227432
  • 09:22 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:21 XioNoX: powering off mr1-esams again
  • 09:20 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:06 XioNoX: going to power down mr1-esams (esams mgmt is going to go down) for 30 min while power cables are moved
  • 09:02 jynus: disabling persistent journald on db1074
  • 09:01 ema@cumin1001: conftool action : set/weight=100; selector: name=cp4030.ulsfo.wmnet,service=ats-be
  • 08:58 ema: depool cp4030 and reimage as text_ats T227432
  • 08:48 vgutierrez: switch from nginx to ats-tls on cp3050 - T231627
  • 08:45 godog: stop prometheus on bast300[24] and run last round of data rsync - T236329
  • 08:37 ema: lvs1015: restart pybal to add labweb-ssl T210411
  • 08:36 ema: test
  • 08:34 ema@cumin1001: conftool action : set/pooled=yes; selector: service=labweb-ssl
  • 08:32 ema: lvs1016: restart pybal to add labweb-ssl T210411
  • 08:02 vgutierrez: rolling restart of ats-tls to introduce a SSL handshake timeout of 60 secs - T236458
  • 07:35 akosiaris: reboot webperf1002 for disk resize T235455
  • 07:29 akosiaris: reboot webperf2002 for disk resize T235455
  • 05:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:35 vgutierrez: reimage lvs3007 to let it get the proper partman configuration - T236294
  • 05:03 vgutierrez: Applying a SSL handshake timeout of 60 secs on ats-tls/cp5007 - T236458
  • 04:56 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:55 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:54 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:53 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:53 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:52 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:51 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:50 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:49 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:24 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns300.*
  • 03:24 bblack@cumin1001: conftool action : set/weight=1; selector: name=dns300.*
  • 03:24 bblack@cumin1001: conftool action : set/weight=1; selector: name=dns3001.*
  • 03:08 bblack: cr2-esams + cr3-esams : remove nescio and maerlant from anycast4 neighbor list
  • 03:06 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 03:05 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 02:45 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet
  • 02:45 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3049.esams.wmnet
  • 02:45 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3064.esams.wmnet
  • 02:44 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3043.esams.wmnet
  • 02:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 02:09 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 01:52 bblack: mr1-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:50 bblack: asw2-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:46 bblack: cr2-esams + cr3-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet
  • 01:40 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3042.esams.wmnet
  • 01:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3063.esams.wmnet
  • 01:39 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3047.esams.wmnet
  • 01:28 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet
  • 01:28 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3041.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3061.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3046.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3045.esams.wmnet
  • 01:13 mutante: puppetmaster1001 - revoking parsoid.svc.eqiad / parsoid.svc.codfw / parsoid.discovery.wmnet certificates and creating new ones including parsoid-php.discovery.wmnet (T233654)
  • 00:52 krinkle@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/LiquidThreads/classes/View.php: (no justification provided) (duration: 00m 54s)

2019-10-24

  • 23:46 mutante: bast3002 - rsyncing /home, /srv/tftpboot and /srv/prometheus to /srv/bast3002/ on bast3004 (T236394 T236329)
  • 23:24 krinkle@deploy1001: Synchronized php-1.35.0-wmf.3/includes/specials/pagers/BlockListPager.php: T236425, fc99c5a7c0de2 (duration: 00m 54s)
  • 22:16 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:13 mutante: gerrit1001 - starting gerrit
  • 22:13 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:12 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:10 thcipriani: stopping gerrit briefly for script run for T236344
  • 22:09 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:01 mutante: mw1270 - was alerting in Icinga as degraded systemd state - reason was 'hhvm.service not-found'. systemctl reset-failed cleared it. could cause monitoring spam on more servers (T229792)
  • 21:56 eileen: civicrm revision changed from 47e0800001 to a55c2d2787, config revision is 63a67f32a1
  • 21:16 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3040.esams.wmnet
  • 21:16 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet
  • 21:13 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet
  • 21:13 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet
  • 21:12 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3039.esams.wmnet
  • 21:06 bblack: cr3-esams remove pybal neighbor IPs for lvs3001-4
  • 21:05 bblack: cr2-esams remove pybal neighbor IPs for lvs3001-4
  • 21:05 urandom: restbase cassandra rolling restart, codfw / rack 'd' -- T200803
  • 21:02 bblack: downtimed lvs3001-4, stopping pybal there, etc...
  • 20:58 bblack: cr3-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005
  • 20:58 bblack: cr2-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005
  • 20:40 bblack: esams lvs: high-traffic1 - change 3005's med to 0 (becomes new primary, permanently)
  • 20:36 bblack: esams lvs: high-traffic1 - change 3003's med to 200, 3001's med to 50, 3005 remains 100 (traffic will blip to 3005 then back to 3001 again)
  • 20:33 urandom: restbase cassandra rolling restart, codfw / rack 'c' -- T200803
  • 20:24 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3038.esams.wmnet
  • 20:24 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3033.esams.wmnet
  • 20:23 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet
  • 20:22 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
  • 20:04 bblack: reboot cp3054 again for good measure
  • 19:57 bblack: cp3054 - trying racadm serveraction hardreset
  • 19:32 bblack: reboot dns3001
  • 19:31 urandom: restbase cassandra rolling restart, codfw / rack 'b' -- T200803
  • 19:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:07 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:06 urandom: restbase cassandra rolling restart, rack 'd' -- T200803
  • 19:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:00 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:00 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:59 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:57 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 Urbanecm: Morning SWAT done
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:46 urandom: restbase cassandra rolling restart, rack 'b' -- T200803
  • 18:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:42 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:31 bblack: cr3-esams: add dns3001 to anycast4 neighbors
  • 18:30 bblack: cr2-esams: add dns3001 to anycast4 neighbors
  • 18:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 263fd0f: Enable Wikibase client access on commonswiki (T223792) (duration: 00m 52s)
  • 18:25 urandom: restbase cassandra rolling restart, rack 'a' -- T200803
  • 18:22 robh: completing ps1-b6-eqiad setup, pdu will reboot twice, power output unaffected T227540
  • 18:20 robh: ps1-a6-eqiad setup complete, icinga errors should clear up T227142
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: 84c48df: rename service definition (T222851) (duration: 00m 53s)
  • 18:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: b20d6de: Reference Previews: full beta deployment (T235083) (duration: 00m 52s)
  • 18:03 robh: setting ip info for ps1-a6-eqiad, it is rebooting. T227142
  • 17:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:38 ema: pool cp3059 (cache_upload) T233242
  • 17:29 bblack: asw2-esams - committing switch port/vlan config for new rack 14 hosts
  • 17:26 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable Parsoid/PHP in the whole wtp (a.k.a. Parsoid) cluster - T236388 (duration: 00m 53s)
  • 17:18 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:15 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:54 ema: depool cp3036 (cache_upload) T233242
  • 16:39 urandom: restarting cassandra, restbase2011 (canary for config changes) -- T200803
  • 16:32 urandom: restarting cassandra, restbase1016 (canary for config changes) -- T200803
  • 16:28 ema: depool cp3035 (cache_upload) T233242
  • 16:07 ema: pool cp3057 (cache_upload) T233242
  • 15:51 ema: depool cp3032 (cache_text) T233242
  • 15:45 ema: depool cp3034 (cache_upload) T233242
  • 15:40 ema: depool cp3030 (cache_text) T233242
  • 15:27 bblack: asw2-esams: configure port descriptions and vlan/lvs groupings for all rack16 hosts (lvs3007, ganeti3003, bast3004, cp3061-5)
  • 15:19 ema: pool cp3058 (cache_text) T233242
  • 15:18 effie: Slowly reload apache across the fleet (as we are enabling puppet) - T229792
  • 15:09 effie: Remove hhvm packages and enable puppet across the fleet - T229792
  • 15:09 ema: pool cp3055 (cache_upload) T233242
  • 15:04 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: testcommonswiki, Enable Wikibase client access T223792 (duration: 00m 53s)
  • 15:00 bblack: cr2-esams - add missing lvs3005 IP to bgp pybal neighbor list
  • 14:58 bblack: cr3-esams - change fallback static route for high-traffic2 to lvs3006
  • 14:58 bblack: cr2-esams - change fallback static route for high-traffic2 to lvs3006
  • 14:47 effie: run puppet on all canaries and codfw - T229792
  • 14:42 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:40 effie: Remove hhvm hhvm-luasandbox hhvm-tidy hhvm-wikidiff2 hhvm-dbg from all canaries and codfw - T229792
  • 14:40 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:26 bblack: lvs3006 (upload, becoming active) - manual pybal med s/90/0/ (will take over from lvs3002, intended permanently).
  • 14:23 bblack: lvs3006 (upload, inactive) - manual pybal med s/100/90/ (preferred to lvs3004 for fallback from lvs3002)
  • 14:22 effie: enable puppet on mw app canaries
  • 14:16 ema: power-cycle cp3056, stuck rebooting into d-i T233242
  • 13:59 ema: pool cp3060 T233242
  • 13:36 bblack: re-pooling esams in dns
  • 13:34 effie: enable puppet on mwdebug*
  • 13:25 XioNoX: enable transit4/6 on cr2-knams
  • 13:24 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=varnish-be,name=cp30[56].*
  • 13:24 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp30[56].*,service=varnish-be
  • 13:23 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_text,service=varnish-fe
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_text,service=nginx
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_upload,service=varnish-fe
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_upload,service=nginx
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3063.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3051.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3059.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3061.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3057.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3065.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3055.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3053.esams.wmnet
  • 13:17 ema: set ats-be weights on new esams upload nodes T233242
  • 13:06 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.3
  • 12:56 effie: purge hhvm hhvm-luasandbox hhvm-tidy hhvm-wikidiff2 hhvm-dbg from mw* canaries - T229792
  • 12:42 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp3060.esams.wmnet,service=varnish-be
  • 12:33 effie: Stopping puppet on all hosts including the hhvm class (C:hhvm) - 544864 - T229792
  • 12:25 ema: cp3060: powercycle -- NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [charon:1226] T233242
  • 12:14 bblack: depool esams in geodns
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2092 after analyze table', diff saved to https://phabricator.wikimedia.org/P9468 and previous config saved to /var/cache/conftool/dbconfig/20191024-120812-marostegui.json
  • 12:06 XioNoX: shutdown cr1-esams - cr2-knams link
  • 12:00 XioNoX: shutdown transit BGP sessions on cr2-knams
  • 11:40 Urbanecm: EU SWAT done
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3a5cb68: Permission changes of move-rootuserpages assignment at commonswiki (T236359) (duration: 01m 00s)
  • 11:33 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:31 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:31 Urbanecm: Run mwscript namespaceDupes.php --wiki=commonswiki --add-prefix=FIXME --fix (T236352)
  • 11:28 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e079956: Add CAT as alias for NS_CATEGORY at commonswiki (T236352) (duration: 01m 00s)
  • 11:22 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: 2d66deb: Restrict uploads on azwiki (T236307) (duration: 01m 03s)
  • 11:15 mlitn@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/WikibaseMediaInfo: Also use custom PrefetchingTermLookup in SingleEntitySourceServices (duration: 01m 01s)
  • 11:13 mlitn@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Allow defining entity-type-specific PrefetchingTermLookup (duration: 01m 06s)
  • 11:08 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:08 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:00 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:52 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s6 weights for db1093 and db1085', diff saved to https://phabricator.wikimedia.org/P9466 and previous config saved to /var/cache/conftool/dbconfig/20191024-101810-marostegui.json
  • 09:59 hashar: Converting CI jobs to use the new PostBuildScript plugin config | https://gerrit.wikimedia.org/r/#/c/integration/config/+/544907/ | T188398
  • 09:57 hashar: Restarting CI Jenkins
  • 09:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:14 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:12 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T234853 Re-enable performance perception survey on ruwiki (duration: 01m 04s)
  • 08:39 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:36 godog: roll restart rsyslog in codfw/eqiad to pick up new kafka partitions
  • 08:18 godog: roll restart rsyslog in ulsfo/esams/eqsin to pick up new kafka partitions
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092 for analyze table', diff saved to https://phabricator.wikimedia.org/P9465 and previous config saved to /var/cache/conftool/dbconfig/20191024-081519-marostegui.json
  • 07:57 XioNoX: reboot mr1-esams
  • 07:42 godog: bump rsyslog- topics partitions to 6 and roll-restart logstash frontends
  • 07:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:22 XioNoX: drain Telia link on cr2-esams
  • 06:32 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=parsoid-php,name=eqiad
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9463 and previous config saved to /var/cache/conftool/dbconfig/20191024-052002-marostegui.json
  • 05:18 marostegui: Run analyze enwiki.revision on db2092 T223151
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9462 and previous config saved to /var/cache/conftool/dbconfig/20191024-045954-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from special slaves group and leave it with its original pooling options T223151', diff saved to https://phabricator.wikimedia.org/P9461 and previous config saved to /var/cache/conftool/dbconfig/20191024-045924-marostegui.json
  • 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9460 and previous config saved to /var/cache/conftool/dbconfig/20191024-045544-marostegui.json
  • 04:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:48 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:55 shdubsh: temporarily turn down accept delay on fermium - T235983
  • 00:03 mutante: restarting gerrit to increase heap_size from 20G to 32G (T225166 T222391)

2019-10-23

  • 22:55 brennen@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/AbuseFilter: SWAT: Unbreak filter edit form (T236286) (duration: 01m 05s)
  • 22:20 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 21s)
  • 22:20 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:20 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 05s)
  • 22:19 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:15 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 01m 10s)
  • 22:14 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:00 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 21s)
  • 22:00 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 21:32 mutante: webperf1002/2002 - starting bacula-fd service, which is in a failed state after the initial puppet run turning them into backup::hosts
  • 21:14 ejegg: updated Fundraising python tools from b3c7453be2 to ffc7bf764b
  • 20:37 shdubsh: restart nagios-nrpe-server on stat1007
  • 18:56 milimetric@deploy1001: Finished deploy [analytics/refinery@3aaabf6]: Minor: fix two scripts (duration: 07m 53s)
  • 18:49 milimetric@deploy1001: Started deploy [analytics/refinery@3aaabf6]: Minor: fix two scripts
  • 18:29 mforns@deploy1001: Finished deploy [analytics/refinery@1110d59]: deploying refinery up to 1110d59 (duration: 06m 40s)
  • 18:22 mforns@deploy1001: Started deploy [analytics/refinery@1110d59]: deploying refinery up to 1110d59
  • 17:31 akosiaris: restart varnish-be on cp1089 as a response to HTTP availability alerts. High mailbox lag
  • 17:25 akosiaris: restart varnish-be on cp1081 as a response to HTTP availability alerts
  • 15:55 _joe_: restarting pybal on lvs2006, then 2003 for picking up parsoid-php
  • 15:32 marostegui: Enable slow query log 1/20 on db1089 (enwiki) T223151
  • 14:40 ema@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:39 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:38 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:37 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:35 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:19 bblack: repooling esams
  • 14:00 hashar: Restarting CI Jenkins
  • 13:57 _joe_: manually changing the symlinked deployed version of parsoid on wtp1025 T236275
  • 13:35 XioNoX: migrate esams mgmt to new mgmt router
  • 13:34 effie: disable puppet on mwdebug1002 - T214734
  • 13:13 ssastry@deploy1001: Finished deploy [parsoid/deploy@451db1e]: Updating Parsoid to 5521ea74; Dummy Parsoid deploy to debug Parsoid/PHP deployment issues (duration: 08m 44s)
  • 13:07 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.3 (duration: 01m 00s)
  • 13:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.3
  • 13:04 ssastry@deploy1001: Started deploy [parsoid/deploy@451db1e]: Updating Parsoid to 5521ea74; Dummy Parsoid deploy to debug Parsoid/PHP deployment issues
  • 12:37 effie: Depool mwdebug1002 - T214734
  • 12:31 vgutierrez: restarting ats-tls on cache text nodes - T233274
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1130 from the special slaves group on s5 and leave it back with its original pooling options T223151', diff saved to https://phabricator.wikimedia.org/P9454 and previous config saved to /var/cache/conftool/dbconfig/20191023-122708-marostegui.json
  • 11:26 XioNoX: powering down cr1-esams
  • 11:24 Urbanecm: EU SWAT done
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: SWAT: e21054e: Add Balinese to interwiki sort orders (T234768) (duration: 01m 01s)
  • 11:18 Urbanecm: mwscript updateArticleCount.php --wiki=frwikiquote --update (T236212)
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0889da0: Add custom Minerva wordmark for Hebrew wikivoyage (2/2; T234278) (duration: 01m 01s)
  • 11:09 urbanecm@deploy1001: Synchronized static/images/mobile/copyright: SWAT: 0889da0: Add custom Minerva wordmark for Hebrew wikivoyage (1/2; T234278) (duration: 01m 01s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cf8e2f1: Set $wgArticleCountMethod to any for frwikiquote (T236212) (duration: 01m 12s)
  • 10:46 ema: cp-ats: rolling ATS backend restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/545522/ T233274
  • 10:13 jynus: reverting dbtree revision to HEAD~1 T224589
  • 10:11 jynus: deploying new version of dbtree T224589
  • 10:04 ema: cp1075: ats-backend-restart to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/545508/
  • 09:42 godog: bounce burrow-logging-eqiad.service on kafkamon1001
  • 09:40 godog: roll restart logstash to pick up new rsyslog-notice partitions
  • 09:31 godog: bump rsyslog-notice topic to 6 partitions
  • 09:00 moritzm: rebooting logstash2021 for some firmware tests
  • 08:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:59 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:54 moritzm: installing systemd bugfix update on mw canaries
  • 08:50 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:50 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:42 godog: roll restart rsyslog on cirrus and wdqs hosts to pick up changes to logback topic partitions
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3312 after table compression', diff saved to https://phabricator.wikimedia.org/P9452 and previous config saved to /var/cache/conftool/dbconfig/20191023-082826-marostegui.json
  • 08:23 godog: roll restart logstash in codfw/eqiad to pick up new kafka partitions
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s8 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9451 and previous config saved to /var/cache/conftool/dbconfig/20191023-082246-marostegui.json
  • 08:11 godog: kafka-logging eqiad set 12 partitions for ^mwlog- ^logback- and eqiad.client.error topics
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s8 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9450 and previous config saved to /var/cache/conftool/dbconfig/20191023-080857-marostegui.json
  • 07:55 godog: kafka-logging delete unused topic syslog-notice
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s7 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9449 and previous config saved to /var/cache/conftool/dbconfig/20191023-075106-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s7 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9448 and previous config saved to /var/cache/conftool/dbconfig/20191023-074828-marostegui.json
  • 07:46 XioNoX: powering down cr2-esams for relocation (for real this time)
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s6 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9447 and previous config saved to /var/cache/conftool/dbconfig/20191023-073831-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s6 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9446 and previous config saved to /var/cache/conftool/dbconfig/20191023-073556-marostegui.json
  • 07:30 XioNoX: powering down cr2-esams for relocation
  • 07:28 hashar: logstash: refreshing index fields for logstash-* indices (via https://logstash.wikimedia.org/app/kibana#/management/kibana/indices/logstash-* ) # T234564
  • 07:05 XioNoX: redirect ns2 to eqiad - T235805
  • 07:04 marostegui: Enable slow query log 1/10 on db1089 (enwiki) T223151
  • 07:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:02 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:59 XioNoX: depool esams - T235805
  • 06:57 effie: Depooling mw1317
  • 06:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:38 marostegui: Compress tables on db1097:3315 T235599
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9445 and previous config saved to /var/cache/conftool/dbconfig/20191023-063800-marostegui.json
  • 05:29 ema@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kibana,name=codfw
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9444 and previous config saved to /var/cache/conftool/dbconfig/20191023-052940-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9443 and previous config saved to /var/cache/conftool/dbconfig/20191023-050812-marostegui.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9442 and previous config saved to /var/cache/conftool/dbconfig/20191023-045722-marostegui.json
  • 04:49 vgutierrez: repool cp5007 - T234887
  • 04:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9441 and previous config saved to /var/cache/conftool/dbconfig/20191023-044833-marostegui.json
  • 04:36 MaxSem: Fixed a page title via namespaceDupes.php on pswiki
  • 03:51 vgutierrez: depool cp5007 - T234887

2019-10-22

  • 23:57 maxsem@deploy1001: Synchronized php-1.35.0-wmf.3/includes/block/DatabaseBlock.php: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/545373/ (duration: 00m 59s)
  • 23:53 maxsem@deploy1001: Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543943/ (duration: 01m 01s)
  • 23:43 maxsem@deploy1001: Synchronized dblists/: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 00m 59s)
  • 23:41 maxsem@deploy1001: Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 01m 01s)
  • 23:38 maxsem@deploy1001: Synchronized dblists/labtestwiki.dblist: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 01m 02s)
  • 23:32 mutante: LDAP - added keepit-ssh to wmf group (T236209)
  • 22:23 ejegg: updated Fundraising CiviCRM from ff69d64ad4 to 47e0800001
  • 21:57 thcipriani: stopping gerrit to run ref-update script T236114
  • 21:57 thcipriani: stopping gerrit to run ref-update script
  • 21:45 mutante: LDAP - added lexnasser to nda group (T235688)
  • 21:07 eileen: process-control config revision is 95ee1bafb3 dedupe job re-enabled
  • 20:09 mutante: gerrit1001 - mkdir /srv/gerrit/cobalt/git - rsyncing /srv/gerrit/git from cobalt to /srv/gerrit/cobalt/git/ on gerrit1001 (T236114)
  • 19:42 hashar: gerrit1001: apt install colordiff # T236114
  • 19:27 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.3
  • 19:03 brennen: proceeding with train for 1.35.0-wmf.3
  • 18:09 mutante: DNS - added new Wikipedia language "mnw" (Mon) T235739 - a language spoken in Myanmar
  • 17:59 sbassett: Uploaded and applied (but did not deploy per releng) security fix for T234450 to wmf.3
  • 17:57 sbassett: Deployed security fix for T234450 to wmf.2
  • 17:57 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@b4c484a]: Build structured talk pages by walking the DOM (T235213) (duration: 05m 14s)
  • 17:54 mutante: restarting gerrit to disable jgit gc (T236114)
  • 17:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@b4c484a]: Build structured talk pages by walking the DOM (T235213)
  • 17:37 arlolra: Updated Parsoid to cf01d91 (T234057, T234768, T235296, T235684, T235563)
  • 17:26 arlolra@deploy1001: Finished deploy [parsoid/deploy@4c64c9c]: Updating Parsoid to cf01d91 (duration: 07m 37s)
  • 17:20 bblack: geodns: re-pooling esams (at this point, we're entirely back in our "normal" state of affairs)
  • 17:19 arlolra@deploy1001: Started deploy [parsoid/deploy@4c64c9c]: Updating Parsoid to cf01d91
  • 16:51 bblack: geodns: moving all "normal" eqiad traffic back to eqiad (in addition to the esams-diverted traffic which is still pointed mostly at eqiad right now)
  • 16:21 mutante: running puppet on deployment servers
  • 16:20 thcipriani: restarting gerrit
  • 16:14 thcipriani: stopping gerrit to run a fix for T222391
  • 15:58 bblack: depooling esams temporarily to test traffic scenario on lvs1014
  • 15:47 bblack: enable pybal+puppet on rebooted lvs1014
  • 15:40 bblack: rebooting lvs1014
  • 15:28 liw@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.3 and rebuild l10n cache (duration: 37m 39s)
  • 15:26 XioNoX: repool esams
  • 15:20 XioNoX: rollback ns2 redirect
  • 15:13 bblack: re-disabling lvs1014 ...
  • 15:10 bblack: re-enabling lvs1014 pybal/puppet
  • 15:03 moritzm: rebooting kafka-main1005 for microcode debugging
  • 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:52 bblack: stopping puppet and pybal on lvs1014 (upload+maps traffic to 1016)
  • 14:50 liw@deploy1001: Started scap: testwiki to php-1.35.0-wmf.3 and rebuild l10n cache
  • 14:45 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@85ea6e1]: Deploy kartotherian 1.1.5-wmf.0 (duration: 02m 44s)
  • 14:42 mbsantos@deploy1001: Started deploy [kartotherian/deploy@85ea6e1]: Deploy kartotherian 1.1.5-wmf.0
  • 14:13 XioNoX: restart asw-esams for onsite work
  • 13:52 andrewbogott: restarted slapd on ldap-eqiad-replica01
  • 13:38 gehel: silencing LVS check for kartotherian (we know there is an issue) - T236163
  • 13:35 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="labtestwiki" --outdir="/tmp/scap_l10n_2419219323" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 06m 40s)
  • 13:28 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.3 and rebuild l10n cache
  • 13:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:13 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:06 XioNoX: depool esams for onsite work - T235805
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1096:3316 db1105:3311 db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9434 and previous config saved to /var/cache/conftool/dbconfig/20191022-130556-marostegui.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3316 db1105:3311 instance db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9433 and previous config saved to /var/cache/conftool/dbconfig/20191022-125435-marostegui.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3316 db1105:3311 instance db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9432 and previous config saved to /var/cache/conftool/dbconfig/20191022-124607-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1096:3316 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9431 and previous config saved to /var/cache/conftool/dbconfig/20191022-123757-marostegui.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3312 and db1105:3311 after on-site maintenance T235877', diff saved to https://phabricator.wikimedia.org/P9430 and previous config saved to /var/cache/conftool/dbconfig/20191022-123257-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3315', diff saved to https://phabricator.wikimedia.org/P9429 and previous config saved to /var/cache/conftool/dbconfig/20191022-123032-marostegui.json
  • 12:29 moritzm: rebooting miscweb2001 for some microcode tests
  • 12:28 marostegui: Compress db1096:3315
  • 12:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 after PDU maintenance T227142 (duration: 00m 50s)
  • 12:15 jynus: reimage to buster dbmonitor2001.wikimedia.org T224589
  • 11:57 liw: starting to cut branch for train 1.35-wmf.3
  • 11:51 hashar: Restarted CI Jenkins on contint1001
  • 11:35 marostegui: Stop MySQL on db1105:3311, db1105:3312 for firmware upgrade - T235877
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311, db1105:3312 for firmware upgrade T235877', diff saved to https://phabricator.wikimedia.org/P9428 and previous config saved to /var/cache/conftool/dbconfig/20191022-113437-marostegui.json
  • 11:29 Urbanecm: EU SWAT done
  • 11:28 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/VisualEditor/: SWAT: 2bc4420 (T235707); 680a98b (T233320); d83265d (T234564) (duration: 00m 53s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0593f34: Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections (T230614) (duration: 00m 54s)
  • 10:55 moritzm: rebooting rpki2001 for some microcode tests
  • 10:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:37 ema@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kibana
  • 10:32 jynus: shutting down db1115 in preparation for PDU maintenance; this will make tendril and dbtree unavailable for 2 hours T227142
  • 10:21 ema: lvs2003: restart pybal to add new service kibana-ssl T210411
  • 10:18 ema: lvs1015: restart pybal to add new service kibana-ssl T210411
  • 10:14 ema: puppetmaster1001: rm /var/run/confd-template/.kibana-ssl*.err to make confd icinga check happy T210411
  • 10:02 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=kibana-ssl
  • 09:54 ema: lvs2006: restart pybal to add new service kibana-ssl T210411
  • 09:54 ema: lvs1016: restart pybal to add new service kibana-ssl T210411
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s4 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9425 and previous config saved to /var/cache/conftool/dbconfig/20191022-091327-marostegui.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s4 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9424 and previous config saved to /var/cache/conftool/dbconfig/20191022-091051-marostegui.json
  • 08:05 marostegui: Stop MySQL on labsdb1012 for PDU work T227142
  • 07:53 marostegui: Stop MySQL on db1116 pc1007 db1096:3315, db1096:3316 for PDU maintenance T227142
  • 07:18 moritzm: installing tcpdump security updates
  • 06:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1010 T227142 (duration: 00m 52s)
  • 06:32 vgutierrez: rolling restart of ats-tls - T233274 T234803
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9423 and previous config saved to /var/cache/conftool/dbconfig/20191022-055151-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1070 from config T235464', diff saved to https://phabricator.wikimedia.org/P9422 and previous config saved to /var/cache/conftool/dbconfig/20191022-054759-marostegui.json
  • 05:41 marostegui: Stop mysql on db1070 - T235464
  • 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1070 from config T235464 (duration: 00m 51s)
  • 05:40 marostegui: Remove db1070 from tendril and zarcillo - T235464
  • 05:39 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1070 from config T235464 (duration: 00m 53s)
  • 05:33 vgutierrez: Switch from nginx to ats-tls on cp1090 - T231433
  • 05:24 vgutierrez: repooling cp2025 - T231433
  • 05:20 vgutierrez: depooling cp2025 to fix ATS/nginx configuration - T231433
  • 05:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:08 vgutierrez: Switch from nginx to ats-tls on cp1088 - T231433
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3315 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9421 and previous config saved to /var/cache/conftool/dbconfig/20191022-050204-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2084:3314 after compression', diff saved to https://phabricator.wikimedia.org/P9420 and previous config saved to /var/cache/conftool/dbconfig/20191022-050048-marostegui.json
  • 04:58 vgutierrez: Switch from nginx to ats-tls on cp2026 - T231433
  • 04:30 vgutierrez: Switch from nginx to ats-tls on cp2024 - T231433
  • 04:18 vgutierrez: Switch from nginx to ats-tls on cp3049 - T231433
  • 03:44 vgutierrez: Switch from nginx to ats-tls on cp3047 - T231433
  • 01:12 eileen: disabled dedupe job pending T236096 deploy
  • 01:12 eileen: process-control config revision is 782a14c7d9

2019-10-21

  • 23:15 thcipriani: ops/puppet:sudo -u gerrit2 git update-ref refs/changes/66/535966/meta d6909e0 && sudo -u gerrit2 git update-ref refs/changes/66/535966/meta 8494c28 on gerrit1001
  • 23:11 mutante: rsynced operations/puppet.git/objects from cobalt to gerrit1001 (and backup in /root) (T222391)
  • 22:23 mutante: mw1340 - restarting php7.2-fpm, restarting apache2
  • 21:27 mutante: gerrit1001 manually running command from "list_mediawiki_extensions" cron (T222391)
  • 21:26 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin -b 30 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 21:23 thcipriani: ssh -p 29418 gerrit.wikimedia.org -- gerrit index start changes --force
  • 21:21 mutante: copied apache config for gerrit.wm.org site from cobalt to gerrit1001, restarted apache2, ran puppet again. gerrit back up (T222391)
  • 21:18 mutante: copied apache config for gerrit.wm.org site from cobalt to gerrit1001, restarted apache2
  • 21:16 cdanis: previous cumin invocation was to unblock gerrit migration; will be automatically restored to usual on next puppet run. T222391
  • 21:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin A:dns-auth 'perl -p -i".bak" -e "s/gerrit\./gerrit-replica./" /etc/wikimedia-authdns.conf'
  • 20:57 mutante: running puppet on gerrit1001
  • 20:57 thcipriani: running puppet on cobalt
  • 20:52 mutante: rsyncing gerrit-data/plugins and /var/lib/gerrit2/review_site/ again
  • 20:51 mutante: rsyncing gerrit-data/git again
  • 20:50 thcipriani: stopping gerrit on cobalt
  • 20:44 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch (duration: 00m 52s)
  • 20:37 mutante: disabled puppet on cobalt and gerrit2001
  • 20:29 mutante: running puppet on dbproxy10017 to apply ferm change for gerrit db from gerrit1001 (T222391)
  • 20:25 mutante: gerrit1001 - puppet agent disabled - gerrit service stopped
  • 20:19 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@0c6d34b]: Update mobileapps to d6a6e7f (duration: 06m 02s)
  • 20:13 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@0c6d34b]: Update mobileapps to d6a6e7f
  • 20:12 mutante: rsyncing /var/lib/gerrit2/review_site from cobalt to gerrit1001 (T222391)
  • 20:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/545027/ T235949 (duration: 00m 52s)
  • 20:08 mutante: rsynced /srv/gerrit/plugins from cobalt to gerrit1001 (T222391)
  • 20:08 mutante: rsynced /srv/gerrit/git from cobalt to gerrit1001 (T222391)
  • 18:43 Urbanecm: Morning SWAT done
  • 18:41 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/VisualEditor: SWAT: a4ab456: TreeModifier: Ignore removed nodes properly when normalizing from a text node (T235959); ecb4532: Update VE core submodule to a4ab456dc0 (T235959); a850cee: ApiVisualEditor: Always return etag with content (T233320) (duration: 00m 55s)
  • 18:32 robh: ps1-23-ulsfo back online, all pdu work in ulsfo is now complete T235911
  • 18:30 robh: ps1-22-ulsfo repaired (reseating its NIC rebooted its mgmt interface) Done with it and repeating on ps1-23-ulsfo via T235911
  • 18:24 robh: working on ps1-22-ulsfo via T235911 (it may flap but it is already ack'd as down in icinga, but not persistent)
  • 17:13 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@75c0577]: GUI Updates (duration: 11m 37s)
  • 17:08 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/VisualEditor/: Update VisualEditor for set of back-ports in wmf.1 T233320, T234564, T235959 (duration: 00m 56s)
  • 17:01 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@75c0577]: GUI Updates
  • 14:16 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.2 refs T233850
  • 13:46 Urbanecm: Deploy sec patch for T104807
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084:3314 and db2091:3312 for table compression', diff saved to https://phabricator.wikimedia.org/P9412 and previous config saved to /var/cache/conftool/dbconfig/20191021-132633-marostegui.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights 1/2 to 100/200 on s2 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9411 and previous config saved to /var/cache/conftool/dbconfig/20191021-132440-marostegui.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights 1/2 to 100/200 on s2 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9410 and previous config saved to /var/cache/conftool/dbconfig/20191021-132145-marostegui.json
  • 13:07 ema: lvs1015: restart pybal to add new service wdqs-ssl T210411
  • 13:04 marostegui: Deploy schema change on db1122 (s2 primary master) - T233135 T234066
  • 13:04 ema: lvs2003: restart pybal to add new service wdqs-ssl T210411
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312 after schema change and remove db1129 from vslow and dump as it was there temporarily', diff saved to https://phabricator.wikimedia.org/P9409 and previous config saved to /var/cache/conftool/dbconfig/20191021-130355-marostegui.json
  • 13:02 ema: lvs1016: restart pybal to add new service wdqs-ssl T210411
  • 13:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wdqs-ssl
  • 12:58 ema: lvs2006: restart pybal to add new service wdqs-ssl T210411
  • 12:38 hashar: Started zuul-merger on contint2001
  • 12:32 hashar: Stopped zuul-merger on contint2001
  • 12:31 hashar: Started zuul-merger on contint1001
  • 12:16 hashar: Stopped zuul-merger on contint1001
  • 12:02 Urbanecm: EU SWAT finally done
  • 12:01 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e8d70c1: Partial cleanup of InitialiseSettings (T231178) (duration: 01m 00s)
  • 12:00 Urbanecm: I'm going to do one last sync for EU SWAT
  • 11:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 12e3549: Create Portal namespace for sawikisource (T235343) (duration: 00m 59s)
  • 11:55 urbanecm@deploy1001: sync-file aborted: SWAT: 12e3549: Create Portal namespace for sawikisource (duration: 00m 01s)
  • 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3b1350b: wgCopyUploadDomains: Add iip.bu.uni.wroc.pl there (T235904) (duration: 00m 59s)
  • 11:49 Urbanecm: Reopen EU SWAT
  • 11:42 awight: EU SWAT complete
  • 11:42 awight@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Put reference previews back into beta mode on beta cluster (T233813) (duration: 01m 00s)
  • 11:38 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 543764|Enable ContentTranslation out of Beta in Malayalam/Bengali/Mongolian WPs (T233008, T233009, T234317) (duration: 01m 00s)
  • 11:34 moritzm: installing Java security updates on restbase-dev1004
  • 11:30 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/tests/phpunit/includes/Storage/SqlBlobStoreTest.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 3/3 - T235188 (duration: 01m 00s)
  • 11:28 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/includes/libs/objectcache/wancache/WANObjectCache.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 2/3 - T235188 (duration: 00m 59s)
  • 11:25 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/includes/Storage/SqlBlobStore.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 1/3 - T235188 (duration: 01m 00s)
  • 11:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:14 jbond@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:19 hashar: contint1001 / contint2001 : marking integration/config zuul merger repo readonly: sudo chown -R root:root /srv/zuul/git/integration/config
  • 10:13 hashar: CI in trouble due to a huge number of changes
  • 10:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:51 Amir1: maintenance script is done
  • 09:35 moritzm: removing PHP 7.0 from deployment servers
  • 09:20 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T234774)
  • 09:18 moritzm: installing php7.0 security updates
  • 09:11 moritzm: installing subversion updates on Stretch (fixes compatibility with security fix for Apache update)
  • 09:07 moritzm: installing jackson-databind security updates
  • 09:01 moritzm: installing openjpeg2 security updates
  • 08:52 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/544209
  • 08:34 Urbanecm: Deploy security patch (T234862)
  • 08:34 vgutierrez: Switch from nginx to ats-tls on cp2022 - T231627
  • 08:30 ema: pool cp4029 with ATS backend T227432
  • 08:20 vgutierrez: Switch from nginx to ats-tls on cp2020 - T231627
  • 08:09 vgutierrez: Switch from nginx to ats-tls on cp2018 - T231627
  • 08:08 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 08:03 godog: swift codfw-prod: final weight to ms-be205[1-6] - T233638
  • 07:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:57 vgutierrez: Switch from nginx to ats-tls on cp3046 - T231627
  • 07:57 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:50 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4029.ulsfo.wmnet,service=ats-be
  • 07:45 moritzm: installing aspell security updates on jessie
  • 07:43 vgutierrez: Switch from nginx to ats-tls on cp3045 - T231627
  • 07:35 moritzm: installing openjdk-11 security updates
  • 07:32 ema: depool cp4029 and reimage as text_ats T227432
  • 07:15 vgutierrez: Switch from nginx to ats-tls on cp1075 - T231627
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool non partitioned db1089 into s1 special slaves to check for slow queries T223151', diff saved to https://phabricator.wikimedia.org/P9406 and previous config saved to /var/cache/conftool/dbconfig/20191021-070655-marostegui.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights from 1 to 100 on s1 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9405 and previous config saved to /var/cache/conftool/dbconfig/20191021-070352-marostegui.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights from 1 to 100 on s1 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9404 and previous config saved to /var/cache/conftool/dbconfig/20191021-070119-marostegui.json
  • 06:59 vgutierrez: Switch from nginx to ats-tls on cp2001 - T231627
  • 06:46 vgutierrez: Switch from nginx to ats-tls on cp3030 - T231627
  • 06:28 vgutierrez: Install python3-cryptography-2.6.1-3+deb10u2 on acme-chief hosts - T234131
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P9403 and previous config saved to /var/cache/conftool/dbconfig/20191021-061518-marostegui.json
  • 06:12 vgutierrez: Switch cp1086 from nginx to ats-tls - T231433
  • 06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight 100 to db1130 on s5 to check for slow queries T223151', diff saved to https://phabricator.wikimedia.org/P9402 and previous config saved to /var/cache/conftool/dbconfig/20191021-055843-marostegui.json
  • 05:54 vgutierrez: Switch cp2017 from nginx to ats-tls - T231433
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1105:3311', diff saved to https://phabricator.wikimedia.org/P9401 and previous config saved to /var/cache/conftool/dbconfig/20191021-055017-marostegui.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2048 and db2061, those hosts will be decommissioned T228258', diff saved to https://phabricator.wikimedia.org/P9400 and previous config saved to /var/cache/conftool/dbconfig/20191021-054340-marostegui.json
  • 05:42 _joe_: slowly removing service objects from production etcd T233973
  • 05:38 vgutierrez: Switch cp3044 from nginx to ats-tls - T231433
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1105:3311', diff saved to https://phabricator.wikimedia.org/P9399 and previous config saved to /var/cache/conftool/dbconfig/20191021-053737-marostegui.json
  • 05:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 marostegui: Compress tables on db2084:3314 db2091:3312 - T235599
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P9398 and previous config saved to /var/cache/conftool/dbconfig/20191021-052643-marostegui.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088:3312 db2084:3315 - T235599', diff saved to https://phabricator.wikimedia.org/P9397 and previous config saved to /var/cache/conftool/dbconfig/20191021-052527-marostegui.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P9396 and previous config saved to /var/cache/conftool/dbconfig/20191021-052035-marostegui.json
  • 05:19 vgutierrez: Switch cp4026 from nginx to ats-tls - T231433
  • 05:14 marostegui: Deploy schema change on db1090:3312 T234066 T233135
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312 for schema change and pool db1129 temporarily in vslow, dump', diff saved to https://phabricator.wikimedia.org/P9395 and previous config saved to /var/cache/conftool/dbconfig/20191021-051356-marostegui.json
  • 05:09 marostegui: Deploy schema change on s7 primary master db1062 - T234066 T233135
  • 04:57 vgutierrez: Switch cp5006 from nginx to ats-tls - T231433

2019-10-19

  • 08:41 XioNoX: add user papaul to fasw-c-eqiad
  • 00:05 mutante: LDAP - adding verenali to wmde and nda groups, to match raja_wmde (T233807, T231677)

2019-10-18

  • 22:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1047.eqiad.wmnet,service=parsoid-php
  • 22:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1048.eqiad.wmnet,service=parsoid-php
  • 22:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1044.eqiad.wmnet,service=parsoid-php
  • 22:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1042.eqiad.wmnet,service=parsoid-php
  • 22:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1043.eqiad.wmnet,service=parsoid-php
  • 22:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1046.eqiad.wmnet,service=parsoid-php
  • 22:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1041.eqiad.wmnet,service=parsoid-php
  • 22:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1038.eqiad.wmnet,service=parsoid-php
  • 22:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet,service=parsoid-php
  • 22:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet,service=parsoid-php
  • 22:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1040.eqiad.wmnet,service=parsoid-php
  • 22:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1038.eqiad.wmnet,service=parsoid-php
  • 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet,service=parsoid-php
  • 22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2018.codfw.wmnet,service=parsoid-php
  • 22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2017.codfw.wmnet,service=parsoid-php
  • 22:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2016.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2015.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2014.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2013.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2032.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2012.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2011.codfw.wmnet,service=parsoid-php
  • 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1045.eqiad.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2010.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2009.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2008.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2007.codfw.wmnet,service=parsoid-php
  • 21:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2006.codfw.wmnet,service=parsoid-php
  • 21:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1037.eqiad.wmnet,service=parsoid-php
  • 21:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1039.eqiad.wmnet,service=parsoid-php
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1036.eqiad.wmnet,service=parsoid-php
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1035.eqiad.wmnet,service=parsoid-php
  • 21:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1034.eqiad.wmnet,service=parsoid-php
  • 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1033.eqiad.wmnet,service=parsoid-php
  • 20:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1032.eqiad.wmnet,service=parsoid-php
  • 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2005.codfw.wmnet,service=parsoid-php
  • 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2004.codfw.wmnet,service=parsoid-php
  • 20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1031.eqiad.wmnet,service=parsoid-php
  • 20:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1027.eqiad.wmnet,service=parsoid-php
  • 20:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1031.eqiad.wmnet,service=parsoid-php
  • 20:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1030.eqiad.wmnet,service=parsoid-php
  • 20:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1029.eqiad.wmnet,service=parsoid-php
  • 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2003.codfw.wmnet,service=parsoid-php
  • 19:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1028.eqiad.wmnet,service=parsoid-php
  • 19:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2002.codfw.wmnet,service=parsoid-php
  • 19:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1027.eqiad.wmnet,service=parsoid-php
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1026.eqiad.wmnet,service=parsoid-php
  • 18:27 mutante: temp. disabled puppet on all wtp* servers, adding mediawiki appserver roles on them incrementally by re-enabling puppet, starting with wtp1026, scheduled icinga downtime for wtp* all services (T233654)
  • 18:19 mutante: temp. disabling puppet on all wtp* servers
  • 15:40 Urbanecm: Reassign edits from DannyS712 (T235446) to DannyS712 at banwiki (T235446)
  • 15:38 Urbanecm: Run extensions/CentralAuth/maintenance/createLocalAccount.php --wiki=banwiki DannyS712 (T235446)
  • 15:38 Urbanecm: Rename DannyS712@banwiki to DannyS712 (T235446) locally (T235446)
  • 15:07 Urbanecm: Reattach DannyS712@banwiki to DannyS712@SUL (T235446)
  • 14:19 _joe_: uploading cassandra 3.11.4 to stretch-wikimedia
  • 14:10 marostegui: Run compare.py on db1105 - T235877
  • 13:48 jynus: disabled notifications on db1105
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 and db1105:3312 host rebooted itself', diff saved to https://phabricator.wikimedia.org/P9392 and previous config saved to /var/cache/conftool/dbconfig/20191018-134517-marostegui.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2059 from config, host decommissioned', diff saved to https://phabricator.wikimedia.org/P9391 and previous config saved to /var/cache/conftool/dbconfig/20191018-132934-marostegui.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084:3315 for tables compression T235599', diff saved to https://phabricator.wikimedia.org/P9390 and previous config saved to /var/cache/conftool/dbconfig/20191018-130253-marostegui.json
  • 13:01 marostegui: Compress db2084:3315 T235599
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 after schema change', diff saved to https://phabricator.wikimedia.org/P9389 and previous config saved to /var/cache/conftool/dbconfig/20191018-123930-marostegui.json
  • 12:20 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 12:20 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:10 jbond42: disable puppet on puppetmasters to fix puppet-merge
  • 11:58 moritzm: installing sudo security updates for jessie
  • 11:56 Reedy: `mwscript refreshLinks.php banwiki` on mwmaint1002 T235843
  • 11:10 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4028.ulsfo.wmnet,service=ats-be
  • 10:56 effie: Updating wikidiff2 to 1.9.0-2~wmf1 and slowly restart php-fpm across the fleet - T234175
  • 10:53 effie: Updating wikidiff2 to 1.9.0-2~wmf1 and slowly restart php-fpm across the fleet
  • 10:49 effie: Uploading wikidiff2_1.9.0-2~wmf1 to stretch-wikimedia T231586
  • 09:58 moritzm: rolling out debdeploy 0.0.99.12 fleet-wide
  • 09:57 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=echostore
  • 09:40 _joe_: restarting pybal on lvs1015 to pick up the addition of echostore
  • 09:37 ema: pool cp4028 with ATS backend T227432
  • 09:36 _joe_: restarting pybal on lvs2003 to pick up the addition of echostore
  • 09:34 _joe_: restarting pybal on lvs1016 to pick up the addition of echostore
  • 09:20 _joe_: restarting pybal on lvs2006 to pick up the addition of echostore
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:14 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: service=echostore
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:14 moritzm: importing debdeploy 0.0.99.12 to apt.wikimedia.org
  • 09:13 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:12 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:12 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:11 _joe_: hotpatching puppet-merge on puppetmaster1001
  • 08:34 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:32 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:03 ema: depool cp4028 and reimage as text_ats T227432
  • 07:58 marostegui: Deploy schema change on db1076
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P9388 and previous config saved to /var/cache/conftool/dbconfig/20191018-075709-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129 after schema change', diff saved to https://phabricator.wikimedia.org/P9387 and previous config saved to /var/cache/conftool/dbconfig/20191018-075529-marostegui.json
  • 07:21 moritzm: installing unbound security updates on buster
  • 07:20 moritzm: installing libdatetime-timezone-perl updates (time zone updates)
  • 05:53 vgutierrez: switch cp1084 from nginx to ats-tls - T231433
  • 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:32 vgutierrez: switch cp2014 from nginx to ats-tls - T231433
  • 05:19 marostegui: Rename m5 labtestwiki database - T233236
  • 05:15 marostegui: Deploy schema change on db1129 T233135 T234066
  • 05:15 marostegui: Compress tables on db2091:3314 T235599
  • 05:14 vgutierrez: switch cp3039 from nginx to ats-tls - T231433
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P9386 and previous config saved to /var/cache/conftool/dbconfig/20191018-051355-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 and db2086:3318 after table compression', diff saved to https://phabricator.wikimedia.org/P9385 and previous config saved to /var/cache/conftool/dbconfig/20191018-050831-marostegui.json
  • 04:57 vgutierrez: switch cp4025 from nginx to ats-tls - T231433
  • 04:34 vgutierrez: switch cp5005 from nginx to ats-tls - T231433
  • 04:31 vgutierrez: restarting nagios-nrpe-server on stat1007

2019-10-17

  • 21:42 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@d663006]: Update mobileapps to f345673 (duration: 05m 38s)
  • 21:37 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@d663006]: Update mobileapps to f345673
  • 19:31 eileen: civicrm revision changed from 4eac801762 to ff69d64ad4, config revision is dc3a88889d
  • 18:26 mutante: wtp1025 - cd /srv/deployment/parsoid/deploy/src ; sudo -u deploy-service ln -s ../vendor (for benchmarking test)
  • 18:01 _joe_: depooled wtp1025 from parsoid, parsoid-php to allow running benchmarks there
  • 18:01 elukey: update librdkafka on eventlog1002 and restart eventlogging
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3317 and remove db1136 from its temporary vslow,dump role', diff saved to https://phabricator.wikimedia.org/P9382 and previous config saved to /var/cache/conftool/dbconfig/20191017-151952-marostegui.json
  • 15:07 dcausse: unbanning elastic1050:psi
  • 15:01 dcausse: dumping jvm heap on elastic1050:psi to investigate gc issues
  • 14:46 moritzm: installing 4.9.189 Linux update on jessie hosts (no reboots, deploying the package only at this point)
  • 14:37 dcausse: banning elastic1050:psi to investigate gc issues
  • 14:32 moritzm: uploaded linux-meta 1.22 for jessie-wikimedia
  • 14:32 bblack: disable puppet on cache fleet (cp*) ahead of cert deployment refactoring - T234803
  • 14:09 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕙☕ sudo -E reprepro --restrict grafana update buster-wikimedia
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9381 and previous config saved to /var/cache/conftool/dbconfig/20191017-134112-marostegui.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9380 and previous config saved to /var/cache/conftool/dbconfig/20191017-133047-marostegui.json
  • 13:06 XioNoX: rollback failover vrrp from cr2-eqiad to cr1-eqiad - T227133
  • 12:56 XioNoX: restart mr1-eqiad
  • 12:54 XioNoX: downtiming all mgmt host for 30min (mr1-eqiad needs to be rebooted)
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3312 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9379 and previous config saved to /var/cache/conftool/dbconfig/20191017-125248-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9378 and previous config saved to /var/cache/conftool/dbconfig/20191017-125154-marostegui.json
  • 12:50 marostegui: Compress tables on db2088:3312 - T235599
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9377 and previous config saved to /var/cache/conftool/dbconfig/20191017-124503-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Restore db1090:3312 original weight', diff saved to https://phabricator.wikimedia.org/P9376 and previous config saved to /var/cache/conftool/dbconfig/20191017-121330-marostegui.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P9375 and previous config saved to /var/cache/conftool/dbconfig/20191017-121106-marostegui.json
  • 11:39 ema: pool cp4027 with ATS backend T227432
  • 11:36 vgutierrez: upgrading ATS on eqiad nodes to 8.0.5-1wm9 - T234011
  • 11:27 vgutierrez: upgrading ATS on codfw nodes to 8.0.5-1wm9 - T234011
  • 11:27 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4027.ulsfo.wmnet,service=ats-be
  • 11:16 vgutierrez: upgrading ATS on esams nodes to 8.0.5-1wm9 - T234011
  • 11:11 Urbanecm: EU SWAT done
  • 11:11 XioNoX: failover vrrp from cr2-eqiad to cr1-eqiad - T227133
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 36d4612: Allow sysops to add transwiki on nnwiki, and add import sources (T231761) (duration: 00m 59s)
  • 11:09 vgutierrez: upgrading ATS on ulsfo nodes to 8.0.5-1wm9 - T234011
  • 11:08 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/WikibaseMediaInfo: SWAT: 5a67011: Keep track of assigned nodes in both old & new DOM (T235236) (duration: 01m 03s)
  • 10:58 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:32 ema: depool cp4027 and reimage as text_ats T227432
  • 10:31 effie: depool mw1333
  • 10:25 elukey: rollback eventlogging back to Python 2, some errors (unseen in tests) logged by the processors
  • 10:24 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f0a1aa]: Rollback move codebase to Python3 (duration: 00m 03s)
  • 10:24 elukey@deploy1001: Started deploy [eventlogging/analytics@0f0a1aa]: Rollback move codebase to Python3
  • 10:19 elukey: Move eventlogging on eventlog1002 to Python3
  • 10:17 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f0a1aa]: Move codebase to Python3 (duration: 00m 05s)
  • 10:17 elukey@deploy1001: Started deploy [eventlogging/analytics@0f0a1aa]: Move codebase to Python3
  • 09:57 godog: swift codfw-prod: more weight to ms-be205[1-6] - T233638
  • 09:39 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 09:38 marostegui: Stop MySQL on db1129 for PDU work
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for PDU work, give some traffic to db1090:3312 meanwhile T227133', diff saved to https://phabricator.wikimedia.org/P9374 and previous config saved to /var/cache/conftool/dbconfig/20191017-093753-marostegui.json
  • 09:27 elukey: upload archiva 2.2.4-1 to stretch-wikimedia - T222595
  • 09:26 marostegui: Stop MySQL on db1117 this will generate some haproxy alerts - T227133
  • 08:28 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:28 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:26 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:26 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:05 vgutierrez: upgrading ATS on eqsin nodes to 8.0.5-1wm9 - T234011
  • 08:03 marostegui: Deploy schema change on db1090:3317
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fix db1136 weight', diff saved to https://phabricator.wikimedia.org/P9373 and previous config saved to /var/cache/conftool/dbconfig/20191017-080157-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317 pool db1136 temporarily into vslow,dump', diff saved to https://phabricator.wikimedia.org/P9372 and previous config saved to /var/cache/conftool/dbconfig/20191017-080026-marostegui.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136', diff saved to https://phabricator.wikimedia.org/P9371 and previous config saved to /var/cache/conftool/dbconfig/20191017-074658-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1130 (non partitioned host) into s5 special group with low weight - T223151', diff saved to https://phabricator.wikimedia.org/P9370 and previous config saved to /var/cache/conftool/dbconfig/20191017-071308-marostegui.json
  • 06:06 elukey: upgrade archiva on archiva1001 to 2.2.4 - T222595
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Change special weights from x to x100 on s5 - T231018', diff saved to https://phabricator.wikimedia.org/P9369 and previous config saved to /var/cache/conftool/dbconfig/20191017-060251-marostegui.json
  • 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui: Deploy schema change on labtestwiki and labswiki
  • 05:12 marostegui: Deploy schema change on db1095:3312
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 and db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P9368 and previous config saved to /var/cache/conftool/dbconfig/20191017-051055-marostegui.json
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 and db1094', diff saved to https://phabricator.wikimedia.org/P9367 and previous config saved to /var/cache/conftool/dbconfig/20191017-050614-marostegui.json
  • 05:01 vgutierrez: upgrading ATS to 8.0.5-1wm9 on cp5001 - T234011
  • 05:00 vgutierrez: uploaded trafficserver 8.0.5-1wm9 to apt.wikimedia.org (stretch) - T234011
  • 02:04 bblack: repooling eqsin
  • 00:55 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:40 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm

2019-10-16

  • 23:17 Urbanecm: Evening SWAT done
  • 23:17 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: Clean expired rules (duration: 00m 58s)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki-1.5x.png (T235710)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki-2x.png (T235710)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki.png (T235710)
  • 23:13 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 9c5bcd8: Change logo for azwiki (T235710) (duration: 00m 59s)
  • 23:11 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6dc4c0c: New throttle rule for WMCL editathon (T235693) (duration: 00m 59s)
  • 23:09 @: helmfile [EQIAD] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 23:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 96c87c7: Enable transwiki import from other Wikipedias on srwikisource (T235419) (duration: 00m 58s)
  • 23:05 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 23:00 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/resources/src/mediawiki.special/contributions.less: T235137 Don't apply styling for Special:Contributions on other pages (duration: 00m 59s)
  • 22:47 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 22:42 James_F: Zuul: Add composer-php72-docker for wikimedia-cz/web-theme and wikimedia-cz/web-plugin
  • 22:31 mutante: mwmaint1002 - running generate-fancy-captcha-loop to work around issue with generate-captcha cron (T230245)
  • 22:30 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/resources/src/mediawiki.special/contributions.less: T235137 Don't apply styling for Special:Contributions on other pages (duration: 00m 59s)
  • 22:29 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/includes/OutputPage.php: T235711 Lower severity of targets violation back to DEBUG (duration: 00m 59s)
  • 21:53 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/WikiEditor: T235701 Revert removal of jquery.tabIndex (duration: 00m 59s)
  • 21:47 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:44 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:42 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:41 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 21:10 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 20:42 @: helmfile [STAGING] Ran 'apply' command on namespace 'echostore' for release 'staging' .
  • 20:41 ejegg: rolled back fundraising python tools from 31171f148c to b3c7453be2
  • 20:16 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/includes/resourceloader/ResourceLoaderStartUpModule.php: Expose StartupModule::getConfigSettings for internal use T235350 T229836 (duration: 00m 59s)
  • 20:07 joal@deploy1001: Finished deploy [analytics/refinery@1704fdd]: Regular analytics weekly train (duration: 17m 06s)
  • 20:00 urandom: upgrading Cassandra to 3.11.4, codfw, rack d -- T200803
  • 19:50 joal@deploy1001: Started deploy [analytics/refinery@1704fdd]: Regular analytics weekly train
  • 19:35 urandom: upgrading Cassandra to 3.11.4, codfw, rack c -- T200803
  • 19:30 jhuneidi@deploy1001: Pruned MediaWiki: 1.34.0-wmf.25 (duration: 03m 24s)
  • 19:18 joal@deploy1001: Finished deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train - try 2 after fix (duration: 05m 53s)
  • 19:13 joal@deploy1001: Started deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train - try 2 after fix
  • 19:08 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.2 refs T233850 (duration: 00m 59s)
  • 19:07 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.2 refs T233850
  • 19:06 joal@deploy1001: Finished deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train (top-mediarequest endpoint) (duration: 01m 18s)
  • 19:05 joal@deploy1001: Started deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train (top-mediarequest endpoint)
  • 18:46 urandom: upgrading Cassandra to 3.11.4, codfw, rack b -- T200803
  • 18:28 urandom: upgrading Cassandra to 3.11.4, eqiad, rack d -- T200803
  • 18:06 urandom: upgrading Cassandra to 3.11.4, eqiad, rack b -- T200803
  • 16:33 urandom: upgrading Cassandra to 3.11.4, eqiad, rack a -- T200803
  • 16:17 catrope@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/GrowthExperiments/: Fix help panel button alignment (T235578) (duration: 01m 02s)
  • 16:16 mutante: ganeti1003 - shutting down and removing instance moscovium.eqiad.wmnet - recreating under same name with cookbook
  • 15:59 mutante: new dsh group parsoid_php created - parsoid-php servers added to scap / mediawiki-installation dsh group
  • 15:17 marostegui: Deploy schema change on dbstore1004:3312 - T234066 T233135
  • 15:09 marostegui: Recreate views for protected_titles on s2 and s7 on labsdb1009 and labsdb1012 - T233135
  • 15:04 mutante: wtp1025 wtp2001 - scap pull (T233654)
  • 15:04 mutante: wtp parsoid servers added to conftool - wtp1025 and wtp2001 pooled in new service parsoid-php (T233654)
  • 15:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 14:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2001.codfw.wmnet,service=parsoid-php
  • 14:53 effie: Remove tex* and math related packages from deploy*,mwmaint*,snapshot* - T195847
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:26 papaul: power down puppetmaster2001 for HW maintenance
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:24 _joe_: creating namespaces and policies for echostore in codfw, T234376
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:10 moritzm: installing idp2001
  • 13:56 jynus: reenabling puppet on helium T229209
  • 13:46 XioNoX: rollback failover VRRP from cr1-eqiad to cr2-eqiad - T226782
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 and db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P9364 and previous config saved to /var/cache/conftool/dbconfig/20191016-132620-marostegui.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P9363 and previous config saved to /var/cache/conftool/dbconfig/20191016-131010-marostegui.json
  • 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P9362 and previous config saved to /var/cache/conftool/dbconfig/20191016-125102-marostegui.json
  • 12:38 effie: remove tex* and math related packages from appserver canaries - T195847
  • 12:30 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@217cac5]: redeploy 0.3.4-SNAPSHOT - T235540 (duration: 03m 40s)
  • 12:29 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 12:26 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@217cac5]: redeploy 0.3.4-SNAPSHOT - T235540
  • 12:20 marostegui: Compress tables on db1099:3311 - T235599
  • 12:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@c90503b]: Revert to fix T235540 (duration: 19m 09s)
  • 12:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 12:00 kart_: Updated cxserver to 2019-10-15-091114-production (T234773, T217585)
  • 11:57 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:56 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@c90503b]: Revert to fix T235540
  • 11:49 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@5b42bdf]: Revert wdqs 0.3.4-SNAPSHOT (duration: 10m 13s)
  • 11:46 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:44 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:39 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@5b42bdf]: Revert wdqs 0.3.4-SNAPSHOT
  • 11:34 Lucas_WMDE: EU SWAT done
  • 11:26 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: extension-list: Load FlaggedRevs via extension.json (T87915, T139800, T140852) (duration: 01m 05s)
  • 11:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure Citoid+Wikibase integration on Test Wikidata (T228412) (duration: 01m 13s)
  • 11:14 _joe_: purging confd from wtp* servers, not needed anymore
  • 10:48 _joe_: upgrading confd to 0.16.0 across the cluster. T147204. confd will be restarted on the next puppet run
  • 10:31 elukey: upload prometheus-memcached-exporter 0.4.1+git20181010.2fa99eb-1+deb10u1 to buster-wikimedia - T213089
  • 10:17 marostegui: Stop replication on s2 codfw master for schema change and to modify sanitarium triggers T234066 T233135 T234704
  • 09:40 effie: enable puppet on all hosts running hhvm - T229792
  • 09:36 XioNoX: restart fastnetmon on netflow2001
  • 09:27 effie: Disable puppet on all hosts running hhvm to merge 543131 - T229792
  • 09:22 effie: Disable puppet on mw* hosts to merge 543131
  • 09:20 gehel: force merging commonswiki_content on elasticsearch codfw
  • 08:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:15 _joe_: upgrading envoyproxy in production to 1.11.2 T235412
  • 05:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:26 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9360 and previous config saved to /var/cache/conftool/dbconfig/20191016-052104-marostegui.json
  • 05:18 marostegui: Deploy schema change on s2 sanitarium master (db1074) this will create lag on s2 labsdb T233135 T234066
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P9359 and previous config saved to /var/cache/conftool/dbconfig/20191016-051812-marostegui.json
  • 05:14 marostegui: Change s7 triggers for archive table from db1125:3317 T234704
  • 05:11 marostegui: Change s2 triggers for archive table from db1125:3312 T234704
  • 05:08 marostegui: Deploy schema change on s7 sanitarium master (db1079) this will create lag on s7 labsdb T233135 T234066
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P9358 and previous config saved to /var/cache/conftool/dbconfig/20191016-050627-marostegui.json
  • 03:49 mobrovac@deploy1001: Finished deploy [restbase/deploy@320f3a5]: Parsoid: Use the ETag for retrieving stashed content - T235465 (duration: 13m 37s)
  • 03:35 mobrovac@deploy1001: Started deploy [restbase/deploy@320f3a5]: Parsoid: Use the ETag for retrieving stashed content - T235465
  • 01:55 eileen: civicrm revision changed from 5a2f8048c4 to 4eac801762, config revision is dc3a88889d
  • 00:09 mutante: wikitech - make JBond a "content administrator" to give the ability to create server fingerprint pages

2019-10-15

  • 22:41 Reedy: manually running `extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php` T230245
  • 21:26 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Provide getCachableMWConfig() which doesn't rely on wgConf (duration: 01m 00s)
  • 21:24 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@cdfa545]: Media: Fix TypeError when processing pages with only Mathoid images (T235408) (duration: 05m 35s)
  • 21:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@cdfa545]: Media: Fix TypeError when processing pages with only Mathoid images (T235408)
  • 21:16 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: InitialiseSettings: Stop writing wmgScoreFileBackend and wmgScorePath, never read (duration: 00m 59s)
  • 21:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Stop using wmg variables for Score extension (duration: 01m 01s)
  • 21:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Write wgScoreFileBackend and wgScorePath directly, not via CommonSettings (duration: 01m 00s)
  • 20:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.2 refs T233850
  • 19:55 urandom: upgrade restbase2011-{a,b,c} to cassandra 3.11.4 -- T200803
  • 19:52 urandom: upgrade restbase1016-c to cassandra 3.11.4 -- T200803
  • 19:48 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.2 refs T233850 (duration: 27m 39s)
  • 19:48 urandom: upgrade restbase1016-b to cassandra 3.11.4 -- T200803
  • 19:42 urandom: upgrade restbase1016-a to cassandra 3.11.4 -- T200803
  • 19:20 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.2 refs T233850
  • 19:07 mutante: LDAP - adding user rzl to groups wmf and ops (T235215)
  • 17:51 longma: cutting the branch for 1.35.0-wmf.2 T233850
  • 16:28 ejegg: updated payments-wiki from c3cc3ace2f to 570324a30f
  • 16:24 papaul: power down lvs2010 for HW maintenance
  • 16:00 _joe_: uploading envoy 1.11.2 to stretch-wikimedia, buster-wikimedia T230779 T235412
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P9355 and previous config saved to /var/cache/conftool/dbconfig/20191015-155454-marostegui.json
  • 15:52 papaul: power down lvs2009 for HW maintenance
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P9354 and previous config saved to /var/cache/conftool/dbconfig/20191015-154325-marostegui.json
  • 15:17 ejegg: updated payments-wiki from 8a65f57874 to c3cc3ace2f
  • 15:01 moritzm: installing fribidi bugfix updates from stretch point release
  • 14:54 moritzm: installing cups security updates for stretch (client-side libs/tools only)
  • 14:43 elukey: start a root tmux containing a bash script on conf1004 to clean up znodes under /yarn-rmstore/analytics-hadoop/ZKRMStateRoot/RMAppRoot slowly - T217057
  • 14:40 papaul: power down puppetmaster2002 for HW maintenance
  • 14:38 moritzm: installing usbutils update from stretch point release
  • 14:34 elukey: executed 'rmr' in zookeeper on conf1004 for znodes /yarn-leader-election /hadoop-ha /hive_zookeeper_namespace
  • 14:12 ejegg: updated fundraising python tools from b3c7453be2 to 31171f148c
  • 13:53 moritzm: installing 4.9.189 Linux update from last stretch point releases (no reboots, deploying the package only at this point)
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9353 and previous config saved to /var/cache/conftool/dbconfig/20191015-130356-marostegui.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9352 and previous config saved to /var/cache/conftool/dbconfig/20191015-124942-marostegui.json
  • 12:46 elukey: Hadoop maintenance over
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9351 and previous config saved to /var/cache/conftool/dbconfig/20191015-123356-marostegui.json
  • 12:24 mobrovac: restbase add parsoidphp tables in prod - T230792
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9350 and previous config saved to /var/cache/conftool/dbconfig/20191015-121840-marostegui.json
  • 12:17 marostegui: Repool labsdb1009 after PDU maintenance
  • 12:17 elukey: Hadoop maintenance start - migration to the new ZooKeeper cluster
  • 12:16 moritzm: installing sudo security updates on buster/stretch
  • 12:13 arturo: add copy of python-pykube and python3-pykube from stretch-wikimedia to buster-wikimedia (T230961)
  • 12:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:05 hashar: CI Jenkins restarted
  • 12:04 hashar: Restarting CI Jenkins
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3314', diff saved to https://phabricator.wikimedia.org/P9348 and previous config saved to /var/cache/conftool/dbconfig/20191015-120359-marostegui.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P9347 and previous config saved to /var/cache/conftool/dbconfig/20191015-120133-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P9346 and previous config saved to /var/cache/conftool/dbconfig/20191015-115922-marostegui.json
  • 11:12 Urbanecm: EU SWAT done
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ac37540: Add `autopatrol` to translation administrators on mediawiki (duration: 00m 51s)
  • 11:12 jbond42: move puppetmaster_ca_server back to puppetmaster1001
  • 11:08 Urbanecm: mwscript resetAuthenticationThrottle.php --wiki=cswiki --signup --ip 195.113.145.2 (T235493)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT:855aca4eb: Throttle rule for Czech course (T235493) (duration: 00m 51s)
  • 10:54 moritzm: mark ruby-safe-yaml as manually installed using apt-mark on jessie/stretch, prevents accidental removal of ruby-safe-yaml after puppet 4->5 migration
  • 10:07 moritzm: installing openssl updates for buster (some ciphers we don't use were not enabled due to an upstream change related to the selection of ASM-optimised implementations over generic C)
  • 08:07 marostegui: Stop MySQL on db1126 and labsdb1009 for PDU maintenance - T226782
  • 08:06 elukey: upload new version of memkeys (adding a patch to merged to upstream to avoid segfaults on stretch/buster) to stretch|buster wikimedia apt repos - T223863
  • 07:52 Urbanecm: Set email for `Martin Urbanec (test 10)` to test@wikimedia.cz (debug, no ticket)
  • 07:48 Urbanecm: Password reset for Xaris333 #2 (T235441)
  • 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for PDU maintenance T226782', diff saved to https://phabricator.wikimedia.org/P9345 and previous config saved to /var/cache/conftool/dbconfig/20191015-071338-marostegui.json
  • 07:10 XioNoX: failover VRRP from cr1-eqiad to cr2-eqiad in anticipation of the PDU work - T226782
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318 T232446', diff saved to https://phabricator.wikimedia.org/P9344 and previous config saved to /var/cache/conftool/dbconfig/20191015-064419-marostegui.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1070 T235464', diff saved to https://phabricator.wikimedia.org/P9343 and previous config saved to /var/cache/conftool/dbconfig/20191015-064005-marostegui.json
  • 05:38 marostegui: Depool labsdb1009 for PDU maintenance T226782
  • 05:28 marostegui: Deploy schema change on db1098:3317 T234066 T233135
  • 05:28 marostegui: Deploy schema change on db1097:3314 T233625
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314', diff saved to https://phabricator.wikimedia.org/P9342 and previous config saved to /var/cache/conftool/dbconfig/20191015-052621-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P9341 and previous config saved to /var/cache/conftool/dbconfig/20191015-052220-marostegui.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318', diff saved to https://phabricator.wikimedia.org/P9340 and previous config saved to /var/cache/conftool/dbconfig/20191015-051924-marostegui.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314', diff saved to https://phabricator.wikimedia.org/P9339 and previous config saved to /var/cache/conftool/dbconfig/20191015-051400-marostegui.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P9338 and previous config saved to /var/cache/conftool/dbconfig/20191015-051236-marostegui.json
  • 05:00 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1100 to s5 master and remove read-only from s5 T234300', diff saved to https://phabricator.wikimedia.org/P9337 and previous config saved to /var/cache/conftool/dbconfig/20191015-050042-marostegui.json
  • 05:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s5 as read-only for maintenance T234300', diff saved to https://phabricator.wikimedia.org/P9336 and previous config saved to /var/cache/conftool/dbconfig/20191015-050016-marostegui.json
  • 05:00 marostegui: Starting s5 failover from db1070 to db1100 - T234300
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P9335 and previous config saved to /var/cache/conftool/dbconfig/20191015-043403-marostegui.json
  • 04:15 marostegui: Start pre-switchover steps T234300

2019-10-14

  • 23:27 Krinkle: Delete 2019-09-01–2019-09-10 arclamp trace logs from webperf1002, and decompress the rest of 2019-09 (this will trigger svg re-generation), T235425
  • 23:10 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 86f12b6e (duration: 00m 51s)
  • 21:47 Krinkle: Deleting 2019-09-01–2019-09-10 arclamp logs on webperf2002, and decompress the rest of 2019-09, T235425
  • 21:12 Krinkle: Delete misc arclamp/logs and arclamp/svgs data from between 2018 and 2019-08 on webperf1002/webperf2002, T235425
  • 20:41 maxsem@deploy1001: Synchronized php-1.35.0-wmf.1/includes/: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542963/ (duration: 00m 55s)
  • 17:56 mutante: webperf2002 - /srv/xenon/logs/daily# gzip 2019-09*excimer*.log (T235425)
  • 17:21 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@217cac5]: New blazegraph build and GUI updates (duration: 16m 45s)
  • 17:04 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@217cac5]: New blazegraph build and GUI updates
  • 16:07 moritzm: imported cergen 0.2.4-1+deb10u3 to component/cergen for buster-wikimedia T235405
  • 16:00 Urbanecm: Password reset for Xaris333 (T235441)
  • 15:57 moritzm: imported cergen 0.2.4-1+deb10u2 to component/cergen for buster-wikimedia T235405
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9329 and previous config saved to /var/cache/conftool/dbconfig/20191014-142843-marostegui.json
  • 14:28 elukey: upload matomo 3.11 to stretch-wikimedia and upgrade matomo1001 - T234607
  • 14:21 marostegui: Deploy schema change on db1116:3317 T234066 T233135
  • 14:13 effie: Enable puppet on mw* servers and reload apache - T229792
  • 13:48 moritzm: imported cergen 0.2.4-1+deb10u1 to component/cergen for buster-wikimedia T235405
  • 13:42 marostegui: Repool labsdb1009 after PSU replacement - T233273
  • 13:36 effie: Slowly enable puppet on mw* canaries
  • 13:26 moritzm: imported python-networkx 1.11-2~wmf1 to component/cergen for buster-wikimedia T235405
  • 13:21 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:19 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 13:18 effie: Disable puppet on mw* to remove php72_only feature flag - T229792
  • 13:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 245b4e5: Add banwiki logo to IS.php (T234768) (duration: 00m 51s)
  • 13:12 Urbanecm: Run git reset --hard origin/master in /srv/mediawiki-stagging (deleted https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/542920 and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/542919 from deployment srv, both don't actually change anything => safe to delete) (T234768)
  • 13:10 marostegui: Sanitize banwiki on db1124:3313 and db2094:3313 T234770
  • 12:44 Amir1: Creating banwiki is banned (done)
  • 12:40 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 04s)
  • 12:34 ladsgroup@deploy1001: Synchronized langlist: Creating banwiki: T234768 (duration: 00m 50s)
  • 12:32 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Creating banwiki: T234768 (duration: 00m 51s)
  • 12:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating banwiki: T234768 (duration: 00m 51s)
  • 12:28 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Creating banwiki: T234768
  • 12:20 ladsgroup@deploy1001: Synchronized dblists: Creating banwiki: T234768 (duration: 00m 52s)
  • 12:10 tarrow@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/Wikibase: SWAT: Bump up Termbox cache version (T235192) (duration: 00m 56s)
  • 11:46 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reftabs on testwikidata (T199197, T228412) (duration: 00m 51s)
  • 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: a295cc7: Fix wrong domain in wgCopyUploadDomains added in T203363 (T235415) (duration: 00m 51s)
  • 11:27 kart_: Update cxserver to 2019-10-03-054958-production (T232986)
  • 11:22 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:17 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:15 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 538867|Use ContentTranslationEnableMT to disable MT (T232986) (duration: 00m 51s)
  • 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 51s)
  • 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 52s)
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 in preparation for tomorrow's failover T234300', diff saved to https://phabricator.wikimedia.org/P9326 and previous config saved to /var/cache/conftool/dbconfig/20191014-100758-marostegui.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1130 into s5 api, db1100 will be removed later in preparation for tomorrow's failover T234300', diff saved to https://phabricator.wikimedia.org/P9325 and previous config saved to /var/cache/conftool/dbconfig/20191014-094809-marostegui.json
  • 09:34 hashar: Upgraded CI jobs to Quibble 0.0.38
  • 09:14 marostegui: Deploy schema change on dbstore1003:3317
  • 08:56 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 08:55 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:52 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 and db2126 after changing sanitarium to replicate from db1074 T231638', diff saved to https://phabricator.wikimedia.org/P9322 and previous config saved to /var/cache/conftool/dbconfig/20191014-085143-marostegui.json
  • 08:46 mobrovac: restbase drop metadata keyspaces from cassandra - T235173
  • 07:54 marostegui: Stop db1074 and db2126 in sync to change sanitarium's master for s2 - T231638
  • 07:49 mobrovac@deploy1001: Finished deploy [restbase/deploy@4d469a1] (dev-cluster): Remove VE logging and stop using storage for /page/metadata (duration: 03m 58s)
  • 07:45 mobrovac@deploy1001: Started deploy [restbase/deploy@4d469a1] (dev-cluster): Remove VE logging and stop using storage for /page/metadata
  • 07:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@e0d071f]: Remove VE logging and stop using storage for /page/metadata - T234928 T235173 (duration: 13m 37s)
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db2126 to change sanitarium to replicate from db1074 T231638', diff saved to https://phabricator.wikimedia.org/P9320 and previous config saved to /var/cache/conftool/dbconfig/20191014-073319-marostegui.json
  • 07:28 mobrovac@deploy1001: Started deploy [restbase/deploy@e0d071f]: Remove VE logging and stop using storage for /page/metadata - T234928 T235173
  • 07:28 mobrovac@deploy1001: Finished deploy [changeprop/deploy@c25a1c2]: Do not pre-generate /page/metadata - T235173 (duration: 01m 25s)
  • 07:26 mobrovac@deploy1001: Started deploy [changeprop/deploy@c25a1c2]: Do not pre-generate /page/metadata - T235173
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2068 from config - T235399', diff saved to https://phabricator.wikimedia.org/P9319 and previous config saved to /var/cache/conftool/dbconfig/20191014-072100-marostegui.json
  • 07:16 marostegui: Stop MySQL on labsdb1009 for on-site maintenance - T233273
  • 07:01 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2068 from config T235399 (duration: 00m 51s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2068 from config T235399 (duration: 00m 53s)
  • 05:47 marostegui: Remove db2068 from tendril and zarcillo T235399
  • 04:56 marostegui: Depool labsdb1009 for on-site maintenance - T233273
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9318 and previous config saved to /var/cache/conftool/dbconfig/20191014-045629-marostegui.json

2019-10-13

  • 00:52 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: ec77b1b (duration: 00m 55s)

2019-10-12

  • 23:21 krinkle@deploy1001: Synchronized wmf-config/profiler.php: bfa8bb69c1f, T231564 (duration: 00m 51s)
  • 21:07 krinkle@deploy1001: Synchronized php-1.35.0-wmf.1/includes/resourceloader/ResourceLoaderStartUpModule.php: 8c6baeae2 (duration: 00m 53s)
  • 20:57 Urbanecm: Reset user email of User:Gardini (T235318)
  • 18:38 _joe_: deleting zotero pods with excessive memory usage in eqiad
  • 16:16 reedy@deploy1001: Synchronized php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: T235334 (duration: 00m 51s)
  • 16:15 reedy@deploy1001: Synchronized php-1.35.0-wmf.1/includes/api/ApiQueryBacklinksprop.php: T235334 (duration: 00m 56s)
  • 04:37 krinkle@deploy1001: Synchronized wmf-config/profiler.php: 29d8469 (duration: 00m 57s)

2019-10-11

  • 15:39 AndyRussG: updated fruec from 18d89675d0 to 1e6a6ee2de
  • 13:57 moritzm: rebooting cloudbackup2001
  • 13:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:01 moritzm: installing 4.9.189 Linux update from last stretch point releases (no reboots, deploying the package only at this point)
  • 12:48 XioNoX: disable SIP ALG on pfw3-eqiad - T235150
  • 12:47 XioNoX: disable SIP ALG on pfw3-codfw - T235150
  • 12:45 moritzm: installing libxslt security updates
  • 12:35 moritzm: installing zsh updates from stretch point release
  • 12:33 moritzm: installing gsoap security updates on stretch
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9314 and previous config saved to /var/cache/conftool/dbconfig/20191011-123159-marostegui.json
  • 12:31 moritzm: installing libcaca security updates on stretch
  • 12:25 XioNoX: push firewall policies to pfw3-eqiad - T235074
  • 12:24 XioNoX: push firewall policies to pfw3-codfw - T235074
  • 11:51 moritzm: installing unzip security updates on stretch
  • 11:08 moritzm: upgrading debdeploy to 0.0.99.11
  • 10:18 moritzm: imported debdeploy 0.0.99.11 for jessie/stretch/buster-wikimedia
  • 10:11 hashar: Restarting Gerrit # T224448
  • 10:02 hashar: gerrit: killed a stalled SendEmail thread that was holding a lock
  • 08:34 moritzm: remove kafka2001-2003 from debmonitor DB (T235125)
  • 08:32 moritzm: remove kafka1001-1003 from debmonitor DB (T235125)
  • 08:30 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:28 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:04 moritzm: reimaging labpuppetmaster1002 (spare) for some tests related to microcode loading
  • 07:32 XioNoX: rollback two previous HE peering deactivate
  • 07:30 XioNoX: deactivate HE peering on cr2-eqord for packet loss
  • 07:28 XioNoX: deactivate HE peering on cr1-eqiad for packet loss
  • 06:13 marostegui: Compress tables on db2085:3318 - T232446
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318 for compression - T232446', diff saved to https://phabricator.wikimedia.org/P9311 and previous config saved to /var/cache/conftool/dbconfig/20191011-060814-marostegui.json
  • 05:27 papaul: rebooting an-conf1001 for serial troubleshooting
  • 05:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9310 and previous config saved to /var/cache/conftool/dbconfig/20191011-045409-marostegui.json
  • 02:14 mutante: gerrit - "manually" starting replication via ssh command
  • 02:13 mutante: gerrit - restart service to ensure last config change is picked up
  • 02:10 mutante: gerrit1001 - attempt to manually start replication to github

2019-10-10

  • 22:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMFMobileFormatterHeadings, unread T232690 (duration: 00m 51s)
  • 22:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T78711 Update cron-updated miser pages to say they are run periodically, not never (duration: 00m 51s)
  • 22:10 jforrester@deploy1001: Synchronized wmf-config/wikitech.php: Remove debug line dating from 2015-12-08! (duration: 00m 51s)
  • 22:04 jforrester@deploy1001: Synchronized wmf-config/mc.php: Drop nutcracker indirection for HHVM servers, just point to localhost (duration: 00m 51s)
  • 21:58 jforrester@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: Drop special-case for PHP7, now always used (duration: 00m 51s)
  • 21:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop HHVM special-case for SVG converter, no longer used (duration: 00m 51s)
  • 21:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't check to shard static config cache for HHVM any more (duration: 00m 50s)
  • 21:48 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Don't check to shard wmgWBSharedCacheKey for HHVM any more (duration: 00m 51s)
  • 21:39 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/VisualEditor/lib/ve/src/dm/ve.dm.TreeCursor.js: T234881 TreeCursor: cross ignored nodes properly from the end of a text node (duration: 00m 54s)
  • 20:36 otto@deploy1001: Finished deploy [analytics/refinery@9b322e4]: attempting to fix missing git fat jar on stat1004 (duration: 00m 06s)
  • 20:36 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: attempting to fix missing git fat jar on stat1004
  • 20:13 hoo: Updated the Wikidata property suggester with data from the 2019-09-30 JSON dump and applied the T132839 workarounds
  • 19:33 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 19:29 marxarelli: promoted 1.35.0-wmf.1 to all wikis. no rise in errors rates. no new relevant errors cc: T233849
  • 19:25 godog: swift codfw-prod: more weight to ms-be205[1-6] - T233638
  • 19:20 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.1
  • 19:11 dduvall@deploy1001: rebuilt and synchronized wikiversions files: labswiki to 1.35.0-wmf.1
  • 19:09 dduvall@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/OpenStackManager: labswiki to 1.35.0-wmf.1 (duration: 01m 00s)
  • 19:04 marxarelli: promoting labswiki to 1.35.0-wmf.1 cc: T233849
  • 17:07 jbond42: puppetmaster1001 has been upgraded and is back serving requests
  • 16:21 urandom: Upgrading sessionstore200[1-3].codfw.wmnet to Cassandra 3.11.4 -- T200803
  • 16:18 urandom: Upgrading sessionstore1003.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 16:16 urandom: Upgrading sessionstore1002.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 16:11 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:07 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:04 thcipriani: restarting gerrit due to T224448
  • 16:04 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 16:01 urandom: Upgrading sessionstore1001.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 15:42 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 15:23 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55 (duration: 05m 39s)
  • 15:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1074 after getting its BBU replaced T231638', diff saved to https://phabricator.wikimedia.org/P9306 and previous config saved to /var/cache/conftool/dbconfig/20191010-145737-marostegui.json
  • 14:54 moritzm: ran systemctl reset-failed on puppetmaster1001 (puppet-master.service after reimage)
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074 after BBU replacement T231638', diff saved to https://phabricator.wikimedia.org/P9305 and previous config saved to /var/cache/conftool/dbconfig/20191010-144201-marostegui.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1112 into recentchanges and remove db1078 from it after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9304 and previous config saved to /var/cache/conftool/dbconfig/20191010-143924-marostegui.json
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9303 and previous config saved to /var/cache/conftool/dbconfig/20191010-143633-marostegui.json
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9302 and previous config saved to /var/cache/conftool/dbconfig/20191010-142323-marostegui.json
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9301 and previous config saved to /var/cache/conftool/dbconfig/20191010-141303-marostegui.json
  • 14:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1013, es1014 after PDU maintenance (duration: 00m 59s)
  • 14:03 jbond42: re-enable puppet now ca has been correctly moved
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9300 and previous config saved to /var/cache/conftool/dbconfig/20191010-135806-marostegui.json
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9299 and previous config saved to /var/cache/conftool/dbconfig/20191010-135659-marostegui.json
  • 13:50 jbond42: disable puppet fleet wide as puppetmaster2002 is struggling
  • 13:32 jbond42: reimage puppetmaster1001
  • 13:27 marostegui: Repool labsdb1011 after reclone - T235016
  • 13:16 arturo: added flannel 0.5.5-4 to buster-wikimedia (T235059)
  • 13:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1013, es1014 after PDU maintenance (duration: 00m 58s)
  • 13:00 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 12:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1013, es1014 after PDU maintenance (duration: 00m 59s)
  • 11:57 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:57 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:48 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:46 jbond@cumin2001: Updating IPMI password on 35 hosts - jbond@cumin2001
  • 11:46 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:41 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Fix typo in beta repo data bridge config (T235033) (duration: 00m 59s)
  • 11:40 marostegui: Deploy schema change on s7 codfw master (db2118), this will generate lag on s7 codfw - T234066 T233135
  • 11:38 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:38 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:38 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:37 arturo: icinga downtime cloudvirt1023 for 2h (T227536)
  • 11:36 arturo: icinga downtime cloudvirt1025 for 2h (T227536)
  • 11:36 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:36 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:36 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:35 arturo: icinga downtime cloudvirt1026 for 2h (T227536)
  • 11:35 marostegui: Stop replication on db2077 to change triggers on db2095:3317 - T234704
  • 11:23 moritzm: installing reportbug updates from stretch point release
  • 11:22 Lucas_WMDE: EU SWAT done
  • 11:21 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:21 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:21 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Set dataBridgeEnabled repo setting on beta (T235033) (affects InitialiseSettings-labs.php and Wikibase.php, but Wikibase.php part is guarded by isset(), so should be safe to sync both at once, I think) (duration: 01m 00s)
  • 11:21 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:21 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:14 Lucas_WMDE: ^ (and by CS, I actually mean Wikibase.php, not CommonSettings.php, sorry)
  • 11:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Rename data bridge config variable names (T235033) (affects IS-labs and CS, but the CS part is all guarded by isset(), so should be safe to sync both at once, I think) (duration: 01m 00s)
  • 10:38 moritzm: rebalancing Ganeti eqiad/row C after rolling reboots of Ganeti nodes
  • 10:34 volans: uploaded spicerack_0.0.28-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 08:23 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:20 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:17 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 08:12 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Add wtp1025/wtp2001 to the list of servers using Parsoid/PHP - T233654 (duration: 01m 01s)
  • 07:55 marostegui: Stop MySQL on es1014 es1013 db1084 db1083 db1077 db1076 db1112 db1124 db1118 for on-site PDU maintenance (this will generate lag on labsdb hosts) - T227536
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:56 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:45 marostegui: Drop designate_pool_manager database from m5 - T233978
  • 06:33 marostegui: Revoke privileges from designate user on the designate_pool_manager database - T233978
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for PDU maintenance T227536', diff saved to https://phabricator.wikimedia.org/P9294 and previous config saved to /var/cache/conftool/dbconfig/20191010-055153-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1078 into rc service for s3 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9293 and previous config saved to /var/cache/conftool/dbconfig/20191010-055102-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 db1083 db1076 db1118 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9292 and previous config saved to /var/cache/conftool/dbconfig/20191010-054853-marostegui.json
  • 05:47 marostegui: Depool db1084 db1083 db1076 db1118 for PDU maintenance - T227536
  • 05:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:04 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:53 marostegui: Deploy schema change on db1061 (s6 eqiad master) - T233135 T234066
  • 04:43 marostegui: Depool labsdb1011 for recloning - T235016
  • 00:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 00:39 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 00:39 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 00:38 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset

2019-10-09

  • 23:55 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 03m 57s)
  • 23:51 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: (no justification provided)
  • 23:24 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable AMC on all wikis (T233612) (duration: 00m 58s)
  • 23:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Turn on AMC outreach modal (T234026) (duration: 00m 59s)
  • 22:01 mutante: restarting gerrit to revert replication config change (T235135)
  • 21:27 godog: swift eqiad-prod: add ms-be105[1-6] - T232367
  • 21:02 otto@deploy1001: Finished deploy [analytics/refinery@9b322e4]: (no justification provided) (duration: 00m 02s)
  • 21:02 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: (no justification provided)
  • 21:02 otto@deploy1001: deploy aborted: (no justification provided) (duration: 38m 29s)
  • 20:55 ppchelko@deploy1001: Finished deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds, rb-dev1006 (duration: 01m 44s)
  • 20:53 ppchelko@deploy1001: Started deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds, rb-dev1006
  • 20:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds (duration: 02m 42s)
  • 20:41 ppchelko@deploy1001: Started deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds
  • 20:31 papaul: rebooting ms-be1051 to access BIOS
  • 20:28 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@469ed65]: Update mobileapps to b9a225e (duration: 06m 22s)
  • 20:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:23 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: (no justification provided)
  • 20:22 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@469ed65]: Update mobileapps to b9a225e
  • 20:16 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 00m 10s)
  • 20:16 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:16 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 05m 34s)
  • 20:10 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:09 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 02m 23s)
  • 20:06 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:01 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 19:56 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 19:54 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 00m 12s)
  • 19:54 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 19:54 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:52 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 08m 00s)
  • 19:44 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:44 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 09m 33s)
  • 19:34 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:25 marxarelli: 1.35.0-wmf.1 promoted to group1, labswiki rolled back to 1.34.0-wmf.25 and to be kept back, cc: T233849
  • 19:09 dduvall@deploy1001: rebuilt and synchronized wikiversions files: labswiki rollback to 1.34.0-wmf.25 due to hhvm
  • 19:09 urandom: Upgrade restbase-dev1006-{a,b} to Cassandra 3.11.4 -- T200803
  • 19:09 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.1 (duration: 00m 58s)
  • 19:06 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.1
  • 18:51 urandom: Upgrade restbase-dev1005-{a,b} to Cassandra 3.11.4 -- T200803
  • 18:45 urandom: Upgrade restbase-dev1004-{a,b} to Cassandra 3.11.4 -- T200803
  • 18:44 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:44 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:43 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:22 elukey: roll restart aqs on aqs100[4-9] to pick up new Druid config changes
  • 17:19 eileen: civicrm revision changed from 2ba100486e to 5a2f8048c4, config revision is 5560cc0878
  • 16:50 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:48 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9289 and previous config saved to /var/cache/conftool/dbconfig/20191009-160506-marostegui.json
  • 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9288 and previous config saved to /var/cache/conftool/dbconfig/20191009-153705-marostegui.json
  • 15:04 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:02 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1085 vslow and dump group', diff saved to https://phabricator.wikimedia.org/P9287 and previous config saved to /var/cache/conftool/dbconfig/20191009-145102-marostegui.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9286 and previous config saved to /var/cache/conftool/dbconfig/20191009-144928-marostegui.json
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9285 and previous config saved to /var/cache/conftool/dbconfig/20191009-144607-marostegui.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9284 and previous config saved to /var/cache/conftool/dbconfig/20191009-144400-marostegui.json
  • 14:38 elukey: cr1-eqsin: change IPv6 address for BGP peer AS4761
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9283 and previous config saved to /var/cache/conftool/dbconfig/20191009-141137-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9282 and previous config saved to /var/cache/conftool/dbconfig/20191009-140749-marostegui.json
  • 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:02 moritzm: rebalancing Ganeti eqiad/row A after rolling reboots of Ganeti nodes
  • 13:48 jbond42: reimage puppetmaster2001
  • 13:37 vgutierrez: repooling cp1085 - T231525
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'depool db1075', diff saved to https://phabricator.wikimedia.org/P9280 and previous config saved to /var/cache/conftool/dbconfig/20191009-133709-marostegui.json
  • 13:13 mobrovac@deploy1001: Finished deploy [restbase/deploy@aaadd73]: Parsoid: Retry fetching stashes with undefined as the revid - T234928 (duration: 14m 26s)
  • 12:59 mobrovac@deploy1001: Started deploy [restbase/deploy@aaadd73]: Parsoid: Retry fetching stashes with undefined as the revid - T234928
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9279 and previous config saved to /var/cache/conftool/dbconfig/20191009-125641-marostegui.json
  • 12:42 marostegui: Stop MySQL and power off db1074 for BBU replacement T231638
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for BBU replacement T231638', diff saved to https://phabricator.wikimedia.org/P9278 and previous config saved to /var/cache/conftool/dbconfig/20191009-124218-marostegui.json
  • 12:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response, take #2 (duration: 08m 18s)
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P9277 and previous config saved to /var/cache/conftool/dbconfig/20191009-124035-marostegui.json
  • 12:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:36 moritzm: disabled puppet on DNS recursors for staged rollout of ferm NTP change
  • 12:35 jbond42: reimage puppetmaster2002
  • 12:32 mobrovac@deploy1001: Started deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response, take #2
  • 12:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response - T170455 T234928 (duration: 09m 40s)
  • 12:28 vgutierrez: depooling cp1085 for a power drain - T231525
  • 12:20 mobrovac@deploy1001: Started deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response - T170455 T234928
  • 12:13 moritzm: draining ganeti1001 for upcoming reboot (combined kernel/qemu security updates)
  • 12:10 moritzm: failover Ganeti master in eqiad to ganeti1003
  • 12:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:32 moritzm: draining ganeti1008 for upcoming reboot (combined kernel/qemu security updates)
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:05 Amir1: EU SWAT is done
  • 11:04 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put write both limit down to Q70m for item terms (T234948) (duration: 01m 10s)
  • 11:04 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 10:58 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:18 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 10:16 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 09:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:48 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:44 moritzm: draining ganeti1007 for upcoming reboot (combined kernel/qemu security updates)
  • 09:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:00 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:59 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change, temporarily pool db1085 as vslow,dump', diff saved to https://phabricator.wikimedia.org/P9276 and previous config saved to /var/cache/conftool/dbconfig/20191009-085016-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085 after schema change', diff saved to https://phabricator.wikimedia.org/P9275 and previous config saved to /var/cache/conftool/dbconfig/20191009-084732-marostegui.json
  • 08:39 vgutierrez: Switch cp1082 from nginx to ats-tls - T231433
  • 08:24 moritzm: draining ganeti1006 for upcoming reboot (combined kernel/qemu security updates)
  • 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:01 vgutierrez: Switch cp2011 from nginx to ats-tls - T231433
  • 07:48 moritzm: reduced RAM assignment for boron to 8G
  • 07:38 vgutierrez: Switch cp3038 from nginx to ats-tls - T231433
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:34 vgutierrez: switching from nginx to ats-tls on cp4024 - T231433
  • 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1013, es1014 T227536 (duration: 01m 00s)
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change - lag will be generated on s6 labs', diff saved to https://phabricator.wikimedia.org/P9274 and previous config saved to /var/cache/conftool/dbconfig/20191009-051911-marostegui.json
  • 05:11 marostegui: Restart gerrit as it is down
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P9273 and previous config saved to /var/cache/conftool/dbconfig/20191009-045941-marostegui.json
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312', diff saved to https://phabricator.wikimedia.org/P9272 and previous config saved to /var/cache/conftool/dbconfig/20191009-044752-marostegui.json
  • 04:40 vgutierrez: switching cp5004 from nginx to ats-tls - T231433

2019-10-08

  • 23:28 mutante: phab1001 - replacing tin.eqiad.wmnet with deploy1001.eqiad.wmnet in phabricator/deployment-cache/.config:git_server - wondering if we can ever get rid of tin (T190568)
  • 23:05 ebernhardson@deploy1001: Synchronized wmf-config/: [cirrus] drop support for HHVM connection pooling (duration: 00m 59s)
  • 21:58 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Split out the CSP configuration so it can be more easily over-ridden (duration: 00m 59s)
  • 21:28 XenoRyet: updated payments-wiki from d2e2637275 to 8a65f57874
  • 21:09 chaomodus: restarted nagios-nrpe-server on notebook1003
  • 20:38 mutante: labweb1001 - disabled 2fa for myself on Wikitech using disableOATHAuthForUser.php --wiki=labswiki to debug T234996
  • 20:24 mutante: labweb1001 - edit /srv/mediawiki/wmf-config/wikitech.php and change "false" to "true" on line 52 to enable LDAP debug logging for T234996
  • 19:51 marxarelli: 1.35.0-wmf.1 promoted to group0, cc: T233849. no rise in error rates. no new relevant errors
  • 19:43 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.1
  • 19:38 dduvall@deploy1001: Synchronized php-1.35.0-wmf.1/skins/MinervaNeue/: sync T233521 backport prior to group0 (duration: 00m 59s)
  • 19:29 shdubsh: adding swagger exporter to apt repo
  • 19:13 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache (duration: 19m 21s)
  • 18:54 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache
  • 18:53 godog: codfw-prod: more weight to ms-be205[1-6] - T233638
  • 18:45 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.24 (duration: 08m 24s)
  • 17:32 marxarelli: cutting wmf/1.35.0-wmf.1
  • 16:17 cstone: civicrm revision changed from db7ef10bfa to 2ba100486e
  • 16:00 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:58 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:57 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 15:30 XioNoX: remove 2 more sessions to AS12871 on cr2-esams - T232617
  • 15:20 XioNoX: add BGP sessions to AS199524 on cr2-eqdfw
  • 15:18 XioNoX: add BGP sessions to AS2635 on cr2-eqiad
  • 15:13 XioNoX: renumber BGP session to AS4761 on cr1-eqsin
  • 13:53 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:51 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1103:3312 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9266 and previous config saved to /var/cache/conftool/dbconfig/20191008-135058-marostegui.json
  • 13:50 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9265 and previous config saved to /var/cache/conftool/dbconfig/20191008-135033-marostegui.json
  • 13:49 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:41 marostegui@cumin2001: dbctl commit (dc=all): 'More traffic for db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9264 and previous config saved to /var/cache/conftool/dbconfig/20191008-134152-marostegui.json
  • 13:35 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543 (duration: 06m 04s)
  • 13:32 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9263 and previous config saved to /var/cache/conftool/dbconfig/20191008-133208-marostegui.json
  • 13:29 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9262 and previous config saved to /var/cache/conftool/dbconfig/20191008-131752-marostegui.json
  • 13:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1011 (duration: 00m 51s)
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093 after schema change', diff saved to https://phabricator.wikimedia.org/P9261 and previous config saved to /var/cache/conftool/dbconfig/20191008-124417-marostegui.json
  • 12:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1011 (duration: 00m 51s)
  • 12:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es1012 T227138 (duration: 00m 51s)
  • 12:27 marostegui: Stop MySQL on es1012 for onsite maintenance
  • 12:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1012 T227138 (duration: 00m 51s)
  • 11:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:10 Urbanecm: EU SWAT done
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fb49404: Enable more transwiki import sources for hiwikisource (T234892) (duration: 00m 55s)
  • 10:58 jbond@cumin1001: Updating IPMI password on 1253 hosts - jbond@cumin1001
  • 10:58 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:58 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 10:58 jbond@cumin1001: Updating IPMI password on 1253 hosts - jbond@cumin1001
  • 10:57 jbond42: testing ipmi reset cookbook. using the current pass for both old and new so no reset actually occurs
  • 10:57 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:57 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 10:57 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:22 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:21 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:19 moritzm: draining ganeti1005 for upcoming reboot (combined kernel/qemu security updates)
  • 10:16 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:15 mobrovac@deploy1001: Finished deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ (duration: 06m 32s)
  • 10:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:09 mobrovac@deploy1001: Started deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ
  • 10:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P9259 and previous config saved to /var/cache/conftool/dbconfig/20191008-093309-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088 after schema change', diff saved to https://phabricator.wikimedia.org/P9258 and previous config saved to /var/cache/conftool/dbconfig/20191008-092627-marostegui.json
  • 09:20 marostegui: Compress logging table on db2088:3312 for idwiki,plwiki,ptwiki,zhwiki
  • 09:09 moritzm: draining ganeti1004 for upcoming reboot (combined kernel/qemu security updates)
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9257 and previous config saved to /var/cache/conftool/dbconfig/20191008-090616-marostegui.json
  • 08:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:46 mobrovac@deploy1001: Finished deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging (duration: 08m 05s)
  • 08:38 mobrovac@deploy1001: Started deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging
  • 08:33 elukey: roll restart druid historicals and brokers on druid100[1-3] to pick up new settings - T234684
  • 08:10 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:10 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:09 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:05 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 07:51 moritzm: draining ganeti1003 for upcoming reboot (combined kernel/qemu security updates)
  • 07:49 akosiaris: update OTRS to 5.0.38
  • 07:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P9256 and previous config saved to /var/cache/conftool/dbconfig/20191008-071859-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P9255 and previous config saved to /var/cache/conftool/dbconfig/20191008-071551-marostegui.json
  • 07:10 moritzm: draining ganeti1002 for upcoming reboot (combined kernel/qemu security updates)
  • 06:48 marostegui: Stop MySQL on es1011 db1082 db1081 db1080 db1079 db1075 db1074 (replication lag will appear on labs for s5) for on-site maintenance T227138
  • 06:09 marostegui: Repool labsdb1011 after mysql upgrade
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:44 elukey: drop PageCreation_7481635 table from the log db on db1107/db1108 - T233892
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 db1081 db1080 db1079 db1075 db1074 for PDU maintenance T227138', diff saved to https://phabricator.wikimedia.org/P9254 and previous config saved to /var/cache/conftool/dbconfig/20191008-054127-marostegui.json
  • 05:35 elukey: drop CitationUsage tables from the log database on db1107/db1108 (the ones listed in the task) - T233893
  • 05:25 marostegui: Depool labsdb1011 for mysql upgrade
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for schema change', diff saved to https://phabricator.wikimedia.org/P9253 and previous config saved to /var/cache/conftool/dbconfig/20191008-051435-marostegui.json
  • 05:10 marostegui: Reload query killer on labsdb1011
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9252 and previous config saved to /var/cache/conftool/dbconfig/20191008-050833-marostegui.json
  • 05:07 marostegui: Deploy schema change on db1097:3315 - T233625
  • 03:04 andrewbogott: restarted nova-conductor on cloudcontrol1003 and cloudcontrol1004 - experimental band-aid for T234876
  • 00:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)

2019-10-07

  • 23:52 dzahn@cumin1001: Updating IPMI password on 1254 hosts - dzahn@cumin1001
  • 23:52 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 23:26 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cache issue? (duration: 00m 49s)
  • 23:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:21 dzahn@cumin1001: Updating IPMI password on 1254 hosts - dzahn@cumin1001
  • 23:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 22:40 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b9e6829821, T156095 (duration: 00m 51s)
  • 22:29 chaomodus: restart nagios-nrpe-server on stat1007
  • 21:56 mutante: gerrit2001 - sudo rm /etc/apache2/sites-available/50-gerrit-slave-wikimedia-org.conf
  • 21:40 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Run Labs config after CSP config so it can change it (duration: 00m 51s)
  • 21:20 godog: swift codfw-prod: add ms-be205[3456] - T233638
  • 20:56 XenoRyet: updated payments-wiki from b94da68f7e to d2e2637275
  • 20:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:33 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:33 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:31 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:31 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:30 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:29 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add the beta REL1_34 to ExtensionDistributor (duration: 00m 50s)
  • 19:20 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:18 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:10 Lucas_WMDE: Morning SWAT done
  • 19:09 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/Wikibase: SWAT: Revert "Format coordinates with limited precision" (T174504) (duration: 00m 57s)
  • 18:33 Lucas_WMDE: reopen Morning SWAT for another backport (sorry)
  • 18:26 Urbanecm: Morning SWAT done
  • 18:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/: SWAT: 011b6eb: 11033b7: Update VE core submodule to 2ffb699eb (TreeModifier fixes), T234489, T234742 + ve.ui.MWDefinedTransclusionContextItem: Fix handling of template names (T234817) (duration: 00m 53s)
  • 18:16 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/539978
  • 18:12 andrewbogott: apt dist-upgrade on all cloudvirts (for nova upgrades)
  • 18:12 godog: start swiftrepl eqiad -> codfw (no deletes)
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: f434ae3: Enable NewUserMessage on sq.wikipedia and sq.wikiquote (T234499) (duration: 00m 52s)
  • 18:07 jgleeson: Updating civicrm from c12f7bb51f to db7ef10bfa
  • 17:46 ottomata: stat1007 is unresponsive, can't login via mgmt either. powercycling.
  • 17:29 XioNoX: add BGP route damping on IX sessions - eqiad - T222424
  • 17:27 XioNoX: add BGP route damping on IX sessions - esams - T222424
  • 17:22 XioNoX: add BGP route damping on IX sessions - eqsin - T222424
  • 15:34 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@334e809]: Update mobileapps to 16cb9ae (duration: 06m 28s)
  • 15:30 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 15:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 15:27 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@334e809]: Update mobileapps to 16cb9ae
  • 15:27 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 15:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop writing wmgVisualEditorEnableNewMobileContext (duration: 00m 51s)
  • 15:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop reading wmgVisualEditorEnableNewMobileContext (duration: 00m 52s)
  • 14:25 arturo: upgrading openstack in CloudVPS. Some IRC bots and related stuff may be unavailable (T212302)
  • 14:17 marostegui: Deploy schema change on db1139:3316 - T233135 T234066
  • 13:27 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of wikidata to write both for item term store (T225055) (duration: 00m 54s)
  • 13:26 mobrovac@deploy1001: Finished deploy [restbase/deploy@1337290]: Minor tweaks to VE logging, v2 (duration: 06m 38s)
  • 13:19 mobrovac@deploy1001: Started deploy [restbase/deploy@1337290]: Minor tweaks to VE logging, v2
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9248 and previous config saved to /var/cache/conftool/dbconfig/20191007-131720-marostegui.json
  • 13:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@bf72f5c]: Minor tweaks to VE logging (duration: 07m 01s)
  • 13:13 elukey: upload python-kafka and python3-kafka 1.4.7-1 to buster-wikimedia - T222941
  • 13:09 mobrovac@deploy1001: Started deploy [restbase/deploy@bf72f5c]: Minor tweaks to VE logging
  • 13:05 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: (no justification provided) (duration: 00m 29s)
  • 13:04 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: (no justification provided)
  • 13:04 mobrovac@deploy1001: deploy aborted: Minor tweaks to VE logging (duration: 01m 07s)
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9247 and previous config saved to /var/cache/conftool/dbconfig/20191007-130317-marostegui.json
  • 13:03 mobrovac@deploy1001: Started deploy [restbase/deploy@fe39197]: Minor tweaks to VE logging
  • 12:54 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restrouter
  • 12:54 elukey: upload python-kafka and python3-kafka 1.4.7-1 to stretch-wikimedia - T222941
  • 11:44 Lucas_WMDE: EU SWAT done
  • 11:44 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Get rid of main page hack for fixcopyrightwiki (T120085) (duration: 00m 52s)
  • 11:42 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgMainPageIsDomainRoot true for fixcopyrightwiki (T120085) (duration: 00m 52s)
  • 11:41 Amir1: another hack bites the dust
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/GrowthExperiments/: SWAT: Homepage: Don't use flexbox for vertical layouts in mobile start module (T234380) (duration: 00m 53s)
  • 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable partial blocks on nlwiki (T234685) (duration: 00m 52s)
  • 11:16 arturo: added bdsync 0.11.1-1~wmf1 to buster-wikimedia (T234683)
  • 10:59 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #5 (duration: 04m 17s)
  • 10:55 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #5
  • 10:54 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #4 (duration: 04m 27s)
  • 10:50 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #4
  • 10:48 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #3 (duration: 03m 53s)
  • 10:44 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #3
  • 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 51s)
  • 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:31 _joe_: uploading confd 0.16.0 to stretch
  • 10:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #2 (duration: 01m 56s)
  • 10:19 mobrovac@deploy1001: Started deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #2
  • 10:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772 (duration: 05m 58s)
  • 10:10 mobrovac@deploy1001: Started deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772
  • 09:55 marostegui: Deploy schema change on db2129 (s6 codfw master), this will generate lag on s6 codfw - T233135 T234066
  • 08:34 hashar: gerrit: force reindexing all changes ( gerrit index start changes --force )
  • 07:09 marostegui: Remove grants for dbproxy1006 on m1 databases - T231280
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9246 and previous config saved to /var/cache/conftool/dbconfig/20191007-065645-marostegui.json
  • 06:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1011 T227138 (duration: 01m 10s)
  • 06:08 elukey: upgrade python-kafka on eventlog1002 to 1.4.7-1 (manually via dpkg -i) - T222941
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:25 marostegui: Deploy schema change on db2124 T233135 T234066
  • 05:10 marostegui: The above was for db2095:3316 T234704
  • 05:08 marostegui: Stop replication on db2076 to modify triggers on db2096:3316 T234704
  • 05:02 marostegui: Fix replication on labsdb1011:s8
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9245 and previous config saved to /var/cache/conftool/dbconfig/20191007-045411-marostegui.json

2019-10-06

  • 20:11 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Racconish /home/urbanecm/T234741 (T234741)
  • 19:15 marostegui: Reload haproxy on dbproxy1010, dbproxy1011, dbproxy1018, dbproxy1019
  • 06:47 elukey: delete old cron entry 'xenon_generate_svgs' (user xenon) on webperf[12]002 to reduce cronspam

2019-10-05

  • 06:48 elukey: force umount/remount of /mnt/hdfs on an-coord1001 - processes stuck in D state, fuser proc consuming a ton of memory

2019-10-04

  • 22:06 mutante: ms-be1020 - power cycle via mgmt - host down
  • 20:43 krinkle@deploy1001: Synchronized w/static.php: 9648e03, 97d9384 (duration: 00m 53s)
  • 20:41 mutante: deploy1001 / deploy2001 - remove python-pygerrit2 (version for python3 is needed instead)
  • 20:32 mutante: gerrit1001 - scp /usr/share/java/mysql-connector-java.jar from cobalt into /usr/share/java/ on gerrit1001 and then symlink into /var/lib/gerrit2/review_site/lib/ (T222391)
  • 19:27 mutante: wtp1025 - mediawiki appserver classes are being applied, install in progress will trigger some new icinga alerts
  • 14:03 marostegui: Deploy schema change on db2117 T233135 T234066
  • 13:50 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 13:47 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 13:36 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 12:28 marostegui: Deploy schema change on db2097:3316 T233135 T234066
  • 12:23 elukey: cleaned up old files and apt-cache from an-coord1001
  • 08:41 marostegui: Deploy schema change on db2076 (sanitarium master) with replication T233135 T234066
  • 08:32 _joe_: reuploading the old confd package to stretch-wikimedia, some incompatibility detected
  • 07:26 elukey: execute gnt-instance remove kerberos1001 on ganeti1001 - T234600
  • 07:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 07:24 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:41 marostegui: Deploy schema change on db2114 T233135 T234066
  • 06:22 _joe_: downgrading confd back to 0.9.0 while some templates get fixed.
  • 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:16 marostegui: Deploy schema change on dbstore1005:3316 T233135 T234066
  • 05:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1019 after on-site maintenance T233698 (duration: 00m 51s)
  • 05:53 _joe_: upgrading confd on puppetmaster1001 T147204
  • 05:50 _joe_: uploading confd 0.16.0 on stretch T147204
  • 05:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1019 after on-site maintenance T233698 (duration: 00m 51s)
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9240 and previous config saved to /var/cache/conftool/dbconfig/20191004-051112-marostegui.json
  • 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 after on-site maintenance T233698 (duration: 00m 53s)

2019-10-03

  • 23:50 mutante: gerrit - restarting for replication config tweaks
  • 20:05 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:01 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 19:52 XenoRyet: updated payments-wiki from 80dead6444 to b94da68f7e
  • 19:40 mutante: mw1290 - depooled and scheduled downtime in Icinga for hardware maintenance T234153
  • 19:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 19:30 marxarelli: 1.34.0-wmf.25 promoted to all wikis, cc: T220750. no rise in relevant error rates. no new errors
  • 19:21 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.25
  • 19:19 mutante: puppetmaster1001 - revoke cert for parsoid.discovery.wmnet - creating new ones for each DC and a unified one with both (T233654)
  • 19:11 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 18:52 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cached? (duration: 00m 59s)
  • 18:43 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c2b3d7c (duration: 00m 59s)
  • 18:14 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cache issue? (duration: 01m 00s)
  • 18:03 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5389d0243ee9c (duration: 01m 01s)
  • 17:13 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7 (duration: 06m 06s)
  • 17:07 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7
  • 13:49 elukey: roll restart hadoop yarn resource managers for openssl updates on Hadoop workers
  • 13:44 marostegui: Stop MySQL and shutdown es1019 for on-site maintenance - T233698
  • 13:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1019 for on-site maintenance T233698 (duration: 01m 01s)
  • 13:29 hashar: Gerrit should be back
  • 13:26 hashar: restarting Gerrit due to a deadlock in SendEmail task and AccountCacheImpl
  • 13:22 hashar: Gerrit might be dead again; taking traces
  • 13:04 _joe_: restarting php7 on mw1275
  • 12:54 onimisionipe: force shard allocation on eqiad chi cluster
  • 10:27 elukey: killed rsync processes in "D" state on stat1007, force umount/mount of /mnt/hdfs
  • 10:25 jbond42: rolling upgrade of openssl packages
  • 10:21 Urbanecm: Manually cleared signup throttle for IP 80.188.128.54 at cswiki, issue with introduced throttle rule
  • 10:20 Urbanecm: Manually cleared signup throttle for IP 88.100.221.84 at cswiki, issue with introduced throttle rule
  • 10:18 Urbanecm: Manually cleared signup throttle for IP 90.176.155.12 at cswiki, issue with introduced throttle rule
  • 09:32 elukey: run apt-get autoremove incrementally on all the hadoop prod workers to remove python2 deps (and verify that they are not used anymore by Hadoop)
  • 08:33 marostegui: Deploy schema change on db2087:3316 T233135 T234066
  • 08:28 marostegui: Deploy schema change on db1096:3316 - T233625
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9236 and previous config saved to /var/cache/conftool/dbconfig/20191003-082651-marostegui.json
  • 08:15 akosiaris: slowly rolling restart all pods in eqiad, codfw, staging for log rollover before merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/539912
  • 07:49 marostegui: Set notes on the sanitarium masters - T234039
  • 07:19 marostegui: Remove unused labspuppet database from m5 - T233281
  • 07:03 @: helmfile [CODFW] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 07:00 @: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
  • 06:59 eileen: tools revision changed from e1b81688c6 to b3c7453be2
  • 06:59 @: helmfile [EQIAD] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 06:48 marostegui: Drop database grants on m5 for labspuppet - T233281
  • 06:37 marostegui: Rename tables on m5 master on designate_pool_manager - T233978
  • 06:16 marostegui: Deploy schema change on db2089:3316 T233135 T234066
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 eileen: civicrm revision changed from 12c5727a23 to c12f7bb51f, config revision is 422a0f7d48
  • 02:07 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1c599baea51f9 (duration: 01m 03s)
  • 01:05 mutante: gerrit1001 - shutdown - scheduled downtime
  • 00:51 mutante: gerrit1001 - removing wrong IPv6 address from interface, running puppet

2019-10-02

  • 23:42 XioNoX: enable cr2-eqiad:xe-4/0/0 - T234416
  • 23:38 XioNoX: disable cr2-eqiad:xe-4/0/0 - T234416
  • 23:22 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/CirrusSearch/: T234445: CirrusSearch: Fix Precondition failed: Must have a resultset set (duration: 01m 00s)
  • 23:21 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/CirrusSearch/: T234445: CirrusSearch: Fix Precondition failed: Must have a resultset set (duration: 01m 02s)
  • 22:29 godog: remove queued messages from mx1001 for fr-tech-ops@, triggering sender rate limit from gmail
  • 22:12 jforrester@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: VE unstructured logging, part II (duration: 00m 58s)
  • 22:11 jforrester@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/includes/ApiVisualEditor.php: VE unstructured logging, part I (duration: 00m 59s)
  • 22:09 jforrester@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: VE unstructured logging, part II (duration: 00m 58s)
  • 22:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditor.php: VE unstructured logging, part I (duration: 01m 00s)
  • 21:17 mutante: cobalt (gerrit) rsyncing /srv/gerrit/git and /srv/gerrit/plugins data to gerrit1001 again after reinstall and fixing gerrit2 UID/GID (T222391)
  • 21:13 mutante: gerrit1001 - rebooting
  • 21:08 mutante: gerrit1001 changing GID of gerrit2 user to 119 in /etc/group ; find / -uid 499 -exec chown gerrit2 {} \; find / -gid 1001 -exec chown gerrit2:gerrit2 {} \; (T222391)
  • 21:03 mutante: gerrit1001 changing UID of gerrit2 user to 114 and GID to 119 in /etc/passwd to match cobalt to avoid privilege issues after rsyncing data (T222391)
  • 19:58 mutante: puppetmaster1001 - sudo puppet cert clean parsoid.discovery.wmnet (only created yesterday but does not have all the SANs it needs, updating with more SANs) (T233654)
  • 19:47 Jeff_Green: deployed icinga fundraising-nsca collection configuration change
  • 19:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:33 marxarelli: 1.34.0-wmf.25 promoted to group1, cc: T220750. no rise in relevant error rates
  • 19:23 dduvall@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.25 (duration: 00m 59s)
  • 19:22 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.25
  • 18:28 XioNoX: add BGP route damping on IX sessions - eqord - T222424
  • 18:25 XioNoX: add BGP route damping on IX sessions - eqdfw - T222424
  • 18:15 XioNoX: add BGP route damping on IX sessions - ulsfo - T222424
  • 17:08 Lucas_WMDE: Morning SWAT done
  • 17:03 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: SWAT: vector.js: Remove eager calculation of p-cactions width on page load (duration: 01m 00s)
  • 16:53 otto@deploy1001: Started restart [eventstreams/deploy@dbc9bbb]: Enabling revision-score stream in eventstreams
  • 16:50 otto@deploy1001: Started restart [eventstreams/deploy@dbc9bbb]: (no justification provided)
  • 16:50 otto@deploy1001: Finished deploy [eventstreams/deploy@dbc9bbb]: (no justification provided) (duration: 00m 01s)
  • 16:50 otto@deploy1001: Started deploy [eventstreams/deploy@dbc9bbb]: (no justification provided)
  • 16:46 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/: SWAT: ApiVisualEditor: Add logging for RESTBase HTTP errors (T233127) + ApiVisualEditorEdit: Add logging for funny etags (T233320) (duration: 01m 04s)
  • 16:42 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/: SWAT: ApiVisualEditorEdit: Add logging for funny etags (T233320) (duration: 01m 03s)
  • 15:31 godog: correction, add ms-be2052
  • 15:29 godog: swift codfw-prod: add ms-be2051 T233638
  • 15:13 godog: run swiftrepl eqiad -> codfw on ms-fe1005 (no deletes)
  • 14:31 moritzm: installing libxslt security updates on stretch
  • 14:16 moritzm: installing babeltrace bugfix update from buster point release
  • 13:18 moritzm: installing mariadb-10.3 update from buster point release (just client side libs, tools)
  • 13:16 moritzm: installing console-setup bugfix update from buster point release
  • 11:28 moritzm: installing cryptsetup bugfix from buster 10.1 point release
  • 11:26 Urbanecm: EU SWAT done
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 01711d5: Enable partial blocks at ptwiki (T233754) (duration: 00m 55s)
  • 11:26 jbond42: update puppet.eqiad.wmnet to puppetmaster2001
  • 11:24 jbond42: update puppet.esams.wmnet to puppetmaster2001
  • 11:20 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set new MFMobileFormatterOptions config using old config (T232690) (duration: 01m 01s)
  • 11:15 _joe_: testing the package on restbase-dev1006
  • 11:14 _joe_: uploaded service-checker 0.2.0 to stretch-wikimedia
  • 11:12 pmiazga@deploy1001: Synchronized wmf-config/mobile.php: SWAT: Do not set wgMFNoindexPages config flag in mobile.php (T206497) (duration: 01m 14s)
  • 10:17 gehel@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:17 gehel@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:41 moritzm: rebalancing Ganeti/codfw Row A after rolling reboot of Ganeti nodes
  • 07:46 moritzm: upgrading remaining stretch hosts to ferm 2.4.2pre
  • 06:23 marostegui: Fix replication on labsdb1011:s7 - T233986
  • 06:17 marostegui: Fix replication on labsdb1011:s1 - T233986
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:07 vgutierrez: restarting trafficserver-tls on cp5007
  • 00:54 ejegg: updated fundraising CiviCRM from 6d90d0cf06 to 12c5727a23
  • 00:34 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/resources/src: 5eb3ae1 (duration: 01m 00s)
  • 00:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: d30064229f9 (duration: 00m 59s)

2019-10-01

  • 23:46 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditor.php: T233127: ApiVisualEditor: Add logging for RESTBase HTTP errors (duration: 00m 58s)
  • 23:44 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: T233211: Deploy cirrussearch glent m0 a/b test (duration: 00m 59s)
  • 23:43 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: T233211: Deploy cirrussearch glent m0 a/b test (duration: 00m 59s)
  • 23:28 mutante: cobalt (gerrit) rsyncing /srv/gerrit/plugins dir, push to new server gerrit1001 (T222391)
  • 23:21 mutante: gerrit1001 - chown -R gerrit2:gerrit2 /srv/gerrit/git/ (T222391)
  • 23:20 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T233211: CirrusSearch: Configuration for glent m0 AB test (duration: 00m 58s)
  • 23:12 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T233127: Add VisualEditor logging channel to wmgMonologChannels (duration: 00m 59s)
  • 22:30 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 22:19 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 21:34 godog: swift codfw-prod: add ms-be2051 with minimal weight - T233638 T222366
  • 21:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: bb2fd9cf9c22cc (duration: 01m 00s)
  • 21:29 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 21:29 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 20:11 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 20:10 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 19:58 mutante: cobalt (gerrit) - rsyncing gerrit data to gerrit1001 in a screen session (T222391)
  • 19:47 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 19:47 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 19:42 marxarelli: 1.34.0-wmf.25 promoted to group0 cc: T220750. no rise in relevant error rates
  • 19:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.25
  • 19:30 marxarelli: promoting 1.34.0-wmf.25 to group0
  • 19:28 dduvall@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.25 and rebuild l10n cache (duration: 19m 31s)
  • 19:08 dduvall@deploy1001: Started scap: testwiki to php-1.34.0-wmf.25 and rebuild l10n cache
  • 19:07 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.23 (duration: 01m 32s)
  • 19:04 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.22 (duration: 01m 41s)
  • 19:02 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.21 (duration: 01m 57s)
  • 19:01 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 19:00 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 18:59 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.20 (duration: 02m 11s)
  • 18:57 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.19 (duration: 02m 12s)
  • 18:54 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.17 (duration: 02m 48s)
  • 18:48 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.16 (duration: 18m 45s)
  • 17:53 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 17:52 thcipriani: gerrit restart for new config changes incoming
  • 17:52 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 17:50 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 17:48 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 17:48 XioNoX: rotate PDUs passwords - T233053
  • 17:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:14 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:09 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T156095 - c28baa1862401 (duration: 00m 59s)
  • 17:07 mutante: Welcome new deployer Andrew Kostka (WMDE) (T233202)
  • 17:07 marxarelli: cutting wmf/1.34.0-wmf.25
  • 16:16 _joe_: manually downgrading php-geoip on deploy*, it was still at the 7.0-only version from the distro
  • 16:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:10 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:06 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 15:36 _joe_: uninstalling temporarily the math rendering related packages from mwdebug2002, test for T195847
  • 15:36 elukey: powercycle an-conf1001 to test some bios settings
  • 15:12 jbond42: puppetmaster2001 is back online
  • 14:34 dcausse: created cirrussearch indices for nqowiki (T234326)
  • 14:18 moritzm: rebooting krb1001 for some tests
  • 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:10 hashar: Restarting CI Jenkins
  • 14:08 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ (cd /var/lib/git/labs/private ; git rev-parse HEAD | sudo tee /srv/config-master/labsprivate-sha1.txt )
  • 14:08 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ (cd /var/lib/git/operations/puppet ; git rev-parse HEAD | sudo tee /srv/config-master/puppet-sha1.txt )
  • 14:08 herron: beginning rolling reboots of eqiad and codfw logstash collectors
  • 14:02 moritzm: rebooting mw1265 for some tests
  • 14:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:59 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ sudo touch /srv/config-master/puppet-sha1.txt /srv/config-master/labsprivate-sha1.txt && sudo chown gitpuppet:gitpuppet /srv/config-master/puppet-sha1.txt /srv/config-master/labsprivate-sha1.txt
  • 13:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:24 jbond42: reimage puppetmaster2001
  • 12:37 hashar: Gerrit misbehaved temporarily due to human operator error (hashar ran jstack -l -m, which brings the JVM to a halt)
  • 11:16 jbond42: update puppet.ulsfo.wmnet to point to puppetmaster1001
  • 10:45 jbond42: update puppet.eqsin.wmnet to point to puppetmaster1001
  • 10:17 moritzm: upgrading ferm on remaining mw servers 2.4.2pre T153468
  • 09:35 moritzm: run systemctl reset-failed on puppetmaster2002 to clear failed puppet-master.service
  • 09:19 moritzm: upgrading ferm on a number of systems to 2.4.2pre T153468
  • 09:07 vgutierrez: restarting acme-chief on acmechief1001 to catch up with python3-cryptography upgrades - T234131
  • 09:04 vgutierrez: upgrading python3-cryptography to version 2.6.1-3+deb10u1~wmf1 on acme-chief hosts - T234131
  • 09:03 moritzm: rebalancing ganeti/row_B after rolling reboot
  • 08:57 vgutierrez: upgrading python3-cryptography to version 2.6.1-3+deb10u1~wmf1 on acmechief-test1001 - T234131
  • 08:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:00 moritzm: draining ganeti2003 for upcoming reboot (combined kernel/qemu security updates)
  • 07:00 hashar: gerrit: forcing reindex of changes # T233989
  • 06:29 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 06:29 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:28 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:28 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3314 schema change - T233625', diff saved to https://phabricator.wikimedia.org/P9223 and previous config saved to /var/cache/conftool/dbconfig/20191001-061956-marostegui.json
  • 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 00:12 mutante: phabricator - upgrading PHP version to 7.2.22 - T230024

2019-09-30

  • 23:28 niharika29@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/CentralNotice/resources/infrastructure/: CentralNotice: Replace deprecated editToken with csrfToken - T233538 (duration: 00m 57s)
  • 23:23 AndyRussG: updated fruec from c591bd653b to 18d89675d0
  • 21:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet
  • 21:47 mutante: mw1290 - scap pull to get it in sync with latest deployment - it was down during scap run for T234153
  • 21:42 jforrester@deploy1001: Synchronized robots.txt: Remove old InternetArchive bot rule that's been disabled since 2008 T7582 (duration: 00m 57s)
  • 21:40 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T222539 Drop no-op hacky disablement of MessageBlobStore::clear() (duration: 05m 13s)
  • 21:38 James_F: sync failure on mw1290.eqiad.wmnet – Connection timed out
  • 21:26 mutante: mw1290 - downtimed for onsite work on mgmt, depooled earlier
  • 21:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 21:08 XioNoX: delete BGP to AS131285 on cr1-eqsin
  • 20:43 arlolra: Updated Parsoid to 1922eb6 (T233459, T230359, T208070)
  • 20:43 arlolra: T208070
  • 20:34 arlolra@deploy1001: Finished deploy [parsoid/deploy@a6da34c]: Updating Parsoid to 1922eb6 (duration: 08m 39s)
  • 20:25 arlolra@deploy1001: Started deploy [parsoid/deploy@a6da34c]: Updating Parsoid to 1922eb6
  • 20:06 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1f9fedd]: Update mobileapps to 131b83f (duration: 05m 55s)
  • 20:00 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1f9fedd]: Update mobileapps to 131b83f
  • 19:15 XenoRyet: Updated payments-wiki from 5193dcdfa9 to 80dead6444
  • 17:37 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: fix T234223 (duration: 03m 03s)
  • 17:33 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:24 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:18 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: fix T234223 (duration: 00m 05s)
  • 17:18 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:15 twentyafterfour@deploy1001: deploy aborted: fix T234223 (duration: 06m 24s)
  • 17:10 twentyafterfour: deploy failed
  • 17:09 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:08 twentyafterfour: deploying minor update to phatality to fix T234223
  • 16:35 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:34 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0aa4b4b (duration: 00m 57s)
  • 16:34 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@79db711]: Take job domain into account for deduplication T234226 (duration: 01m 17s)
  • 16:32 krinkle@deploy1001: Synchronized wmf-config/abusefilter.php: 0aa4b4b (duration: 00m 57s)
  • 16:32 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@79db711]: Take job domain into account for deduplication T234226
  • 16:25 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:25 cdanis@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 16:25 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:49 moritzm: installing console-setup bugfixes from Buster 10.1 point release
  • 15:46 cdanis@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:46 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:42 moritzm: failover Ganeti master in codfw to ganeti2001
  • 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:29 moritzm: draining ganeti2007 for upcoming reboot (combined kernel/qemu security updates)
  • 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:08 moritzm: draining ganeti2006 for upcoming reboot (combined kernel/qemu security updates)
  • 14:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:54 moritzm: draining ganeti2005 for upcoming reboot (combined kernel/qemu security updates)
  • 13:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 kart_: Update cxserver to 2019-09-26-034732-production (T233834, T232674, T233085)
  • 12:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:29 jbond42: offline puppetmaster2002 to reimage https://gerrit.wikimedia.org/r/c/operations/puppet/+/539322
  • 12:27 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:24 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 12:00 Urbanecm: EU SWAT done #2
  • 12:00 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 3f4f242: New throttle rule for Czech wiki course (T234113) (duration: 00m 56s)
  • 11:57 Urbanecm: Reopen EU SWAT to deploy throttle rule for October 02 (T234113)
  • 11:54 raynor: EU SWAT finished
  • 11:54 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable alternate mobile link for it, nl, ko wikis. (T206497) (duration: 00m 57s)
  • 11:27 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 539517|Enable CX out of beta in Tagalog and Central Bikol WPs (T233006, T233007) (duration: 00m 59s)
  • 11:20 hashar: Restarting Docker on integration-agent-puppet-docker-1001 # T234197
  • 11:08 hashar: Restarting Docker on CI agents to clear out some docker/iptables oddity # T234197
  • 10:48 hashar: CI outage is tracked in https://phabricator.wikimedia.org/T234197
  • 10:42 moritzm: draining ganeti2004 for upcoming reboot (combined kernel/qemu security updates)
  • 10:40 hashar: CI down due to some DNS related failure on the hosts :-\
  • 10:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:30 moritzm: uploading ferm 2.4.1+wmf2+deb9u1 for stretch-wikimedia, fixes AAAA lookups (T153468)
  • 09:11 moritzm: draining ganeti2002 for upcoming reboot (combined kernel/qemu security updates)
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091:3314 for a schema change - T233625', diff saved to https://phabricator.wikimedia.org/P9217 and previous config saved to /var/cache/conftool/dbconfig/20190930-091043-marostegui.json
  • 08:01 moritzm: installing e2fsprogs security updates on Stretch/Buster
  • 07:56 marostegui: Stop dbstore1003:3311 for troubleshooting
  • 06:47 moritzm: installing exim security updates on buster

2019-09-28

  • 16:28 vgutierrez: restarting acme-chief on acmechief1001

2019-09-27

  • 22:44 mutante: phab2001 - apt-get autoremove - remove unused python and ruby packages
  • 22:36 mutante: phab2001 - upgrade php7.2 packages to 7.2.22 (T230024)
  • 22:03 mutante: webperf1001, webperf2001: restart envoyproxy to pick up new cert with the right subject alt. names
  • 18:22 mutante: mwdebug1001, mwdebug1002 - deleted from /srv/mediawiki/: php-1.34.0-wmf.16, .17, .18, .19 and .20 (current is .24) - usage back to about 57% (T234063)
  • 18:17 mutante: mwdebug1001, mwdebug1002 - apt-get clean saves about 3GB and gets usage down from 94% to 87% on / (T234063)
  • 16:01 XioNoX: delete BGP to AS34305 on cr2-esams
  • 15:34 elukey: update pcc facts to add new hosts
  • 15:02 moritzm: installing usb.ids update from Buster 10.1 point release
  • 14:45 moritzm: installing ncurses bugfix update from Buster 10.1 point release
  • 14:39 moritzm: installing postgresql-common bugfix update from Buster 10.1 point release
  • 14:32 effie: Disable puppet and reload apache on mw* for 539465 and 539488 - T229792
  • 13:33 marostegui: Set candidate masters in dbctl T234039
  • 13:31 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:29 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:16 moritzm: reimaging auth1002 to buster
  • 13:09 akosiaris: reboot ganeti2001 T233906
  • 13:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:08 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:03 effie: Disable puppet on mwmaint1002 to test noc.wikimedia.org with PHP7
  • 12:58 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:48 moritzm: installing openldap security updates on Buster
  • 12:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:37 moritzm: killing stray processes from old openjdk-8 build on boron (probably test suite not properly terminated)
  • 12:30 moritzm: installing glib2.0 security updates on Buster
  • 12:14 moritzm: reimaging auth2001 to buster
  • 12:06 moritzm: install gnupg2 security update from Buster 10.1 point release
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9213 and previous config saved to /var/cache/conftool/dbconfig/20190927-104914-marostegui.json
  • 10:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:02 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: New throttle rule for Czech course (T234024) (duration: 00m 59s)
  • 09:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:06 moritzm: running a few ferm tests on cp1008, puppet disabled
  • 07:36 godog: swift eqiad-prod: remove ms-be1027 - T233289
  • 05:42 XioNoX: remove tcp-mss clamping from cr2-eqiad - T232602
  • 05:30 XioNoX: remove tcp-mss clamping from cr2-eqord - T232602
  • 05:23 XioNoX: remove tcp-mss clamping from cr1-eqiad - T232602
  • 00:53 twentyafterfour: hotfixing phabricator fatal exception refs T233998

2019-09-26

  • 22:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T211620 Enable emails for certain notification types by default on officewiki (duration: 00m 56s)
  • 22:11 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgPageTriageNoIndexTemplates, never read (duration: 00m 57s)
  • 22:02 jforrester@deploy1001: Synchronized wmf-config/filebackend.php: T228547 Stop sharding wgFileBackends shardViaHashLevels for math-render (duration: 00m 56s)
  • 21:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T228547 Stop setting wgMathFileBackend, wgMathPath, wgMathDirectory (unused) (duration: 00m 56s)
  • 21:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T228547 Stop setting wgTexvc, wgMathTexvcCheckExecutable, wgMathCheckFiles (unused) (duration: 01m 00s)
  • 20:53 ejegg: updated fundraising CiviCRM from 52d2a24404 to 6d90d0cf06
  • 19:58 phedenskog@deploy1001: Finished deploy [performance/navtiming@1880a79]: Test deploy (duration: 00m 05s)
  • 19:58 phedenskog@deploy1001: Started deploy [performance/navtiming@1880a79]: Test deploy
  • 19:52 krinkle@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 19:52 krinkle@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 19:46 phedenskog@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 19:46 phedenskog@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 19:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.24 refs T220749
  • 19:17 volans@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (test) (duration: 00m 16s)
  • 19:17 volans@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release (test)
  • 19:13 twentyafterfour: preparing to deploy the mediawiki train for 1.34.0-wmf.24. refs T220749
  • 18:45 ayounsi@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 22s)
  • 18:44 ayounsi@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release
  • 18:35 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: Stop setting various static settings, now set in IS (duration: 01m 04s)
  • 18:35 mforns@deploy1001: Finished deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101) (duration: 06m 04s)
  • 18:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set last static Cirrus settings directly in IS (duration: 01m 07s)
  • 18:29 mforns@deploy1001: Started deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101)
  • 18:25 volans@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 23s)
  • 18:25 volans@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release
  • 18:17 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop indirectly setting wgWMESearchRelevancePages (duration: 01m 04s)
  • 18:15 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 31s)
  • 18:15 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:11 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgWMESearchRelevancePages directly in InitialiseSettings (duration: 01m 04s)
  • 18:07 ayounsi@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 55s)
  • 18:06 ayounsi@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:04 mutante: running mcrouter_generate_certs to add a cert for wtp2001.codfw.wmnet for T233654
  • 18:04 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 03s)
  • 18:04 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:03 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 42s)
  • 18:02 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 17:58 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop setting bits of the CirrusSearch timeouts arrays, already set in IS (duration: 01m 04s)
  • 17:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set the whole of the CirrusSearch timeouts arrays directly (duration: 01m 00s)
  • 17:49 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop setting static values now set in InitialiseSettings (duration: 01m 04s)
  • 17:49 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T233835, T233246)
  • 17:47 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move static settings from CirrusSettings-common (duration: 01m 05s)
  • 17:43 ppchelko@deploy1001: Finished deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events T225211 (duration: 02m 04s)
  • 17:41 ppchelko@deploy1001: Started deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events T225211
  • 17:35 elukey: run apt-get autoremove on stat* and notebook* to clean up old python2 deps
  • 17:31 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T233835, T233246)
  • 17:14 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 17:13 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqiad
  • 17:11 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 17:08 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 16:40 papaul: upgrading firmware on scs-c1-codfw
  • 16:37 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕛☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s codfw
  • 15:56 cdanis: sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s esams
  • 15:35 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s ulsfo
  • 15:15 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqsin
  • 15:06 mforns@deploy1001: Finished deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap (duration: 02m 44s)
  • 15:03 mforns@deploy1001: Started deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap
  • 15:00 cdanis: dbctl schema migration done T229677
  • 14:47 cdanis: dbctl schema migration on instances to add note field https://wikitech.wikimedia.org/wiki/Dbctl#Schema_upgrades T229677
  • 14:43 cdanis@cumin1001: dbctl commit (dc=all): 'dbctl 1.2.0 adds hostByName to the output, but it is not used by Mediawiki; this commit is the first made with the new release; no-op change', diff saved to https://phabricator.wikimedia.org/P9208 and previous config saved to /var/cache/conftool/dbconfig/20190926-144328-cdanis.json
  • 14:41 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s cumin
  • 14:37 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s puppetmaster
  • 14:36 cdanis: ✔️ cdanis@puppetmaster1001.eqiad.wmnet ~ 🕥☕ sudo apt install python3-conftool
  • 14:19 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕥☕ sudo -E reprepro -C main include jessie-wikimedia conftool_1.2.0-1+deb8u1_amd64.changes
  • 14:16 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕙☕ sudo -E reprepro -C main include buster-wikimedia conftool_1.2.0-1+deb10u1_amd64.changes ; sudo -E reprepro -C main include stretch-wikimedia conftool_1.2.0-1_amd64.changes
  • 11:31 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='Nederlandse Leeuw' /home/urbanecm/T233922 (T233922)
  • 11:23 Urbanecm: EU SWAT done
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 3/3) (duration: 01m 05s)
  • 11:14 Urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-szl.svg (T233104)
  • 11:13 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 2/3) (duration: 01m 05s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7645e55: Enable reader demographic surveys in English, Polish, and Russian (T232525) (duration: 01m 06s)
  • 11:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:07 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.png: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 1/3) (duration: 01m 08s)
  • 11:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:53 jbond42: reimaging puppetmaster1002 to buster
  • 10:48 vgutierrez: switching from nginx to ats-tls on cp5007 - T231627
  • 09:55 moritzm: bouncing postgres on puppetdb1002/2002
  • 09:18 vgutierrez: switching from nginx to ats-tls on cp1080 - T231433
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078', diff saved to https://phabricator.wikimedia.org/P9203 and previous config saved to /var/cache/conftool/dbconfig/20190926-091348-marostegui.json
  • 09:04 mobrovac@deploy1001: Finished deploy [restbase/deploy@c419651]: Add nqo.wp.org - T233833 (duration: 21m 32s)
  • 09:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:47 vgutierrez: switching from nginx to ats-tls on cp2008 - T231433
  • 08:43 mobrovac@deploy1001: Started deploy [restbase/deploy@c419651]: Add nqo.wp.org - T233833
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9202 and previous config saved to /var/cache/conftool/dbconfig/20190926-084159-marostegui.json
  • 08:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Change special weights from 1 to 100 - T231018', diff saved to https://phabricator.wikimedia.org/P9201 and previous config saved to /var/cache/conftool/dbconfig/20190926-082233-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9200 and previous config saved to /var/cache/conftool/dbconfig/20190926-081759-marostegui.json
  • 08:13 vgutierrez: switching from nginx to ats-tls on cp3036 - T231433
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9199 and previous config saved to /var/cache/conftool/dbconfig/20190926-081144-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9198 and previous config saved to /var/cache/conftool/dbconfig/20190926-080949-marostegui.json
  • 08:07 elukey: executed 'rmr /yarn-rmstore/analytics-test-hadoop/ZKRMStateRoot' on conf1004's zkCli.sh to clean up znodes - T217057
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to change binlog format', diff saved to https://phabricator.wikimedia.org/P9197 and previous config saved to /var/cache/conftool/dbconfig/20190926-080442-marostegui.json
  • 08:02 marostegui: Depool db1078 to restart mysql to change its binlog format to ROW
  • 07:57 vgutierrez: switching from nginx to ats-tls on cp4023 - T231433
  • 07:49 godog: swift eqiad-prod: continue ms-be1027 decom - T233289
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:42 moritzm: draining ganeti2001 for upcoming reboot (combined kernel/qemu security updates)
  • 07:41 vgutierrez: switching from nginx to ats-tls on cp5003 - T231433
  • 07:10 marostegui: Power off db1114 for mainboard replacement T229452
  • 07:09 marostegui: Stop mysql on db1114 for mainboard replacement - T229452
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:41 marostegui: Sanitize nqowiki on db1124:3313 and db2094:3313 - T230543
  • 06:39 marostegui: Deploy schema change on db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9196 and previous config saved to /var/cache/conftool/dbconfig/20190926-063555-marostegui.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): ' Repool db2088:3312 db2084:3315 db2087:3316 db2086:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9195 and previous config saved to /var/cache/conftool/dbconfig/20190926-062922-marostegui.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9194 and previous config saved to /var/cache/conftool/dbconfig/20190926-053029-marostegui.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9193 and previous config saved to /var/cache/conftool/dbconfig/20190926-051916-marostegui.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Give some API weight to db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9192 and previous config saved to /var/cache/conftool/dbconfig/20190926-050937-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9191 and previous config saved to /var/cache/conftool/dbconfig/20190926-050722-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 master and remove read-only from s4 T230784', diff saved to https://phabricator.wikimedia.org/P9190 and previous config saved to /var/cache/conftool/dbconfig/20190926-050140-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance T230784', diff saved to https://phabricator.wikimedia.org/P9189 and previous config saved to /var/cache/conftool/dbconfig/20190926-050050-marostegui.json
  • 05:00 marostegui: Starting s4 failover from db1081 to db1138 - T230784
  • 04:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 T230784', diff saved to https://phabricator.wikimedia.org/P9188 and previous config saved to /var/cache/conftool/dbconfig/20190926-041508-marostegui.json
  • 04:10 marostegui: Start pre-switchover s4 steps T230784

2019-09-25

  • 21:59 bblack: remove GRE MTU hacks on archiva1001 gerrit2001 cobalt install1002 - T232602
  • 21:58 bblack: remove GRE MTU hacks on eqiad caches (cp1xxx) - T232602
  • 21:57 bblack: remove GRE MTU hacks on esams caches (cp3xxx) - T232602
  • 21:56 bblack: remove GRE MTU hacks on eqsin caches (cp5xxx) - T232602
  • 21:10 AndyRussG: update fruec from 97128874bf to c591bd653b
  • 21:00 ejegg: updated fundraising internal dashboard from 4473c65af0 to 69fdbec60d
  • 20:23 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@dbf4e7e]: Speed up querySelectors in domUtil (T229286) (duration: 05m 32s)
  • 20:20 hashar: Upgrading CI Jenkins
  • 20:17 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@dbf4e7e]: Speed up querySelectors in domUtil (T229286)
  • 19:28 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.24 refs T220749 (duration: 01m 03s)
  • 19:27 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.24 refs T220749
  • 18:24 twentyafterfour@deploy1001: Finished deploy [releng/phatality@42ba003]: trying again (duration: 03m 31s)
  • 18:21 twentyafterfour@deploy1001: Started deploy [releng/phatality@42ba003]: trying again
  • 18:19 twentyafterfour@deploy1001: Finished deploy [releng/phatality@42ba003]: deploy for version 5.6.15 (duration: 00m 50s)
  • 18:19 twentyafterfour@deploy1001: Started deploy [releng/phatality@42ba003]: deploy for version 5.6.15
  • 18:13 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: Deploy phatality (duration: 00m 24s)
  • 18:13 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: Deploy phatality
  • 18:11 Amir1: creating nqowiki is finished now
  • 18:10 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 39s)
  • 18:07 ladsgroup@deploy1001: Synchronized dblists/rtl.dblist: Create nqowiki T230359 (duration: 01m 05s)
  • 18:01 Amir1: creating nqowiki is going to take five more minutes
  • 17:57 ladsgroup@deploy1001: Synchronized langlist: Create nqowiki T230359 (duration: 01m 02s)
  • 17:56 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Create nqowiki T230359 (duration: 01m 05s)
  • 17:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create nqowiki T230359 (duration: 01m 04s)
  • 17:51 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 17:47 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 01m 04s)
  • 17:29 mutante: DNS - adding nqo (N'Ko) to langlist for new nqo.wikipedia, approved by langcom https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_N'Ko (T230359)
  • 17:11 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/WikimediaMaintenance/addWiki.php: Redefine RevisionStore service for the wiki being created (T212881) (duration: 01m 05s)
  • 17:08 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/WikimediaMaintenance/addWiki.php: Redefine RevisionStore service for the wiki being created (T212881) (duration: 01m 04s)
  • 16:19 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable WelcomeSurvey for euwiki (T233063) (duration: 01m 04s)
  • 16:06 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 537628|Fix incorrect channel name for TranslationNotifications extension (T144780) (duration: 01m 06s)
  • 15:38 moritzm: installing php5 security updates
  • 15:07 moritzm: imported jenkins 2.176.4 for jessie/stretch T233214
  • 14:57 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 14:57 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:55 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/Wikibase/view/lib/resources.php: Revert "Merge valueview modules": T233800 (duration: 01m 04s)
  • 14:53 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix Draft namespace aliases (T233770) (duration: 01m 04s)
  • 14:52 onimisionipe: pool wdqs1005 - lag issues have minimized.
  • 14:38 moritzm: restarting apache on analytics-tool/an-tool to pick up Expat security update
  • 14:35 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 14:34 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:29 moritzm: restarting apache on grafana1001 to pick up Expat security update
  • 14:14 moritzm: restarting apache on various services to pick up Expat security update (releases, netmon, miscweb, graphite, planet, puppetboard)
  • 14:02 marostegui: Deploy schema change on db2086:3318
  • 14:00 effie: Rolling restart thumbor for expat update
  • 13:55 moritzm: rolling restart of apache on webperf* to pick up Expat security update
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9183 and previous config saved to /var/cache/conftool/dbconfig/20190925-135317-marostegui.json
  • 13:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 13:51 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:51 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 13:51 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:45 _joe_: restarting trafficserver on cp1075 to pick up the change
  • 13:41 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T230817 Remove origin trials config (duration: 01m 05s)
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9182 and previous config saved to /var/cache/conftool/dbconfig/20190925-133146-marostegui.json
  • 13:31 moritzm: installing remaining expat security updates
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9181 and previous config saved to /var/cache/conftool/dbconfig/20190925-132147-marostegui.json
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9180 and previous config saved to /var/cache/conftool/dbconfig/20190925-131149-marostegui.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1075 after replacing its BBU', diff saved to https://phabricator.wikimedia.org/P9179 and previous config saved to /var/cache/conftool/dbconfig/20190925-130613-marostegui.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3311 T233625', diff saved to https://phabricator.wikimedia.org/P9178 and previous config saved to /var/cache/conftool/dbconfig/20190925-125601-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): ' Depool for schema change on the logging table: db2088:3312 db2084:3315 db2087:3316 db2086:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9177 and previous config saved to /var/cache/conftool/dbconfig/20190925-125140-marostegui.json
  • 12:47 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:47 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:46 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:45 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:44 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:44 marostegui: Repool labsdb1011 T233766
  • 12:41 marostegui: Shutdown db1075 for onsite maintenance T233534
  • 12:37 marostegui: Stop MySQL on db1075 for BBU replacement T233534
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for BBU replacement T233534', diff saved to https://phabricator.wikimedia.org/P9176 and previous config saved to /var/cache/conftool/dbconfig/20190925-123736-marostegui.json
  • 12:34 onimisionipe: depool wdqs1005 to allow it to catch up on lag
  • 12:32 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 12:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 12:28 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 12:18 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@241b284]: Performance tweaks: domUtil + addSectionEditButtons (T229286) (duration: 05m 17s)
  • 12:13 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@241b284]: Performance tweaks: domUtil + addSectionEditButtons (T229286)
  • 12:05 akosiaris: depool kubernetes1001 and disable puppet on it for rsyslog mmkubernetes testing
  • 12:05 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes1001.*
  • 11:57 vgutierrez: switch cp1078 from nginx to ats-tls - T231433
  • 11:37 vgutierrez: switch cp2005 from nginx to ats-tls - T231433
  • 11:29 onimisionipe: restarted wdqs-blazegraph on wdqs1005
  • 11:15 onimisionipe: repooled wdqs1004 to reduce load on the wdqs public cluster
  • 11:15 Urbanecm: EU SWAT done
  • 11:13 vgutierrez: switch cp3035 from nginx to ats-tls - T231433
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 127485c: Fully close bgwikinews (T233322) (duration: 01m 06s)
  • 10:48 vgutierrez: Switch from nginx to ats-tls on cp4022 - T231433
  • 10:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:46 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:27 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 16s)
  • 10:26 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:26 vgutierrez: switch cp5002 from nginx to ats-tls - T231433
  • 10:25 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 12s)
  • 10:25 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:22 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 00m 42s)
  • 10:21 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:13 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 45m 54s)
  • 09:51 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 09:50 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'codfw' .
  • 09:27 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:20 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 02m 24s)
  • 09:18 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:16 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 54s)
  • 09:15 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:07 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'codfw' .
  • 09:06 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:02 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:02 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:01 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 08:52 godog: roll-restart kibana
  • 08:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:48 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 05s)
  • 08:48 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 08:48 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 09m 26s)
  • 08:44 vgutierrez: repooling cp4027 - T233667
  • 08:39 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 07:51 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T233584 revert: [cirrus] temp disable sanity check (duration: 01m 05s)
  • 07:38 moritzm: installing emacs updates for buster (from SUA update, extended ELPA repository key)
  • 07:28 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: c761ec1: Revert "Add localized Wikipedia wordmark for szlwiki" (T233104) (duration: 01m 04s)
  • 07:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c761ec1: Revert "Add localized Wikipedia wordmark for szlwiki" (T233104) (duration: 01m 16s)
  • 07:17 onimisionipe: pool wdqs1005 to allow depooling wdqs1004 to handle lag issues
  • 07:17 elukey: allow analytics users to log in to stat1005
  • 06:33 _joe_: restarting pybal on all low-traffic lbs
  • 06:29 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'codfw' .
  • 06:29 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 06:21 marostegui: Deploy schema change on db2085:3311 T233625
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3311 T233625', diff saved to https://phabricator.wikimedia.org/P9171 and previous config saved to /var/cache/conftool/dbconfig/20190925-062036-marostegui.json
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:11 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:06 marostegui: Run a data check on labsdb1011 - T233766
  • 04:43 marostegui: Deploy schema change on s3 with replication - T231172
  • 03:28 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.24 refs T220749
  • 03:03 krinkle@deploy1001: Synchronized docroot/noc/: c7c6c0ee0, 8405bf1c2 (duration: 01m 05s)
  • 03:01 krinkle@deploy1001: Synchronized src/: c7c6c0ee0, 8405bf1c2 (for noc.wm.o) (duration: 01m 09s)
  • 02:58 twentyafterfour: belatedly promoting wmf.24 to group0 refs T220749
  • 02:32 onimisionipe: depool wdqs1005 to let it catch up with lag
  • 02:30 onimisionipe: pool wdqs1006 - it has caught up with lag
  • 01:16 mutante: stat1007 - restart nagios-nrpe-server, echo "please don't use all of the RAM on this server" | wall
  • 01:14 krinkle@deploy1001: Synchronized wmf-config/: 3373247e12 (duration: 01m 04s)
  • 01:12 krinkle@deploy1001: Synchronized src/WmfClusters.php: 3373247e123b (duration: 01m 04s)
  • 01:08 krinkle@deploy1001: Synchronized tests: 3373247e123b5 (duration: 01m 04s)
  • 01:07 krinkle@deploy1001: Synchronized docroot/noc: 3373247e123b53 and 1efc8bd (duration: 01m 05s)
  • 01:03 krinkle@deploy1001: Synchronized README: 3373247e123b53 (duration: 01m 04s)
  • 01:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3373247e123b53 - create new file (duration: 01m 05s)
  • 00:47 krinkle@deploy1001: Synchronized wmf-config/: 6dca83a9f6c2c (duration: 01m 04s)
  • 00:44 krinkle@deploy1001: Synchronized docroot/noc/: 6dca83a9f6c2c (duration: 01m 05s)
  • 00:43 krinkle@deploy1001: Synchronized tests/: 6dca83a9f6c2c (duration: 01m 05s)
  • 00:02 mutante: cp1075 - systemctl restart vhtcpd
  • 00:02 mutante: cp1075 - systemctl status vhtcpd

2019-09-24

  • 23:38 mutante: gerrit service restart to switch LDAP backend
  • 23:35 bstorm_: wiki-replicas depooled labsdb1011
  • 23:33 mutante: gerrit2001 - restarting gerrit service
  • 23:30 mutante: switching LDAP servers used by Gerrit to readonly replicas. stop using so called "labs" config for LDAP backend.
  • 22:26 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.24 refs T220749 (duration: 40m 38s)
  • 21:53 mutante: restbase1024 - enable IPMI over LAN which wasn't working before
  • 21:45 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.24 refs T220749
  • 21:19 mutante: ganeti4001 - racadm racreset - attempt to fix IPMI
  • 20:19 twentyafterfour: restarting gerrit due to unreasonably high garbage collection times and sluggish performance in general.
  • 19:39 XioNoX: disable asw2-d-eqiad:ge-5/0/41 excessive flapping
  • 19:28 ejegg: updated payments-wiki from 939b771800 to 5193dcdfa9
  • 19:20 twentyafterfour: branching 1.34.0-wmf.24 refs T220749
  • 18:45 AndyRussG: updated fruec from fb29cb74 to 97128874bf
  • 18:08 ejegg: updated Fundraising CiviCRM feca96a2e3 to 52d2a24404
  • 17:13 cstone: civicrm revision changed from 5def62ab05 to feca96a2e3
  • 14:40 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:28 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:24 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:24 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:17 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:09 moritzm: rebooting cloudvirt1021 for kernel update
  • 14:09 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:09 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 13:50 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 13:50 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:49 jbond42__: promote puppetmaster1003 to a real puppetmaster backend https://gerrit.wikimedia.org/r/c/operations/puppet/+/538686
  • 13:45 _joe_: installing the new conftool version on the cumin hosts
  • 13:40 _joe_: uploaded conftool 1.1.4-3 to stretch-wikimedia, T233679
  • 13:19 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 13:18 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:02 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 12:22 arturo: remove systemd-sysv from jessie-wikimedia/openstack-mitaka-jessie in install1002 (T231793)
  • 12:20 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T233584 [cirrus] temp disable sanity check (duration: 00m 55s)
  • 12:18 mobrovac@deploy1001: Finished deploy [restbase/deploy@19d0f44]: REVERT (due to wikifeeds problems): Start using the wikifeeds service for v1/feed - T170455 (duration: 02m 35s)
  • 12:16 mobrovac@deploy1001: Started deploy [restbase/deploy@19d0f44]: REVERT (due to wikifeeds problems): Start using the wikifeeds service for v1/feed - T170455
  • 11:47 mobrovac@deploy1001: Finished deploy [restbase/deploy@87eea26]: Start using the wikifeeds service for v1/feed - T170455 (duration: 02m 35s)
  • 11:45 mobrovac@deploy1001: Started deploy [restbase/deploy@87eea26]: Start using the wikifeeds service for v1/feed - T170455
  • 11:43 Urbanecm: EU SWAT done
  • 11:41 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 11a48f8: Add support for some languages on Commons and stop support for nys on Wikidata (T230480) (duration: 00m 56s)
  • 11:39 Urbanecm: Run mwscript initSiteStats.php --wiki=napwikisource --update (T233673)
  • 11:37 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 9eaa4f8: Set wgArticleCountMethod to any for napwikisource (T233673) (duration: 00m 56s)
  • 11:30 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/mxwikimedia.png (T233670)
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: b6947c5: Follow-up 8f3f0705baed: add missing namespace for eswiki (T233562) (duration: 00m 56s)
  • 11:27 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/MassMessage/: SWAT: ba9b209: Provide deduplication info to MassMessageJob (T232379) (duration: 00m 57s)
  • 11:26 urbanecm@deploy1001: Synchronized static/images/project-logos/mxwikimedia.png: SWAT: 246b352: Update logo for mx.wikimedia (T233670) (duration: 00m 54s)
  • 11:24 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.less: SWAT: d4c64a7: Fix broken display of mobile overlay headings (T233163) (duration: 00m 57s)
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8bf6aae: Enable alternate mobile link for ar,zh,hi wikis (T206497) (duration: 00m 54s)
  • 11:10 _joe_: all wikis (including API) are now served by PHP7 T219150
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: a14b772: FileImporter: limited default deployment (2/2; T232539) (duration: 00m 56s)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8a89652: FileImporter: limited default deployment (1/2; T232539) (duration: 01m 03s)
  • 10:56 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@7857639]: Bump CirrusSearchLinksUpdate concurrency to clear the queue - T233584 (duration: 01m 00s)
  • 10:55 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@7857639]: Bump CirrusSearchLinksUpdate concurrency to clear the queue - T233584
  • 10:54 _joe_: converting all appservers to php7, T219150
  • 10:51 mobrovac@deploy1001: Finished deploy [restbase/deploy@19d0f44]: Expose the key_value buckets to production IPs - T223953 (duration: 22m 20s)
  • 10:50 _joe_: converting mw1261 to full-php7
  • 10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@19d0f44]: Expose the key_value buckets to production IPs - T223953
  • 10:12 marostegui: Deploy schema change on s7 (centralauth and wikis) master with replication - T231172
  • 10:03 marostegui: Deploy schema change on s1 master with replication - T231172
  • 09:58 marostegui: Deploy schema change on labswiki (wikitech) and labtestwiki T231172
  • 09:51 effie: Upgrade to php 7.2.22 on mwmaint* - T230024
  • 09:30 marostegui: Deploy schema change on s2 master with replication - T231172
  • 09:26 effie: Upgrade to php 7.2.22 on deploy* - T230024
  • 09:14 marostegui: Drop table archive_save on frwiki T233187
  • 08:43 marostegui: Deploy schema change on s8 master with replication - T231172
  • 08:37 mvolz@deploy1001: scap-helm zotero finished
  • 08:37 mvolz@deploy1001: scap-helm zotero cluster codfw completed
  • 08:37 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 08:36 jynus: stop db1114 mariadb process for some time
  • 08:33 moritzm: installed expat security updates on remaining mw* servers
  • 08:33 mvolz@deploy1001: scap-helm zotero finished
  • 08:32 mvolz@deploy1001: scap-helm zotero cluster eqiad completed
  • 08:32 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 08:30 marostegui: Deploy schema change on s4 master with replication - T231172
  • 08:29 effie: Disable puppet on api cluster and restart php-fpm to finish php7 migration - T219150
  • 08:19 mvolz@deploy1001: scap-helm zotero finished
  • 08:19 mvolz@deploy1001: scap-helm zotero cluster staging completed
  • 08:19 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml stable/zotero [namespace: zotero, clusters: staging]
  • 08:18 marostegui: Deploy schema change on s5 master with replication - T231172
  • 07:51 onimisionipe: depool wdqs1006 to clear HTTP too many request error
  • 07:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:29 moritzm: uploaded openjdk-8 8u222-b10-1~deb10u2 to buster-wikimedia component/jdk8 T233604
  • 07:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:18 godog: swift eqiad-prod: continue ms-be1027 decom T233289
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:37 marostegui: Stop MySQL on db1066 - T233071
  • 06:36 marostegui: Remove db1066 from tendril and zarcillo T233071
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075', diff saved to https://phabricator.wikimedia.org/P9163 and previous config saved to /var/cache/conftool/dbconfig/20190924-063002-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1075', diff saved to https://phabricator.wikimedia.org/P9162 and previous config saved to /var/cache/conftool/dbconfig/20190924-061943-marostegui.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1075', diff saved to https://phabricator.wikimedia.org/P9161 and previous config saved to /var/cache/conftool/dbconfig/20190924-053919-marostegui.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight 100 to db1075', diff saved to https://phabricator.wikimedia.org/P9160 and previous config saved to /var/cache/conftool/dbconfig/20190924-052545-marostegui.json
  • 05:13 cdanis@cumin1001: dbctl commit (dc=all): 're-do T230783 master promotion and set read-write', diff saved to https://phabricator.wikimedia.org/P9159 and previous config saved to /var/cache/conftool/dbconfig/20190924-051307-cdanis.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1123 to s3 master and remove read-only from s3 T230783', diff saved to https://phabricator.wikimedia.org/P9158 and previous config saved to /var/cache/conftool/dbconfig/20190924-051147-marostegui.json
  • 05:10 cdanis: T230783 mark DEFAULT not s3 as readonly in etcd dbconfig data
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 as read-only for maintenance T230783', diff saved to https://phabricator.wikimedia.org/P9157 and previous config saved to /var/cache/conftool/dbconfig/20190924-050034-marostegui.json
  • 05:00 marostegui: Starting s3 failover from db1075 to db1123 - T230783
  • 04:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1123 T230783', diff saved to https://phabricator.wikimedia.org/P9156 and previous config saved to /var/cache/conftool/dbconfig/20190924-042121-marostegui.json
  • 04:13 marostegui: Start pre switchover steps - T230783
  • 03:52 chaomodus: rebooted netboxdb[12]001 for kernel upgrade
  • 03:46 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:45 crusnov@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:43 mutante: db2060 - remove PXE flag boot override - set Boot Device to none

2019-09-23

  • 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:50 dzahn@cumin1001: Updating IPMI password on 92 hosts - dzahn@cumin1001
  • 23:50 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 23:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:43 dzahn@cumin1001: Updating IPMI password on 92 hosts - dzahn@cumin1001
  • 23:43 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 21:32 catrope@deploy1001: Synchronized wmf-config/VariantSettings.php: Syncing no-op change for T232419 (duration: 00m 57s)
  • 19:57 cdanis: T233657 ✔️ cdanis@cp4027.ulsfo.wmnet ~ 🕓🍵 sudo -i depool
  • 19:16 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: 2a7a125: Redefine hiwikisource extra namespaces (T233365) (duration: 00m 57s)
  • 19:09 Urbanecm: Going to deploy one more last-time patch
  • 18:51 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config, take 2 (T233610) (duration: 00m 56s)
  • 18:48 Urbanecm: Morning SWAT done
  • 18:48 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 37fcbdf: Fix: Move hiwikisource extra namespace to extra namespace section (duration: 00m 56s)
  • 18:35 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: be2f9d4: Add localized Wikipedia wordmark for szlwiki (T233104) (duration: 00m 55s)
  • 18:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: d397f5f: Add localized Wikipedia wordmark for szlwiki (T233104) (duration: 00m 56s)
  • 18:23 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8f3f070: Disallow indexing discussion and user pages on eswiki (T233562) (duration: 00m 56s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6cb2042: New throttle rule for Wikimedia Chile editathon (T233378) (duration: 00m 56s)
  • 18:13 Urbanecm: Security deploy for T207094
  • 18:03 gilles: T233095 Purge articles for all wikis: foreachwiki maintenance/purgeList.php --all --verbose
  • 17:59 gilles@deploy1001: Synchronized php-1.34.0-wmf.23/maintenance/purgeList.php: T233095 Make purgeList.php use getCdnUrls() (duration: 00m 56s)
  • 17:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config (T233610) (duration: 00m 58s)
  • 16:53 elukey@deploy1001: Finished deploy [analytics/refinery@b99647e]: (no justification provided) (duration: 07m 24s)
  • 16:46 elukey@deploy1001: Started deploy [analytics/refinery@b99647e]: (no justification provided)
  • 16:33 Urbanecm: Remove my temporary adminship on bgwikinews (T233322)
  • 16:29 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: 84afa44: Close bgwikinews, but allow sysops to edit (T233322; 2/2) (duration: 00m 56s)
  • 16:27 urbanecm@deploy1001: Synchronized dblists/closed.dblist: 84afa44: Close bgwikinews, but allow sysops to edit (T233322; 1/2) (duration: 00m 58s)
  • 16:26 Urbanecm: mwscript createAndPromote.php --wiki=bgwikinews --sysop --force 'Martin Urbanec' - temporary (T233322)
  • 13:21 moritzm: installing qemu security updates on remaining cloudvirt hosts
  • 12:40 moritzm: rolling restart of graphoid on scb to pick up expat security update
  • 12:05 moritzm: restarting apache on bast5001 to pick up expat security update
  • 11:50 moritzm: restarting Apache/HHVM/PHP on mw1261-mw1265 after Expat security update
  • 11:42 vgutierrez: switching cp4027 from nginx to ats-tls - T231627
  • 11:35 moritzm: installing expat security updates
  • 11:33 awight: EU SWAT finished
  • 11:31 awight@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/FileImporter: SWAT: Add change tags to all FileImport text revisions (T227849) (duration: 00m 57s)
  • 11:23 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Set item terms on write both up to Q40Mio (T225055) (duration: 00m 55s)
  • 11:12 effie: Disable puppet and rolling restart of php7.2-fpm on mw[1321-1333] - T219150
  • 11:11 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Add localized logos for the Zulu Wikipedia (T233424) (duration: 00m 56s)
  • 11:06 awight@deploy1001: Synchronized static/images/project-logos: SWAT: Add localized logos for the Zulu Wikipedia (T233424) (duration: 00m 57s)
  • 11:05 moritzm: uploaded openjdk 8u222-b10-1~deb10u1 to buster-wikimedia/component/jdk8 (bootstrap build, second boron build following) T233604
  • 10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 09:51 jynus: stopping db2102 mariadb to recover db
  • 09:45 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki 'نعنوعه' 'مريانا_علي' (T233585)
  • 09:44 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bnwiki --logwiki=metawiki 'Huangzonghao' 'HUANGZONGHAO' (T233585)
  • 09:38 akosiaris: T218184 upload to apt.wikimedia.org/jessie-wikimedia apertium-dan-nor_1.4.0-1+wmf1, apertium-nno-nob_1.2.0-1+wmf1, apertium-swe-dan_0.8.0-2+wmf1, apertium-swe-nor_0.3.0-2+wmf1
  • 09:02 effie: Disable puppet and rolling restart php-fpm on mw[1312-1317,1339-1347]* - T219150
  • 08:31 elukey@deploy1001: Finished deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes (duration: 07m 26s)
  • 08:24 elukey@deploy1001: Started deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9148 and previous config saved to /var/cache/conftool/dbconfig/20190923-082119-marostegui.json
  • 07:41 godog: swift run swiftrepl without deletes eqiad -> codfw
  • 07:40 godog: swift eqiad-prod: continue ms-be1027 decom - T233289
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9147 and previous config saved to /var/cache/conftool/dbconfig/20190923-073044-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9146 and previous config saved to /var/cache/conftool/dbconfig/20190923-071537-marostegui.json
  • 07:08 marostegui: Stop MySQL on db1123 to reboot to change binlog format and kernel - T230783
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 to change binlog format T230783', diff saved to https://phabricator.wikimedia.org/P9145 and previous config saved to /var/cache/conftool/dbconfig/20190923-070628-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1123 and db1078 roles, db1078 will serve logpager and recentchanges, db1123 will just serve general traffic', diff saved to https://phabricator.wikimedia.org/P9144 and previous config saved to /var/cache/conftool/dbconfig/20190923-065056-marostegui.json
  • 05:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1066 from config T233071 (duration: 00m 56s)
  • 05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1066 from config T233071 (duration: 01m 15s)

2019-09-22

  • off: marostegui set s3 master RW

2019-09-21

  • 05:42 shdubsh: re-enable input-kafka-rsyslog-shipper in codfw
  • 05:33 shdubsh: drop input-kafka-rsyslog-shipper in codfw
  • 02:15 bblack: dbproxy1017: executing "systemctl reload haproxy" to recover from false healthcheck failure (network issues) on master
  • 02:14 bblack: dbproxy1016: executing "systemctl reload haproxy" to recover from false healthcheck failure (network issues) on master
  • 01:52 shdubsh: temporarily removing input-kafka-rsyslog-shipper-eqiad/codfw from logstash2004-5-6
  • 01:34 mutante: restarting mobileapps service on scb*
  • 01:34 mutante: restarted mobileapps service on scb1001
  • 01:21 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
  • 01:21 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1088.eqiad.wmnet
  • 01:21 bblack: re-pooling cp108[78] in D2 via confctl
  • 01:14 shdubsh: temporarily removing input-kafka-rsyslog-shipper-eqiad/codfw from logstash1007
  • 01:08 shdubsh: removed input-kafka-rsyslog-shipper-eqiad/codfw from logstash inputs logstash1008 and logstash1009
  • 00:54 mutante: aqs1009 - systemctl restart aqs
  • 00:54 mutante: aqs1006 - systemctl restart aqs
  • 00:48 mutante: aqs1005 - systemctl restart aqs
  • 00:46 shdubsh: restarting logstash on logstash1008 without udp-localhost-eqiad/codfw configs
  • 00:39 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1088.eqiad.wmnet
  • 00:38 bblack: depooling confctl things in rack D2
  • 00:38 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet

2019-09-20

  • 21:30 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/CheckUser: fix T233453 (duration: 00m 56s)
  • 21:29 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser: fix T233453 (duration: 00m 58s)
  • 19:26 XioNoX: update eqsin firewall filters - T233268
  • 16:35 krinkle@deploy1001: Synchronized vendor/: ead70240892e9 (duration: 00m 59s)
  • 16:14 XioNoX: update eqiad firewall filters - T233268
  • 16:11 XioNoX: update esams firewall filters - T233268
  • 15:17 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bgwiki --logwiki=metawiki 'Newrdkter' 'NRdk' (T233313)
  • 15:03 XioNoX: remove AS-PATH prepending in ams
  • 11:29 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:16 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:15 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 10:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 10:17 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 10:17 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:17 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 09:31 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 09:31 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:30 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 09:30 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:30 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 09:30 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:52 jynus: creating new database on m1 "bacula9" T229209
  • 08:28 hashar: Killed zuul-server process on contint2001 which was establishing connections to Gerrit and filling the pool of allowed ssh connections # T233390
  • 08:23 hashar: CI in default since it is somehow no more able to fetch from Gerrit T233390
  • 08:20 hashar: contint1001: upgrade zuul to 2.5.1-wmf10 # T203846
  • 08:12 hashar: contint2001: upgrade zuul to 2.5.1-wmf10 # T203846
  • 07:46 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:46 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:46 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:45 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:14 godog: eqiad-prod: start ms-be1027 decom - T233289
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from logpager and contributions after testing, repool back with normal weight on main traffic T223151', diff saved to https://phabricator.wikimedia.org/P9136 and previous config saved to /var/cache/conftool/dbconfig/20190920-052902-marostegui.json
  • 05:27 marostegui: Analyze table enwiki.logging on db2102 - T223151
  • 05:07 marostegui: Remove temporary index on hiwikisource views T219374
  • 01:06 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@a29da76]: Rolling back deployment due to alerts beginning after 0:00 UTC (duration: 02m 51s)
  • 01:05 jforrester@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/TimedMediaHandler/: T233360 Fix Safari 13.0 regression in video playback with audio (duration: 00m 58s)
  • 01:03 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@a29da76]: Rolling back deployment due to alerts beginning after 0:00 UTC

2019-09-19

  • 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:51 ejegg: updated payments-wiki from adef0e858f to 939b771800
  • 22:34 mutante: gerrit1001 - stopping puppet, removing gerrit IP from interface, rebooting
  • 21:37 niharika29@deploy1001: Synchronized wmf-config/VariantSettings.php: Enable special:mute on testwiki; T231577 (duration: 00m 56s)
  • 20:15 XioNoX: push firewall policies to pfw3-eqiad - T233325
  • 20:07 XioNoX: push firewall policies to pfw3-codfw - T233325
  • 19:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.23 refs T220748
  • 19:02 twentyafterfour: There are currently no blockers for T220748 so I am preparing to deploy 1.34.0-wmf.23 to all wikis.
  • 18:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 18:14 XioNoX: add TCP-MSS 1436 to cr2-eqiad external interfaces - T232602
  • 18:12 XioNoX: add TCP-MSS 1436 to cr1-eqiad external interfaces - T232602
  • 18:01 bblack: lvs2004 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:55 mutante: puppetmaster1001 - add mcrouter cert for mw1298.eqiad.wmnet (T192457)
  • 17:52 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 17:48 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, 32cf50453cd (duration: 01m 04s)
  • 17:47 arlolra@deploy1001: Finished deploy [parsoid/deploy@77630c5]: Updating Parsoid to 6bf23c2 (duration: 08m 52s)
  • 17:43 Krinkle: Move whisper/MediaWiki/wanobjectcache/revision_row_1/29 to whisper/MediaWiki/wanobjectcache/revision_row_1_29 on graphite1004 and graphite2003 (T232907)
  • 17:38 arlolra@deploy1001: Started deploy [parsoid/deploy@77630c5]: Updating Parsoid to 6bf23c2
  • 17:27 bblack: lvs2006 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:27 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/includes/libs/objectcache/wancache: 2e910c9, T232907 (duration: 01m 03s)
  • 17:23 bblack: lvs2005 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:19 bblack: lvs2006 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:16 bblack: lvs200[456] - puppet disabled for https://gerrit.wikimedia.org/r/536324 deploy/test
  • 17:14 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@69b3737]: Update mobileapps to cfc3062 (duration: 05m 42s)
  • 17:08 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@69b3737]: Update mobileapps to cfc3062
  • 16:31 _joe_: manually removed the purge_checkuser cron from mwmaint1002, to have puppet recreate it
  • 16:20 ejegg: updated fundraising CiviCRM from 90db6cb5a1 to 5def62ab05
  • 16:15 papaul: shutting down scs-a1-codfw for replacement
  • 15:26 moritzm: repooling restbase2012 after completed Cassandra bootstrap T224553
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=restbase,service=cassandra,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase-backend,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase-ssl,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:06 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 15:05 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:56 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@16a6af1]: Increase num_workers to (ncpu * 1.5) (T229286) (duration: 05m 39s)
  • 14:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@16a6af1]: Increase num_workers to (ncpu * 1.5) (T229286)
  • 14:47 mobrovac@deploy1001: Finished deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #3 (duration: 10m 42s)
  • 14:37 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #3
  • 14:36 mobrovac@deploy1001: Finished deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #2 (duration: 08m 24s)
  • 14:31 mobrovac: bootstrap restbase2012-c -- T224553
  • 14:28 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #2
  • 14:28 mobrovac@deploy1001: deploy aborted: Remove the TID suffix in the ETag, if present - T230272 (duration: 11m 20s)
  • 14:28 sbassett: Deployed security patch for T224203 (php-1.34.0-wmf.23)
  • 14:19 sbassett: Deployed security patch for T224203
  • 14:19 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 14:18 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:17 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present - T230272
  • 13:54 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@c8abb0f]: Article recommendation API: replace WDQS with MW API (T216750) (duration: 03m 06s)
  • 13:51 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@c8abb0f]: Article recommendation API: replace WDQS with MW API (T216750)
  • 13:43 reedy@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/Translate: T233308 (duration: 01m 07s)
  • 13:14 moritzm: powercycling mw1300
  • 13:12 mobrovac: bootstrap restbase2012-b -- T224553
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1089 into contributions service T223151', diff saved to https://phabricator.wikimedia.org/P9133 and previous config saved to /var/cache/conftool/dbconfig/20190919-130848-marostegui.json
  • 13:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@7f4b7f7]: Start using RESTBase built on Stretch - T224553 (duration: 21m 38s)
  • 12:39 mobrovac@deploy1001: Started deploy [restbase/deploy@7f4b7f7]: Start using RESTBase built on Stretch - T224553
  • 12:36 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:48 mobrovac: bootstrap restbase2012-a -- T224553
  • 11:32 Urbanecm: EU SWAT done
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 199a05c: Add new throttle rule for Czech wiki course (T233199) (duration: 01m 01s)
  • 11:23 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: eab7c6a: c80f026: GrowthExperiments: Enable Special:Homepage for euwiki, GrowthExperiments: Enable help panel for euwiki (T233066, T233065) (duration: 01m 05s)
  • 09:54 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/CheckUser: security T207094 (duration: 01m 02s)
  • 09:53 urbanecm@deploy1001: sync-file aborted: security T207094 (duration: 00m 28s)
  • 09:51 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser: security T207094 (duration: 01m 05s)
  • 09:22 godog: power back on ms-be1027, found with power off
  • 08:31 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 393441b: Change configuration of AbuseFilter extension for enwikisource (T231750) (duration: 01m 04s)
  • 08:22 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:21 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser/: revert T207094 (duration: 01m 04s)
  • 08:20 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser/: security T207094 (duration: 01m 06s)
  • 08:11 marostegui: Rename tables on db1133:labspuppet T233281
  • 07:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:40 moritzm: rebooting failoid1001 for kernel update
  • 07:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Give more logpager weight to db1089 T223151', diff saved to https://phabricator.wikimedia.org/P9131 and previous config saved to /var/cache/conftool/dbconfig/20190919-072234-marostegui.json
  • 07:01 moritzm: reimaging restbase2012 to stretch T224553
  • 06:18 marostegui: Sanitize hiwikisource on db1124:3313 and db2094:3313 T219374
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Temporarily pool db1089 into enwiki logpager T223151', diff saved to https://phabricator.wikimedia.org/P9130 and previous config saved to /var/cache/conftool/dbconfig/20190919-060440-marostegui.json
  • 05:11 marostegui: Stop MySQL on db2055 for decommission T233186
  • 05:11 marostegui: Remove db2055 from tendril and zarcillo T233186

2019-09-18

  • 23:18 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/MobileFrontend/resources/dist/: T233260, 1667ed9 (duration: 01m 04s)
  • 22:58 cmjohnson1: enabled asw2-c-eqiad interface xe-2/0/45
  • 22:40 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/resources/Resources.php: d6dadfd (duration: 01m 03s)
  • 22:37 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, ff44043efa59e9 (duration: 01m 05s)
  • 22:13 cmjohnson1: disabling asw2-c-eqiad xe-2/0/45 - cr1-eqiad to replace optic T233265
  • 21:54 gilles: T233095 Purging all eswiki articles (both desktop and mobile this time)
  • 21:53 gilles@deploy1001: Synchronized php-1.34.0-wmf.22/maintenance/purgeList.php: T233095 Make purgeList.php use getCdnUrls() (duration: 01m 04s)
  • 21:13 XioNoX: enable damping on primary codfw-eqiad link - T196432
  • 21:09 XioNoX: enable damping on codfw-ulsfo link - T196432
  • 20:50 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: No longer load InitialiseSettings at all in CommonSettings (duration: 01m 03s)
  • 20:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Quick fix for wmfLoadInitialiseSettings() (duration: 01m 03s)
  • 20:40 jforrester@deploy1001: scap failed: average error rate on 9/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 20:23 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Factor out call to InitialiseSettings.php (duration: 01m 04s)
  • 20:18 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:18 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Variant configuration: Drop support for serialised PHP (duration: 01m 04s)
  • 20:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Never write to serialised PHP T223602 (duration: 01m 04s)
  • 20:15 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:11 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 20:07 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T208246 Enforce a 10-byte password for privileged users (duration: 01m 04s)
  • 19:57 urandom: decommissioning Cassandra, restbase2012-c -- T224553
  • 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:42 gilles: T233095 Purging all pages on eswiki
  • 19:27 joal@deploy1001: Finished deploy [analytics/aqs/deploy@bc9dde1]: Regular deploy - analytics weekly train - Second retry after fix (duration: 03m 40s)
  • 19:24 mutante: ganeti1001 - deleting krypton.eqiad.wmnet - decom T231546
  • 19:23 joal@deploy1001: Started deploy [analytics/aqs/deploy@bc9dde1]: Regular deploy - analytics weekly train - Second retry after fix
  • 19:14 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.23 refs T220748 (duration: 01m 04s)
  • 19:13 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.23 refs T220748
  • 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:07 twentyafterfour: There appear to be no blockers on T220748 so I'll proceed with deploying 1.34.0-wmf.23 to group 1.
  • 19:01 joal@deploy1001: Finished deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train - Retry after fix (duration: 02m 12s)
  • 18:59 joal@deploy1001: Started deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train - Retry after fix
  • 18:55 joal@deploy1001: Finished deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train (duration: 01m 05s)
  • 18:54 joal@deploy1001: Started deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train
  • 18:46 XioNoX: remove `border-in4 term ddos-0906` from all routers
  • 17:53 Amir1: Creating hiwikisource is done
  • 17:50 urandom: decommissioning Cassandra, restbase2012-b -- T224553
  • 17:48 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 32s)
  • 17:45 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Add hiwikisource logos (T218155) (duration: 01m 04s)
  • 17:43 ladsgroup@deploy1001: Synchronized wmf-config/VariantSettings.php: Add hiwikisource (T218155) (duration: 01m 05s)
  • 17:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add hiwikisource (T218155) (duration: 01m 04s)
  • 17:38 Amir1: manual write on hiwikisource "wikiadmin@10.64.0.205(hiwikisource)> update text set old_text = 'DB://cluster25/1';" (T218155)
  • 17:33 Amir1: mwscript maintenance/createAndPromote.php --wiki=hiwikisource --force --sysop Ladsgroup (T218155)
  • 17:28 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 17:22 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 01m 06s)
  • 17:22 Jeff_Green: authdns-update to deploy DNS for new fundraising host
  • 17:03 mutante: ganeti2004 - resetting DRAC in an attempt to make IPMI work again
  • 17:00 Urbanecm: Morning SWAT done
  • 16:48 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Enable DNS blacklist on testwiki temporarily (T230822) (duration: 01m 03s)
  • 16:43 Urbanecm: 8340be9 sync is for T230822, mistakenly inserted `test` instead of the task number
  • 16:42 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 16:42 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8340be9: Enable logging for BlockManager channel at info level (test) (duration: 01m 04s)
  • 16:36 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 16:35 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: dc1298d: Add Draft and Draft_talk aliases for wikis that define draft namespace (T223472) (duration: 01m 02s)
  • 16:31 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 6e59651: Disable FundraiserLandingPage extension on test.wikipedia.org (T203020) (duration: 01m 04s)
  • 16:26 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/tewikisource.png (T232065)
  • 16:25 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 7c987fc: Change Telugu Wikisource Logo (T232065; 2/2) (duration: 01m 06s)
  • 16:24 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 7c987fc: Change Telugu Wikisource Logo (T232065; 1/2) (duration: 01m 05s)
  • 16:18 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 817d679: Turn on EventLogging at 100% for DonateWiki (T233145) (duration: 01m 04s)
  • 16:05 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: ba30276: Add suppressredirect right to filemovers on bnwiki (T233137) (duration: 01m 05s)
  • 15:55 moritzm: repooling restbase2011 after reimage/bootstrap
  • 15:53 urandom: decommissioning Cassandra, restbase2012-a -- T224553
  • 15:06 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:59 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-backend
  • 14:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:52 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:50 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:41 joal@deploy1001: Finished deploy [analytics/refinery@ca30c4e]: Regular analytics weekly train (duration: 05m 28s)
  • 13:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:36 joal@deploy1001: Started deploy [analytics/refinery@ca30c4e]: Regular analytics weekly train
  • 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:11 hashar: Restarting Jenkins, starting Zuul
  • 12:56 marostegui: Deploy schema change on the following s6 hosts: db1088, db1093, db1096, db1098, db1139, dbstore1005 - T231172
  • 12:52 hashar: gracefully stopping Zuul (kill SIGUSR1) to prepare for Jenkins restart
  • 12:40 marostegui: Deploy schema change on s6 codfw master with replication T231172
  • 12:18 vgutierrez: restarting ats-tls to avoid spreading Proxy-Connection header - T233205
  • 12:03 marostegui: Stop haproxy on dbproxy1006 - T233207
  • 11:29 mobrovac: bootstrap restbase2011-c -- T224553
  • 11:27 awight: EU SWAT complete
  • 11:27 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable FileImport source wiki editing (T228851) (duration: 00m 59s)
  • 11:25 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Enable FileImport source wiki editing (T228851) (duration: 01m 03s)
  • 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: NowCommons test & test2wiki configuration (T228851) (duration: 01m 15s)
  • 10:17 onimisionipe: force relocation of shards for eqiad search(chi) cluster
  • 10:16 moritzm: restarting postgres on puppetdb1002/2002 after updating permissions for replication user
  • 10:00 mobrovac: bootstrap restbase2011-b -- T224553
  • 09:37 godog: run swiftrepl eqiad -> codfw on all containers, no deletes
  • 09:37 effie: upgrading netmon* to PHP 7.2.22 T230024
  • 09:35 godog: run swiftrepl eqiad -> codfw for transcoded containers
  • 08:59 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:57 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9125 and previous config saved to /var/cache/conftool/dbconfig/20190918-085721-marostegui.json
  • 08:22 mobrovac: bootstrap restbase2011-a -- T224553
  • 07:43 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 07:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:43 moritzm: reimaging restbase2011 to stretch T224553
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P9124 and previous config saved to /var/cache/conftool/dbconfig/20190918-060401-marostegui.json
  • 05:58 marostegui: Deploy schema change on db2097:3316 - T233135
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool host after onsite checks T233184', diff saved to https://phabricator.wikimedia.org/P9123 and previous config saved to /var/cache/conftool/dbconfig/20190918-054755-marostegui.json
  • 05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2055 from config T233186 (duration: 01m 04s)
  • 05:31 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2055 from config T233186 (duration: 01m 06s)
  • 05:03 marostegui: Start MySQL on db2127 T233184
  • 03:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.util/: 0333729e, ccfe88241 (duration: 01m 07s)

2019-09-17

  • 23:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.23 refs T220748
  • 23:20 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/VisualEditor/extension.json: aae62a8 (duration: 01m 05s)
  • 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 22:43 dzahn@cumin1001: Updating IPMI password on 6 hosts - dzahn@cumin1001
  • 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 22:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add comment about MinimumPasswordLengthToLogin (duration: 01m 03s)
  • 21:45 cstone: civicrm revision changed from 45dbfdb96f to 90db6cb5a1
  • 21:45 tzatziki: removed one file for legal compliance
  • 21:12 XioNoX: delete AS13335 91.198.174.0/24 RPKI/ROA
  • 21:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 21:10 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 21:10 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 21:08 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:07 twentyafterfour@deploy1001: Finished scap: testwikis to 1.34.0-wmf.23 refs T220748 (duration: 24m 55s)
  • 21:01 XioNoX: enable interface damping on primary eqiad-esams link (eqiad side) - T196432
  • 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:47 dzahn@cumin1001: Updating IPMI password on 660 hosts - dzahn@cumin1001
  • 20:46 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:42 twentyafterfour@deploy1001: Started scap: testwikis to 1.34.0-wmf.23 refs T220748
  • 20:39 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:31 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.Title/phpCharToUpper.json: 8372dcd (duration: 00m 56s)
  • 20:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.Title/Title.js: 8372dcd (duration: 02m 08s)
  • 20:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 21 hosts - dzahn@cumin1001
  • 20:18 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:15 tzatziki: changing email for User:Olag
  • 20:12 dzahn@cumin1001: Updating IPMI password on 18 hosts - dzahn@cumin1001
  • 20:11 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:04 dzahn@cumin1001: Updating IPMI password on 29 hosts - dzahn@cumin1001
  • 20:04 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:32 ejegg: updated payments-wiki from fc82318180 to adef0e858f
  • 19:26 dzahn@cumin1001: Updating IPMI password on 543 hosts - dzahn@cumin1001
  • 19:25 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:22 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:22 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:20 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:14 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:14 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:08 twentyafterfour: Branch cut is in progress for 1.34.0-wmf.23
  • 19:05 urandom: decommissioning Cassandra, restbase2011-c -- T224553
  • 18:06 papaul: upgrading firmware on scs1-a1-codfw
  • 17:18 ejegg: updated SmashPig payments listener from a0151434f4 to dc0c6b208b
  • 17:09 urandom: decommissioning Cassandra, restbase2011-b -- T224553
  • 17:08 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 17:00 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:59 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 16:04 jbond42: run octocatalog-diff from elnath with current facts
  • 15:55 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Revert Set MinimumPasswordLengthToLogin to 10 for all prived groups, not just +staff (duration: 00m 55s)
  • 15:53 reedy@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 15:53 reedy@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 15:39 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 15:38 urandom: decommissioning Cassandra, restbase2011-a -- T224553
  • 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Host down for on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9120 and previous config saved to /var/cache/conftool/dbconfig/20190917-151714-marostegui.json
  • 15:16 marostegui: Stop MySQL on db2127 and shut the host down for onsite maintenance
  • 14:52 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 14:52 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on wikitech for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 8 wikis for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 7 wikis for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 6 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 5 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 4 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on remaining section 3 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 2 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 1 wikis for T232464
  • 14:48 anomie@mwmaint1002: Running cleanupRevActorPage.php on test wikis and mediawikiwiki for T232464
  • 14:39 anomie@deploy1001: Synchronized php-1.34.0-wmf.22/includes/MergeHistory.php: Backport MergeHistory fix for T232464 gerrit:537436 (duration: 00m 54s)
  • 14:35 ottomata: bouncing eventstreams service on scb hosts
  • 14:15 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 14:14 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 14:13 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 14:03 herron: migrating kafka1003 to kafka-main1003 T225005
  • 14:00 jbond42: forcing puppet run
  • 14:00 bblack: lvs1015 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:59 bblack: lvs2003 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:57 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:52 bblack: lvs1016 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:52 bblack: lvs2006 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 13:45 moritzm: repooling restbase2010 after reimage/completed bootstrap
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 db1104 db1085 db1086 after PDU maintenance - T227539', diff saved to https://phabricator.wikimedia.org/P9117 and previous config saved to /var/cache/conftool/dbconfig/20190917-132102-marostegui.json
  • 13:17 godog: force-run puppet in eqiad to update exported resources
  • 13:14 jbond42: currently running octocatalog-diff for all hosts from elnath
  • 13:02 marostegui: Start replication on db1130 db1104 db1085 db1086 after PDU maintenance is completed - T227539
  • 13:01 cmjohnson1: The PDU swap in rack B3 eqiad is finished.
  • 12:30 mobrovac: bootstrap restbase2010-c - T224553
  • 11:32 Urbanecm: EU SWAT is done
  • 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:31 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 11:31 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 290e207: Add channels for the Translate and TranslationsNotification extension (T221119, T144780, T143073) (duration: 00m 56s)
  • 11:30 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:30 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:29 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:27 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Use https rather than protocol-relative remote API URLs (T228851) (duration: 00m 58s)
  • 11:24 cmjohnson1: commencing pdu swap rack b3 eqiad T227539
  • 11:22 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Update ORES filter threshold configuration for new huwiki model (T230031) (duration: 00m 55s)
  • 11:17 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Enable EditorJourney for euwiki (T232061) (duration: 00m 56s)
  • 11:13 Urbanecm: Run mwscript emptyUserGroup.php --wiki=aawiki 'inactive' (T150538)
  • 10:58 mobrovac: bootstrap restbase2010-b - T224553
  • 10:44 vgutierrez: replacing nginx with ATS in cp1076 (upload cluster) - T231433
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool and stop replication on db1130 db1104 db1085 db1086 (lag will appear on s6 on labsdb) for PDU maintenance - T227539', diff saved to https://phabricator.wikimedia.org/P9116 and previous config saved to /var/cache/conftool/dbconfig/20190917-094827-marostegui.json
  • 09:46 marostegui: Depool and stop replication on db1130 db1104 db1085 db1086 (lag will appear on s6 on labsdb) for PDU maintenance - T227539
  • 09:30 hashar: Restarting CI jenkins
  • 09:29 marostegui: Downtime db1073 db1130 db1104 db1085 db1086 for the PDU maintenance T227539
  • 09:18 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:16 mobrovac: bootstrap restbase2010-a - T224553
  • 09:15 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:05 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Push PHP7 traffic to 100% of users who accept cookies - T219150 (duration: 00m 57s)
  • 08:37 vgutierrez: upgrading ATS to 8.0.5-1wm8 on cp3034 - T231849 T232724
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1074 with just 50 to keep its warmness level just in case T231638', diff saved to https://phabricator.wikimedia.org/P9115 and previous config saved to /var/cache/conftool/dbconfig/20190917-075807-marostegui.json
  • 07:48 effie: Enable puppet on mw*
  • 07:42 elukey: reboot analytics-tool1004 (host running superset) for kernel updates
  • 07:41 marostegui: Stop mysql on db1063 for decommissioning T232564
  • 07:40 marostegui: Remove db1063 from puppet and zarcillo T232564
  • 07:29 vgutierrez: repooling cp5007 without wikibase configuration - T99531
  • 07:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:19 vgutierrez: depooling cp5007 to ensure that wikibase removal goes as expected - T99531
  • 07:10 vgutierrez: getting rid of wikibase TLS certificate & nginx configuration on the text cache cluster - T99531
  • 06:56 vgutierrez: upgrading ATS to 8.0.5-1wm8 on cp2002, cp4021 and cp5001 - T231849
  • 06:55 vgutierrez: uploaded trafficserver 8.0.5-1wm8 to apt.wikimedia.org (stretch) - T231849
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1066 T233071', diff saved to https://phabricator.wikimedia.org/P9114 and previous config saved to /var/cache/conftool/dbconfig/20190917-065342-marostegui.json
  • 06:49 moritzm: reimage restbase2010 to Stretch T224553
  • 05:57 vgutierrez: upgrading ATS to 8.0.5-1wm7 on cp2002 and cp4021 - T232724
  • 05:56 vgutierrez: uploaded trafficserver 8.0.5-1wm7 to apt.wikimedia.org (stretch) - T232298 T232724
  • 05:23 effie: disable puppet on mw* servers for 536979
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1122 to s2 master and remove read-only from s2 T230785', diff saved to https://phabricator.wikimedia.org/P9113 and previous config saved to /var/cache/conftool/dbconfig/20190917-050133-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 as read-only for maintenance T230785', diff saved to https://phabricator.wikimedia.org/P9112 and previous config saved to /var/cache/conftool/dbconfig/20190917-050043-marostegui.json
  • 05:00 marostegui: Starting s2 failover from db1066 to db1122 - T230785
  • 04:57 effie: Downtiming HTTPS-blog on icinga - T232412
  • 04:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1122 with weight 0 and depool it from API T230785', diff saved to https://phabricator.wikimedia.org/P9111 and previous config saved to /var/cache/conftool/dbconfig/20190917-041441-marostegui.json
  • 04:11 marostegui: Start s2 pre-switchover steps T230785
  • 00:34 AndyRussG: updated fruec from fb29cb7407 to 97128874bf

2019-09-16

  • 23:53 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgDebugLogFile in VS (duration: 00m 55s)
  • 23:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgDebugLogFile in CS (duration: 00m 55s)
  • 23:42 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgUploadThumbnailRenderHttpCustom* in VS (duration: 00m 54s)
  • 23:41 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgUploadThumbnailRenderHttpCustom* in CS (duration: 00m 55s)
  • 23:30 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wmgRC2UDPAddress in VS (duration: 00m 55s)
  • 23:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wmgRC2UDPAddress in CS (duration: 00m 56s)
  • 23:24 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgCopyUploadProxy in VS (duration: 00m 56s)
  • 23:21 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgCopyUploadProxy in CS (duration: 00m 55s)
  • 23:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T225261 T194019 Adjust CentralNotice CSP for banner previews for FR-tech (duration: 00m 55s)
  • 22:59 chaomodus: restarted nagios-nrpe-server on notebook1003
  • 22:46 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use __DIR__ rather than global wmfConfigDir (duration: 00m 55s)
  • 21:48 ebernhardson: unban elastic1027 from production-search-eqiad
  • 20:55 XioNoX: remove 2 sessions to AS12871 on cr2-esams - T232617
  • 20:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:20 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:18 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:15 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 20:14 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:10 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 20:09 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:08 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 20:08 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:55 XioNoX: reboot scs-a8-eqiad (at 100% CPU)
  • 19:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:55 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:54 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:53 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:52 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:51 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:51 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:35 dzahn@cumin1001: Updating IPMI password on 12 hosts - dzahn@cumin1001
  • 19:34 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:28 dzahn@cumin1001: Updating IPMI password on 12 hosts - dzahn@cumin1001
  • 19:27 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:27 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 19:26 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:19 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:13 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:13 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:09 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:03 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgCookieSetOnAutoBlock and wgCookieSetOnIpBlock to the default; never varied (duration: 00m 56s)
  • 19:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean up globals in InitialiseSettings.php (duration: 00m 56s)
  • 19:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:01 dzahn@cumin1001: Updating IPMI password on 0 hosts - dzahn@cumin1001
  • 19:00 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 18:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 18:54 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 18:54 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 18:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T223602 Variant configuration: Read JSON config for all wikis (duration: 00m 56s)
  • 18:48 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set MinimumPasswordLengthToLogin to 10 for all privileged groups, not just +staff (duration: 00m 56s)
  • 18:40 jforrester@deploy1001: Synchronized src/WmfClusters.php: Use static VariantSettings instead of InitialiseSettings (noc-only change) (duration: 00m 55s)
  • 18:40 mutante: phab1001 - racadm racreset
  • 18:21 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Remove globals declaration and use via GLOBALS for testability (duration: 00m 56s)
  • 18:15 Lucas_WMDE: Morning SWAT done
  • 18:14 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: bridge: enable EditTags for beta (T232582) (duration: 00m 58s)
  • 18:12 herron: migrating kafka1002 to kafka-main1002 T225005
  • 18:09 mutante: registry2001 - restarting nginx
  • 17:55 jforrester@deploy1001: Synchronized docroot/noc/conf/VariantSettings.php.txt: New file for NOC (duration: 00m 55s)
  • 17:49 ejegg: updated SmashPig standalone from 5d187092a7 to a0151434f4
  • 17:42 urandom: decommissioning Cassandra, restbase2010-c -- T224553
  • 17:42 ebernhardson: restart elasticsearch_6@production-search-eqiad on elastic1027 due to >1k orphan tasks
  • 17:09 jforrester@deploy1001: Synchronized docroot/noc/conf/VariantSettings.php.txt: New file for NOC (duration: 00m 54s)
  • 16:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make CommonSettings use mtime from VariantSettings (duration: 00m 55s)
  • 16:58 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make InitialiseSettings use values from VariantSettings (duration: 00m 54s)
  • 16:55 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Establish VariantSettings.php everywhere (duration: 00m 56s)
  • 16:51 ebernhardson: ban elastic1027 from production-search-eqiad-chi
  • 16:12 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223602 Inject config object into InitialiseSettings-labs rather than use wgConf global (duration: 00m 55s)
  • 15:42 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Write JSON config for all wikis T223602 (duration: 00m 56s)
  • 15:41 jforrester@deploy1001: sync aborted: wmf-config/CommonSettings.php Variant configuration: Write JSON config for all wikis T223602 (duration: 00m 08s)
  • 15:41 jforrester@deploy1001: Started scap: wmf-config/CommonSettings.php Variant configuration: Write JSON config for all wikis T223602
  • 15:10 @: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 15:07 @: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:06 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:54 urandom: decommissioning Cassandra, restbase2010-b -- T224553
  • 14:37 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 14:25 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:09 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 14:05 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:48 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 13:28 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FlaggedRevs/frontend/specialpages/reports/ValidationStatistics.php: Add missing "use" to getTopReviewers() - T232618 (duration: 00m 55s)
  • 13:10 moritzm: rebooting failoid2001 for kernel update/pick up new qemu
  • 13:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.22
  • 12:59 moritzm: installing qemu security updates on stretch
  • 12:58 urandom: decommissioning Cassandra, restbase2010-a -- T224553
  • 12:44 godog: stop thumbor traffic to statsd/graphite, use Prometheus only and replace Thumbor dashboard - T205870
  • 12:40 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 12:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 12:17 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 12:07 _joe_: rolling restart ended on eqiad T232613
  • 11:56 _joe_: rolling restart of php-fpm in eqiad to pick up the new memcached extension T232613
  • 11:50 _joe_: rolling restart of php-fpm in codfw to pick up the new memcached extension T232613
  • 11:43 Urbanecm: EU SWAT is done
  • 11:38 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: e37aed2: Remove expired throttle rules (duration: 01m 03s)
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 313e3d9: Increase move rate-limit on Commons for all autopatrolled users (T232657) (duration: 01m 05s)
  • 11:33 jbond42: update peer address of AS28598
  • 11:30 effie: Upgrading php-memcached to 3.0.1+2.2.0-1~wmf3
  • 11:30 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Send a User-Agent with remote API requests (T232840) (duration: 01m 02s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 869b56f: Lift IP cap on 2019-10-02 for Senior Citizen Write Wikipedia course - cs.wikipedia (T232831) (duration: 01m 02s)
  • 11:21 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable File Importer source wiki edits on beta cluster (T228851) (duration: 01m 03s)
  • 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Enable source wiki editing for testwiki (T228851) (duration: 01m 02s)
  • 11:10 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Add debug logging for remote API failures (T228851) (duration: 01m 05s)
  • 11:06 _joe_: uploaded php-memcached_3.0.1+2.2.0-1~wmf3 to component/php72 for stretch T232613
  • 10:52 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
  • 10:51 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
  • 10:50 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 03s)
  • 10:49 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 04s)
  • 10:45 vgutierrez: Enabling OCSP prefetched responses for the non-canonical redirect service - T232988
  • 10:29 _joe_: installing a patched php-memcached on mw1347 T232613
  • 10:16 vgutierrez: upgrade acme-chief production servers to acme-chief 0.21 - T219765
  • 10:16 moritzm: upload libtrapperkeeper-webserver-jetty9-clojure 1.7.0-2+wmf1 to buster-wikimedia
  • 10:05 vgutierrez: restarting acmechief servers to get latest kernel upgrades
  • 09:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 vgutierrez: replacing nginx with ATS in cp3034 (upload cluster) - T231433
  • 08:56 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Beta: enable the Parsoid extension - T231569 (duration: 01m 01s)
  • 08:50 marostegui: Apply grants for dbproxy1021 on db1133 (m5 master) with replication - T202367
  • 08:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:38 moritzm: installing faad2 security updates
  • 07:15 moritzm: repooling restbase2009
  • 06:48 marostegui: Stop MySQL on db1114 to upgrade it to 10.3
  • 06:04 marostegui: Stop MySQL on db2054 for decommissioning T232969
  • 06:01 marostegui: Remove db2054 from tendril and zarcillo T232969
  • 05:59 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2054 from config T232969 (duration: 01m 03s)
  • 05:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2054 from config T232969 (duration: 01m 05s)

2019-09-15

  • 16:51 Krinkle: Fixed a dozen abuse filters, listed at https://phabricator.wikimedia.org/T156096#5494060. The trailing pipe character, which will no longer be supported in a future version of AbuseFilter, was removed from the filters that had it.
  • 14:35 _joe_: test: setting opcache.interned_strings_buffer to 0 on mw1348 for T232613

2019-09-14

  • 23:42 onimisionipe: force shard allocation (dewiki_content_1566659363[4]) on eqiad cluster
  • 04:39 effie: Depool and reload mw1286
  • 01:14 ejegg: updated fundraising python tools from 1e405864d7 to e1b81688c6
  • 00:29 ejegg: updated payments-wiki from 1f556670cf to fc82318180

2019-09-13

  • 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 gehel: re-enable puppet on maps - T232817
  • 20:23 chaomodus: restarting netbox1001.wikimedia.org
  • 20:00 twentyafterfour: hotfixing T232600 due to severity of the bug and relative safety of the fix (if this breaks, yell at James_F who twisted my arm and made me do it)
  • 19:54 urandom: bootstrapping Cassandra, restbase2009-c -- T224553
  • 17:24 urandom: bootstrapping Cassandra, restbase2009-b -- T224553
  • 16:10 XioNoX: fix bgp group netflow on cr2-codfw
  • 15:47 urandom: bootstrapping Cassandra, restbase2009-a -- T224553
  • 15:43 effie: reverting live hacks on mw1348
  • 15:34 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable adhoc core dump logging - T232613 (duration: 01m 04s)
  • 15:14 akosiaris: upload apertium-dan_0.6.0-1+wmf3 apertium-nno_1.0.0-1+wmf1 apertium-nob_1.0.0-2+wmf1 apertium-swe_0.8.0-1+wmf1 to apt.wikimedia.org/jessie-wikimedia T218184
  • 15:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:02 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: Add more log and context for T232613 logging - T232613 (duration: 01m 04s)
  • 15:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:30 moritzm: installing cups security update on buster (only client-side libs installed)
  • 14:22 moritzm: installing bzip2 update from Buster 10.1 point release
  • 14:18 moritzm: installing reportbug update from Buster 10.1 point release
  • 14:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:05 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 13:57 oblivian@deploy1001: Synchronized wmf-config/logging.php: unbreak mediawiki logging on scandium (duration: 01m 04s)
  • 13:28 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:27 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:21 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:20 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:19 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 12:56 _joe_: banning more urls on maps1003
  • 12:37 _joe_: temp ban of class of urls on maps1003 nginx
  • 12:14 jbond42: add timing information to maps1003 access logs
  • 11:39 jbond42: enable access logs on maps1003
  • 11:38 _joe_: manually raising the worker heap limit to 600 MB on kartotherian on maps1003
  • 11:11 elukey: reboot an-conf100* (Analytics Zookeeper nodes - not yet in production) for kernel upgrades
  • 11:10 elukey: reboot an-tool1007 (runs turnilo) for kernel upgrades
  • 11:08 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:05 godog: silence kartotherian pages for 2h, known issue
  • 10:47 vgutierrez: rebooting acmechief-test servers to catch up latest kernel upgrades
  • 10:42 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:41 moritzm: reimage restbase2009 to stretch T224553
  • 10:38 moritzm: repool restbase1018 after reimage to stretch and completed Cassandra bootstrap
  • 10:36 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:36 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:13 vgutierrez: disable ATS-TLS debug options on cp5001 - T232298
  • 10:09 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 09:46 gehel: re-enabling /geoline on maps1004 - T232817
  • 09:45 @: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 09:44 @: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 09:42 @: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 09:40 godog: install linux-perf-4.9 on maps1002 and attempt to capture a stack sample
  • 09:38 gehel: drop /geoshape and restart kartotherian on maps1004 - T232817
  • 09:27 gehel: restart kartotherian on maps1004 - T232817
  • 09:24 gehel: deny access to /geoline on maps1004 - T232817
  • 09:11 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 09:08 godog: downtime kartotherian pages for 1h in codfw
  • 09:01 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1046.eqiad.wmnet
  • 09:00 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1017.eqiad.wmnet
  • 08:57 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 08:52 godog: downtime kartotherian pages for 1h
  • 08:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 08:48 jmm@cumin2001: Updating IPMI password on 1 hosts - jmm@cumin2001
  • 08:47 jmm@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:47 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 08:47 jmm@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:45 gehel: stop tilerator on maps to help reduce load
  • 08:37 _joe_: rolling restart of kartotherian
  • 08:33 _joe_: restarting kartotherian on maps1003, all workers seem stuck
  • 05:58 oblivian@deploy1001: Synchronized w/fatal-error.php: Adding core dump function to fatal-error (duration: 01m 04s)
  • 05:40 _joe_: live-hacking mw1348, setting rlimit_core = unlimited to allow core dumps to be taken
  • 05:17 effie: Rolling restart php-fpm across the fleet for 536400
  • 04:53 vgutierrez: restarting ats-tls on cp4021 and cp2002 to pick up the new SSL session cache timeout - T231849
  • 04:50 eileen: process-control config revision is 43a2677bcf - turned off gender import
  • 02:23 eileen: civicrm revision changed from c5ab5aea9e to 45dbfdb96f, config revision is 1da8391a9a
  • 01:09 XioNoX: add IPv6 sampling to cr1-eqiad
  • 01:07 XioNoX: enable netflow sampling on cr2-codfw

2019-09-12

  • 23:35 XioNoX: enable netflow sampling on cr1-codfw
  • 23:21 urandom: decommissioning Cassandra, restbase2009-b -- T224553
  • 23:19 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T223602 Read config from JSON, not serialised PHP on testwiki (duration: 01m 03s)
  • 23:18 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: T223602 Add ability to read config from JSON, not serialised PHP (duration: 01m 04s)
  • 23:10 eileen: process-control config revision is 1da8391a9a
  • 22:53 ayounsi@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:48 ayounsi@cumin2001: START - Cookbook sre.ganeti.makevm
  • 22:43 ayounsi@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:43 ayounsi@cumin2001: START - Cookbook sre.ganeti.makevm
  • 22:20 XenoRyet: payments-wiki updated from 4ebbdb247d to 1f556670cf
  • 22:14 XioNoX: remove extra prepend in AMS-IX
  • 21:18 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: Hardcode posix signal and log coredump - T232613 (duration: 01m 04s)
  • 21:17 mbsantos@deploy1001: Finished deploy [tilerator/deploy@5996843]: Deploy tilerator 1.1.4-wmf.0 (duration: 03m 18s)
  • 21:14 mbsantos@deploy1001: Started deploy [tilerator/deploy@5996843]: Deploy tilerator 1.1.4-wmf.0
  • 21:13 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@c4c9e8b]: Deploy kartotherian 1.1.4-wmf.0 (duration: 03m 52s)
  • 21:09 mbsantos@deploy1001: Started deploy [kartotherian/deploy@c4c9e8b]: Deploy kartotherian 1.1.4-wmf.0
  • 21:00 urandom: decommissioning Cassandra, restbase2009 -- T224553
  • 20:33 krinkle@deploy1001: Synchronized wmf-config/: d495d5e24949 (duration: 01m 03s)
  • 20:28 krinkle@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: d495d5e24949 (duration: 01m 04s)
  • 20:27 eileen: civicrm revision changed from 4075e396d5 to f00c6482bf, config revision is 635f198b92
  • 20:05 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta-only (duration: 01m 02s)
  • 20:03 krinkle@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: beta-only (duration: 01m 04s)
  • 20:02 moritzm: installing firmware-nonfree update from Buster 10.1 point release
  • 19:51 moritzm: installing systemd bugfix update from Buster 10.1 point release
  • 19:44 moritzm: installing 4.19.67 kernel from 10.1 point release on Buster systems
  • 19:34 urandom: bootstrapping Cassandra, restbase1018-c -- T224553
  • 18:59 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable coredump on some mysterious php7.2 failure - T232613 (duration: 01m 04s)
  • 18:32 moritzm: installing gdb updates from buster 10.1 point release
  • 18:28 bblack: lvs1016: restart pybal to revert test
  • 18:21 bblack: lvs1016: restart pybal to test dual bgp peering
  • 18:04 bblack: lvs1015: restart pybal to return BGP session to cr2 - T226424
  • 18:03 bblack: lvs1014: restart pybal to return BGP session to cr2 - T226424
  • 17:58 XioNoX: revert VRRP priority change cr2-eqiad - T226424
  • 17:54 XioNoX: revert OSPF priority change on cr2-eqiad - T226424
  • 17:53 XioNoX: re-enabled external BGP on cr2-eqiad - T226424
  • 17:46 urandom: bootstrapping Cassandra, restbase1018-b -- T224553
  • 17:43 XioNoX: reboot cr2-eqiad - T226424
  • 17:40 XioNoX: failover cr2-eqiad master RE from RE1 to RE0 - T226424
  • 17:31 jforrester@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: T232613 Add ability to core dump on empty string array key that should exist (wmf.22 only, flagged off) (duration: 01m 03s)
  • 17:31 XioNoX: power off re0.cr2-eqiad - T226424
  • 17:25 XioNoX: failover cr2-eqiad master RE from RE0 to RE1 - T226424
  • 17:19 halfak@deploy1001: Finished deploy [ores/deploy@7d45b80]: T232660 (duration: 13m 41s)
  • 17:05 halfak@deploy1001: Started deploy [ores/deploy@7d45b80]: T232660
  • 17:04 XioNoX: power off re1.cr2-eqiad - T226424
  • 17:02 moritzm: installing unzip security updates on buster
  • 17:00 XioNoX: +1000 metric to all transport to/from cr2-eqiad - T226424
  • 16:57 moritzm: installing libxslt security updates on buster
  • 16:49 XioNoX: Deactivate IX/transit/private-peer v4/v6 BGP on cr2-eqiad - T226424
  • 16:47 moritzm: installing NSS security updates on buster
  • 16:42 XioNoX: er, switch VRRP master to cr1-eqiad - T226424
  • 16:42 XioNoX: switch VRRP master to cr2-eqiad - T226424
  • 16:36 bblack: lvs1013: restart pybal to move bgp session to cr1 - T226424
  • 16:36 bblack: lvs1014: restart pybal to move bgp session to cr1 - T226424
  • 16:35 bblack: lvs1015: restart pybal to move bgp session to cr1 - T226424
  • 16:34 bblack: lvs1016: restart pybal to move bgp session to cr1 - T226424
  • 16:19 XioNoX: rollback force VRRP backup on cr1-eqiad - T226424
  • 16:16 XioNoX: activate CF tunnel on cr1-eqiad - T226424
  • 16:16 XioNoX: activate transit4/6 on cr1-eqiad - T226424
  • 16:09 urandom: bootstrapping Cassandra, restbase1018-a -- T224553
  • 16:04 XioNoX: reboot cr1-eqiad - T226424
  • 16:01 XioNoX: force offline/online of FPC3 on cr1-eqiad
  • 15:45 XioNoX: failover master RE from RE1 to RE0 on cr1-eqiad - T226424
  • 15:39 XioNoX: deactivate transit4/6 on cr1-eqiad - T226424
  • 15:31 XioNoX: shutdown re0.cr1-eqiad - T226424
  • 15:23 XioNoX: failover master RE from RE0 to RE1 on cr1-eqiad - T226424
  • 15:13 XioNoX: shutdown re1.cr1-eqiad - T226424
  • 15:05 XioNoX: disable primary tunnel to CF in eqiad (for real this time, I did see an uptake of traffic on backup link before the rollback)
  • 15:03 XioNoX: rolled back disable primary tunnel to CF in eqiad
  • 15:02 XioNoX: disable primary tunnel to CF in eqiad
  • 14:53 bblack: restart pybal on lvs1013 to move BGP conn to cr2-eqiad - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536209 - T226424
  • 14:50 bblack: restart pybal on lvs1016 to move BGP conn to cr2-eqiad - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536209 - T226424
  • 14:45 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:41 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:39 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:37 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:29 XioNoX: ensure cr1-eqiad is vrrp backup for all groups - T226424
  • 13:22 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:03 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:01 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:57 effie: restarting hhvm on mw1233 and repooling
  • 12:56 effie: depool mw1233
  • 12:38 moritzm: reimaging restbase1018 to stretch
  • 12:03 Amir1: EU SWAT is done
  • 12:03 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set item terms on write both up to Q20mio (T225055) (duration: 01m 31s)
  • 11:11 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:11 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:09 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:09 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:00 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:42 jynus: compressing tables on labsdb1012 T232446
  • 08:22 vgutierrez: upgrading to acme-chief 0.21 on acmechief-test instances - T219765
  • 08:17 vgutierrez: restarting pybal on lvs1015 and lvs2003 - T176875
  • 08:13 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wdqs,service=wdqs-heavy-queries
  • 08:11 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=puppetmaster1001.eqiad.wmnet,service=wdqs-heavy-queries
  • 08:07 vgutierrez: restarting pybal on lvs2006 - T176875
  • 08:02 vgutierrez: restarting pybal on lvs1016 - T176875
  • 07:45 vgutierrez: uploaded acme-chief 0.21 to apt.wikimedia.org (buster) - T219765
  • 06:51 vgutierrez: restarting ATS-TLS on cp4021 and cp2002 to get the new SSL session cache size - T232298
  • 06:00 marostegui: Stop MySQL on db1073 for decommission T231892
  • 05:59 marostegui: Remove db1073 from tendril and zarcillo T231892
  • 05:26 _joe_: restarting strongswan on all eqiad caches that need it
  • 05:23 _joe_: restarting strongswan on cp1077
  • 03:37 eileen: civicrm revision changed from 32cd5e4953 to 4075e396d5, config revision is 3e22a80bc8
  • 02:13 eileen: civicrm revision changed from 53aeba6318 to 32cd5e4953, config revision is 3e22a80bc8
  • 02:03 XioNoX: repooling ulsfo

2019-09-11

  • 23:50 ejegg: updated payments-wiki from 5432f9c3a4 to 4ebbdb247d
  • 23:20 XioNoX: `set protocols bgp group Netflow cluster 208.80.154.197` on cr2-eqiad
  • 22:43 XioNoX: `set protocols bgp group Netflow cluster 208.80.154.196` on cr1-eqiad
  • 22:36 XioNoX: add BGP session between cr2-eqord and netflow1001
  • 22:30 urandom: decommissioning Cassandra, restbase1018-c -- T224553
  • 20:57 urandom: bootstrapping Cassandra, restbase-dev1005-b -- T224554
  • 20:21 ottomata: stopped and removed eventlogging-service-eventbus - T232122
  • 20:12 ppchelko@deploy1001: Finished deploy [changeprop/deploy@522177f]: Clean up old event style support (duration: 01m 39s)
  • 20:11 ppchelko@deploy1001: Started deploy [changeprop/deploy@522177f]: Clean up old event style support
  • 20:07 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@2c9e409]: Clean up old event style support T230049 (duration: 00m 53s)
  • 20:06 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@2c9e409]: Clean up old event style support T230049
  • 18:43 urandom: decommissioning Cassandra, restbase1018-b -- T224553
  • 18:42 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T211124 ed8dd7aad9e5 (duration: 01m 04s)
  • 18:42 nuria@deploy1001: Finished deploy [analytics/refinery@fa994c7]: v0.0.99 of refinery, again, try II. last time shas committed by jenkins were incorrect (duration: 08m 39s)
  • 18:40 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: no-op ed8dd7aad9e5 (duration: 01m 06s)
  • 18:37 krinkle@deploy1001: Synchronized tests/: no-op ed8dd7aad9e5 (duration: 01m 05s)
  • 18:33 nuria@deploy1001: Started deploy [analytics/refinery@fa994c7]: v0.0.99 of refinery, again, try II. last time shas committed by jenkins were incorrect
  • 18:16 krinkle@deploy1001: Synchronized wmf-config/logging.php: d6865e3365e8 - T211124 (duration: 01m 04s)
  • 18:16 nuria@deploy1001: Finished deploy [analytics/refinery@f4c60a4]: v0.0.99 of refinery (duration: 01m 21s)
  • 18:15 nuria@deploy1001: Started deploy [analytics/refinery@f4c60a4]: v0.0.99 of refinery
  • 18:02 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/WikimediaMaintenance/blameStartupRegistry.php: (no justification provided) (duration: 01m 05s)
  • 17:57 XioNoX: upgrade librenms to 1.55
  • 17:43 ayounsi@deploy1001: Finished deploy [librenms/librenms@2a06e98]: Upgrade LibreNMS to 1.55 - T232599 (duration: 00m 09s)
  • 17:42 ayounsi@deploy1001: Started deploy [librenms/librenms@2a06e98]: Upgrade LibreNMS to 1.55 - T232599
  • 17:32 bblack: enable GRE MTU mitigation on eqsin caches (cp5xxx) - T232602
  • 17:27 bblack: restbase2009 - re-pool - T227408
  • 17:07 bblack: restbase2009 - shutdown for hardware work - T227408
  • 17:05 bblack: restbase2009 - depool for hardware work - T227408
  • 16:57 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.StartModule.less: SWAT: c0fd061: Homepage: Fix start module layout bugs (T230629, T232549, T225668) (duration: 01m 02s)
  • 16:54 bblack: manually removed decommed eventbus LVS IP on kafka100[23]
  • 16:54 bblack: manually removed decommed eventbus LVS IP on kafka-main1001
  • 16:50 bblack: manually removed decommed eventbus LVS IP on kafka-main200[23]
  • 16:49 bblack: manually removed decommed eventbus LVS IP on kafka-main2001
  • 16:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 6007fbc: [rowiki] Allow sysops to remove patrollers (T231099) (duration: 01m 03s)
  • 16:39 urandom: decommissioning Cassandra, restbase1018-a -- T224553
  • 16:38 Urbanecm: Run mwscript emptyUserGroup.php --wiki=fawiki OTRS-member (T232554)
  • 16:36 bblack: ran conftool-merge on puppetmaster1001 (manually from sudo -i, to fixup missing updates)
  • 16:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 76991f2: Remove OTRS-member usergroup from fawiki (T232554) (duration: 01m 05s)
  • 16:32 Urbanecm: mwscript importImages.php --wiki=commonswiki --user=Abbe98 --comment-ext=txt /home/urbanecm/T232346
  • 16:31 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.StartModule.less: SWAT: c45d6d0: Homepage: Fix start module layout bugs (T230629, T232549, T225668) (duration: 01m 03s)
  • 16:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 565fafa: Set noindex for user and user_talk on zhwiki (T231982) (duration: 01m 05s)
  • 16:24 urandom: bootstrapping Cassandra, restbase-dev1005-a -- T224554
  • 16:16 bblack@cumin1001: conftool action : set/pooled=no; selector: cluster=eventbus
  • 16:10 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 510aa6b: Add new whitelist rule for Université de Lorraine course (T232596) (duration: 01m 04s)
  • 16:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: eceaccf: Add autopatrolled user group to az.wikibooks (T231493) (duration: 01m 06s)
  • 15:52 bblack: lvs1015 - remove eventbus.svc.eqiad.wmnet service, restart pybal, etc
  • 15:51 bblack: lvs2003 - remove eventbus.svc.codfw.wmnet service, restart pybal, etc
  • 15:49 bblack: lvs1016 - remove eventbus.svc.eqiad.wmnet service, restart pybal, etc
  • 15:48 bblack: lvs2006 - remove eventbus.svc.codfw.wmnet service, restart pybal, etc
  • 15:03 bblack: downtimed dns-discovery confd health checks for eventbus - T232122
  • 13:13 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.22 (duration: 01m 02s)
  • 13:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.22
  • 12:48 moritzm: upgrade labpuppetmaster* to use facter 3 / puppet 5
  • 12:40 moritzm: removing now obsolete puppet/puppetdb packages from labpuppetmaster* T171188
  • 12:40 moritzm: removing now-obsolete puppet/puppetdb packages from labpuppetmaster* T171188
  • 11:59 hashar: Restarting Gerrit due to deadlock in the account cache # T224448
  • 11:57 bblack: applying GRE MTU -> MSS fixup to cobalt and gerrit2001 - T218184
  • 11:41 Amir1: EU SWAT is done
  • 11:40 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.21/maintenance/getReplicaServer.php: SWAT: maintenance/getReplicaServer.php: Remove reference to long-deleted config var (T232268) (duration: 01m 04s)
  • 11:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable AMC Outreach modal (T231436) (duration: 01m 04s)
  • 11:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set item terms on write both up to Q10mio (T225055) (duration: 01m 03s)
  • 11:10 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: TR: set WikibaseTaintedReferencesEnabled true on labs wikidatawiki (T232191) (duration: 01m 03s)
  • 10:57 mobrovac: drop the wiktionary definition keyspace - T231361
  • 10:23 moritzm: removed roentgenium/tureis in Ganeti T224559
  • 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:18 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:17 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:01 jynus: stopping and upgrading db1074
  • 09:56 jynus: upgrading mariadb client library on mariadb root clients
  • 09:46 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Push PHP7 traffic to 50% - T219150 (duration: 01m 03s)
  • 09:45 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3a (duration: 12m 15s)
  • 09:32 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3a
  • 09:32 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3 (duration: 13m 18s)
  • 09:19 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3
  • 09:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #2 (duration: 03m 59s)
  • 09:13 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #2
  • 09:11 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints - T231361 T232449 (duration: 03m 24s)
  • 09:08 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints - T231361 T232449
  • 08:36 mobrovac@deploy1001: Finished deploy [changeprop/deploy@7a8ab89]: Stop pregenerating enwiktionary page/definition, take #2 - T231361 (duration: 02m 13s)
  • 08:34 mobrovac@deploy1001: Started deploy [changeprop/deploy@7a8ab89]: Stop pregenerating enwiktionary page/definition, take #2 - T231361
  • 08:24 mobrovac@deploy1001: Finished deploy [changeprop/deploy@069d297]: Revert Stop pregenerating enwiktionary page/definition (duration: 00m 34s)
  • 08:24 mobrovac@deploy1001: Started deploy [changeprop/deploy@069d297]: Revert Stop pregenerating enwiktionary page/definition
  • 08:22 mobrovac@deploy1001: Finished deploy [changeprop/deploy@56a8342]: Stop pregenerating enwiktionary page/definition - T231361 (duration: 02m 45s)
  • 08:19 mobrovac@deploy1001: Started deploy [changeprop/deploy@56a8342]: Stop pregenerating enwiktionary page/definition - T231361
  • 08:13 elukey: add thirdparty/amd-rocm271 to buster-wikimedia and update it with ROCm 2.7.1 packages
  • 08:09 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:07 elukey: execute reprepro clearvanished on install1002 to clear buster-wikimedia|thirdparty/amd-rocm27 (not used anymore)
  • 08:07 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1122', diff saved to https://phabricator.wikimedia.org/P9088 and previous config saved to /var/cache/conftool/dbconfig/20190911-080450-marostegui.json
  • 07:52 moritzm: reimaging restbase-dev1005 to Stretch T224554
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1122', diff saved to https://phabricator.wikimedia.org/P9087 and previous config saved to /var/cache/conftool/dbconfig/20190911-075139-marostegui.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1122', diff saved to https://phabricator.wikimedia.org/P9086 and previous config saved to /var/cache/conftool/dbconfig/20190911-073335-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1122', diff saved to https://phabricator.wikimedia.org/P9085 and previous config saved to /var/cache/conftool/dbconfig/20190911-072344-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1122', diff saved to https://phabricator.wikimedia.org/P9084 and previous config saved to /var/cache/conftool/dbconfig/20190911-071450-marostegui.json
  • 07:07 marostegui: Stop MySQL on db1122 to reboot for a kernel upgrade T230785
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 to reboot for kernel upgrade T230785', diff saved to https://phabricator.wikimedia.org/P9083 and previous config saved to /var/cache/conftool/dbconfig/20190911-070635-marostegui.json
  • 07:00 hashar: Restarting Gerrit - T224448
  • 06:58 hashar: Restarting Gerrit
  • 06:45 marostegui: Drop unused database puppet on m1 - T231539
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Re-organize s1 codfw weights and roles - T230106', diff saved to https://phabricator.wikimedia.org/P9082 and previous config saved to /var/cache/conftool/dbconfig/20190911-061924-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Re-organize s1 codfw weights and roles - T230106', diff saved to https://phabricator.wikimedia.org/P9081 and previous config saved to /var/cache/conftool/dbconfig/20190911-061659-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2048, will be decommissioned T230106', diff saved to https://phabricator.wikimedia.org/P9080 and previous config saved to /var/cache/conftool/dbconfig/20190911-054855-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 codfw master T230106', diff saved to https://phabricator.wikimedia.org/P9079 and previous config saved to /var/cache/conftool/dbconfig/20190911-054753-marostegui.json
  • 05:29 marostegui: Switchover s1 codfw master db2048 -> db2112 T230106
  • 03:31 eileen: civicrm revision changed from b343642c76 to 53aeba6318, config revision is 3e22a80bc8

2019-09-10

  • 20:46 ejegg: updated payments-wiki from 15baf7f58b to 5432f9c3a4
  • 20:24 XioNoX: add MSS clamp on install1002 - T232456
  • 20:20 XioNoX: add MSS clamp on archiva1001 - T232456
  • 18:42 herron: rolling out "Aggregate IPsec Tunnel Status" icinga check, please disregard for the time being if it alerts
  • 18:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T229863 Remove EventBusRCFeedEngine eventServiceName (duration: 01m 05s)
  • 18:15 XioNoX: rollback test add static route on bast3002 to force advmss
  • 18:10 XioNoX: test add static route on bast3002 to force advmss
  • 17:58 jforrester@deploy1001: Synchronized wmf-config/logging.php: T232042 Direct Parsoid/PHP rt-testing log events to a different target (duration: 01m 02s)
  • 17:56 jforrester@deploy1001: Synchronized wmf-config/ProductionServices.php: T232122 Stop setting production value for eventlogging-service (duration: 01m 00s)
  • 17:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T232122 Remove use of eventlogging-service (duration: 01m 03s)
  • 17:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Re-sync for safety after scap errored with a broken pipe (duration: 01m 03s)
  • 17:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Write to static (JSON) as well as serialised cache for testwiki T223602 (duration: 01m 02s)
  • 17:29 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Variant configuration: Be able to write to static (JSON) as well as serialised cache (duration: 01m 03s)
  • 16:35 elukey: reboot analytics-tool1001 via ganeti gnt - not reachable via ssh
  • 16:24 urandom: disabling reserved space on restbase-dev1005:/dev/mapper/restbase--dev1005--vg-srv -- T224554
  • 16:10 marostegui: Failover m1 from db1063 to db1135 - T231403
  • 15:58 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set items term store on write both for all of Wikidata" (duration: 01m 02s)
  • 15:58 thcipriani: restarting gerrit (again) https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?orgId=1&from=1568109359163&to=1568130959163&var-Application=&var-Window=30m due to T224448
  • 15:39 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.22
  • 15:37 marostegui: Start pre-switchover for m1 steps T231403
  • 15:35 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/http/MultiHttpClient.php: Revert "Improve MultiHttpClient connection concurrency and reuse" - T232487 (duration: 00m 55s)
  • 15:33 reedy@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/http/MultiHttpClient.php: T232487 (duration: 00m 55s)
  • 15:13 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 to 1.34.0-wmf.22 # T220747
  • 14:48 hashar@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 14:45 akosiaris: repool cp1075 ats-be, releases cert updated
  • 14:44 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,dc=eqiad,cluster=cache_text,service=ats-be
  • 14:44 XioNoX: depool ulsfo for DC UPS power maintenance (see maint-announce)
  • 14:36 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:32 hashar@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.22 and rebuild l10n cache # T220747 (duration: 34m 03s)
  • 14:31 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:29 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:26 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:20 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:18 ottomata: increasing max_body_size to 10mb for all eventgate services - T232362
  • 14:14 akosiaris: depool cp1075 ats-be to test helmfile sync
  • 14:14 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,dc=eqiad,cluster=cache_text,service=ats-be
  • 13:58 hashar@deploy1001: Started scap: testwiki to php-1.34.0-wmf.22 and rebuild l10n cache # T220747
  • 13:56 hashar: Applied security patches to 1.34.0-wmf.22 # T220747
  • 13:53 hashar: scap prep 1.34.0-wmf.22 # T220747
  • 13:34 elukey: reboot stat1005 to clear inconsistent process state after tensorflow tests
  • 13:23 hashar: ./make-wmf-branch -n 1.34.0-wmf.22 -o master -c extensions/CharInsert # T220747
  • 13:12 thcipriani: restarting gerrit
  • 13:11 hashar: Gerrit experiencing difficulty due to ongoing wmf branch cut - T231872
  • 13:01 moritzm: copied prometheus-jmx-exporter to buster-wikimedia (from stretch-wikimedia, just a package with some jars)
  • 12:40 cmjohnson1: the new pdus are racked in b6
  • 12:14 cmjohnson1: removing power from ps1-b6 side B...mgmt should not be affected
  • 11:20 cmjohnson1: swapping the PDU in rack B6 eqiad T227541
  • 11:09 Urbanecm: EU SWAT done
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c780fa4: Bump MobileWebUIActionsTracking sampling rate to 10 percent (T220016) (duration: 00m 55s)
  • 11:07 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,dc=eqiad,name=cp1075.eqiad.wmnet
  • 11:06 ema: cp1075: set weight in etcd back to 100
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 6afe963: Set items term store on write both for all of Wikidata (T225055) (duration: 00m 55s)
  • 10:51 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 10:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:32 vgutierrez: repool cp5001 with ats-tls collecting memory usage details every hour - T232298
  • 09:56 elukey: restart archiva on archiva1001 - UI not working (probably due to connections to maven central being stuck)
  • 09:50 moritzm: installing ghostscript security updates on jessie
  • 09:37 moritzm: added jbond as chanserv ops for #wikimedia-operations
  • 08:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:42 moritzm: reimaging mw2231 after hardware maintenance T231192
  • 07:21 moritzm: iron.wikimedia.org is no longer a bastion host
  • 06:57 moritzm: upgrading snapshot* to PHP 7.2.22 T230024
  • 05:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1073 from config T231892 (duration: 00m 54s)
  • 05:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1073 from config T231892 (duration: 00m 55s)
  • 05:35 marostegui: Stop MySQL on db2047 T231852
  • 05:35 marostegui: Remove db2047 from tendril and zarcillo - T231852
  • 05:33 urandom: decommissioning Cassandra, restbase-dev1005-b -- T224554
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1104 into API T230762', diff saved to https://phabricator.wikimedia.org/P9071 and previous config saved to /var/cache/conftool/dbconfig/20190910-051529-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1109 to s8 master and remove read-only from s8 T227062', diff saved to https://phabricator.wikimedia.org/P9070 and previous config saved to /var/cache/conftool/dbconfig/20190910-050213-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s8 as read-only for maintenance T230762', diff saved to https://phabricator.wikimedia.org/P9069 and previous config saved to /var/cache/conftool/dbconfig/20190910-050046-marostegui.json
  • 05:00 marostegui: Starting s8 failover from db1104 to db1109 - T227062
  • 04:46 vgutierrez: depool cp5001 for memory leak debugging on ATS - T232298
  • 04:23 marostegui: Start topology changes on s8, connect everything under db1109 - T230762
  • 04:22 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1109 with weight 0 and depool it from API T230762', diff saved to https://phabricator.wikimedia.org/P9068 and previous config saved to /var/cache/conftool/dbconfig/20190910-042243-marostegui.json
  • 04:18 marostegui: Start s8 (wikidata) pre switchover steps T230762
  • 00:59 krinkle@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 00:59 krinkle@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 00:57 Krinkle: krinkle@deploy1001: Deploy performance/navtiming f2a0863 - T226539
  • 00:41 urandom: decommissioning Cassandra, restbase-dev1005-a -- T224554

2019-09-09

  • 23:44 catrope@deploy1001: Synchronized php-1.34.0-wmf.21/skins/MinervaNeue/: T232260 (duration: 00m 57s)
  • 22:28 ejegg: updated payments-wiki from 51d9ed79b6 to 15baf7f58b
  • 20:50 urandom: bootstrapping Cassandra, restbase-dev1004-b -- T224554
  • 19:48 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@533d541]: Update mobileapps to