Server Admin Log/Archive 46

2021-10-31

21:49 urbanecm: urbanecm@mwmaint1002:~$ mwscript userOptions.php --wiki=dewiki --nowarn --touserid 3802752 --old 'linkrecommendation' --new 'control' 'growthexperiments-homepage-variant' # T294712
13:14 urbanecm: Re-create global account User:Calvinius and attach existing local accounts to it (T291745)

2021-10-30

19:25 dcausse: restarting blazegraph on wdqs1007 (jvm stuck)
13:29 urbanecm: Start server-side upload for 1 video file (T291418)

2021-10-29

22:57 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=foundationwiki --userlist users.txt # T205347, users.txt is at P17641
22:02 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4036.ulsfo.wmnet with OS buster
21:32 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp4036.ulsfo.wmnet with OS buster
21:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:20 legoktm@deploy1002: Synchronized php-1.38.0-wmf.6/includes/libs/filebackend/SwiftFileBackend.php: Force using HTTP 1.1 for SwiftFileBackend (T275752) (duration: 00m 55s)
21:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4034.ulsfo.wmnet with OS buster
21:17 legoktm@deploy1002: Synchronized php-1.38.0-wmf.6/includes/libs/http/MultiHttpClient.php: MultiHttpClient: Allow setting HTTP protocol version in curl (T275752) (duration: 00m 57s)
21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:16 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4035.ulsfo.wmnet with OS buster
21:06 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4036.ulsfo.wmnet with OS buster
20:43 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp4036.ulsfo.wmnet with OS buster
20:43 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp4035.ulsfo.wmnet with OS buster
20:43 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
20:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4033.ulsfo.wmnet with OS buster
19:56 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp4033.ulsfo.wmnet with OS buster
19:49 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4033.ulsfo.wmnet with OS buster
19:15 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp4033.ulsfo.wmnet with OS buster
17:47 urbanecm: Connect Neil Shah-Quinn (WMF)@foundationwiki to SUL (T205347)
17:42 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php/maintenance]$ mwscript reassignEdits.php --wiki=foundationwiki --norc 'Neil P. Quinn-WMF' 'Neil Shah-Quinn (WMF)' # part of SUL finalisation at foundationwiki, T205347
16:41 urbanecm: Connect Babel AutoCreate@foundationwiki to SUL (T205347)
16:37 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=foundationwiki --userlist users.txt # T205347, users.txt is at P17640
16:22 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=foundationwiki --userlist users.txt # T205347, users.txt is at P17639
12:18 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
11:00 urbanecm: [urbanecm@mwdebug1001 /srv/mediawiki]$ scap pull # livehacking ended
10:58 urbanecm: stashing on mwdebug1001
10:38 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
10:20 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
10:20 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
08:50 _joe_: depooling mw1305 while running tests
07:21 XioNoX: stop advertisement to NaWas - T288505

2021-10-28

23:50 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4033.ulsfo.wmnet with OS buster
23:50 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4034.ulsfo.wmnet with OS buster
23:46 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
23:43 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp4033.ulsfo.wmnet with OS buster
23:33 brennen: end of UTC late backport & config window
23:28 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add User and User talk to $wgExemptFromUserRobotsControl on zhwiki (T288947) (duration: 00m 56s)
23:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:16 mutante: mwdebug1001 - letting puppet remove all mediawiki font packages using new Hiera key 'profile::mediawiki::webserver::install_fonts: false' to make sure we really don't need them (T294378)
22:06 mutante: releases1002 - releases2002, something broke puppet here about 27.5 hours ago. lookup() did not find a value for the name 'profile::docker::storage::physical_volumes'
22:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:01 mutante: re-enabled puppet on deploy*, parse* and then everything else
21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:56 mutante: re-enabled puppet on mw-app-canary, mwmaint, labweb1002,..
21:53 mutante: re-enabled puppet on mw-api-canary
21:48 arnoldokoth: restore script scheduled to run on gitlab2001 (T285867)
21:46 arnoldokoth: restore script scheduled to run on gitlab2001 (T285867)
21:42 mutante: sudo cumin 'C:profile::mediawiki::common' "disable-puppet 'gerrit:734798 - ${USER}'"
20:37 mutante: ensured gitlab restore timer is running only on passive server and re-enabled it - https://gerrit.wikimedia.org/r/c/operations/puppet/+/735437 T274463
20:30 mutante: gitlab2001 - re-enabled gitlab-restore-from-backup service
20:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c23e7ab: foundationwiki: Set wmgLocalAuthLoginOnly=false temporarily (T205347) (duration: 00m 55s)
20:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:03 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.6 refs T293947
20:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:52 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.6 refs T293947 (duration: 01m 02s)
19:51 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.6 refs T293947
19:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:43 legoktm@deploy1002: Synchronized php-1.38.0-wmf.6/includes/libs/http/MultiHttpClient.php: Hack: temporarily log headers in MultiHttpClient (duration: 01m 02s)
19:41 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:40 legoktm@deploy1002: Synchronized php-1.38.0-wmf.5/includes/libs/http/MultiHttpClient.php: Hack: temporarily log headers in MultiHttpClient (duration: 01m 02s)
19:37 robh@cumin1001: START - Cookbook sre.dns.netbox
19:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:36 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.6/includes/GlobalFunctions.php: b517ebd: Revert "wfParseUrl: rely on parse_url for proto-relative urls" (T294559) (duration: 01m 03s)
19:29 mutante: [netflow4001:~] $ sudo systemctl reset-failed
19:23 legoktm: manually restarted nginx on ncredir4002 after accidental ganeti reboot
19:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:15 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.6/extensions/FlaggedRevs/: 2cd2a4e: REST: Avoid making wpaccuracy required in API requests (T294544) (duration: 01m 03s)
19:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 60413dc: foundationwiki: Use shared OAuth tables (T205347) (duration: 01m 04s)
19:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4e0200e: emailuser ratelimit: Use user-global rather than user (T293866) (duration: 01m 04s)
17:23 cstone: civicrm revision changed from dba74c443b to 403be9ce05
16:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:44 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: 78783a7: foundationwiki: Enable Translate extension (T205349) (duration: 01m 04s)
16:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 78783a7: foundationwiki: Enable Translate extension (T205349) (duration: 01m 25s)
16:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:24 urbanecm: foundationwiki: Create DB tables for translate extension (T205349)
16:17 urbanecm: Attach BStorm (WMF)@foundationwiki to SUL (T205347)
15:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
15:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bcc910f: foundationwiki: Revoke editsitejson and editinterface from users (T205347) (duration: 01m 04s)
15:41 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
15:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
15:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
15:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
15:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
15:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
15:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
15:04 volans@cumin2002: Updating IPMI password on 8 hosts - volans@cumin2002 - T283050
15:03 volans@cumin2002: START - Cookbook sre.hosts.ipmi-password-reset
14:50 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
14:50 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
14:49 volans@cumin2002: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
14:46 volans@cumin2002: Updating IPMI password on 16 hosts - volans@cumin2002 - T283050
14:46 volans@cumin2002: START - Cookbook sre.hosts.ipmi-password-reset
14:25 urbanecm: [urbanecm@mwmaint1002 ~/foundationwiki-sul]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=foundationwiki --userlist users.txt # T205347, users.txt P17630
14:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:09 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.5 refs T293947 (duration: 01m 03s)
14:08 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.5 refs T293947
14:05 twentyafterfour: rolling back group1 wikis to 1.38.0-wmf.5 (T293947) due to UBN T294559
14:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
13:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:42 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
13:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
13:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
13:40 bblack: esams: switching unified TLS cert to digicert-2021 (natural rollout over next ~30 mins)
13:33 XioNoX: rollback prepend our AS on all es/knams uplinks (except NaWas) - T288505
13:23 volans@cumin2002: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
13:23 volans@cumin2002: Updating IPMI password on 1 hosts - volans@cumin2002 - T283050
13:22 volans@cumin2002: START - Cookbook sre.hosts.ipmi-password-reset
12:48 urbanecm: [urbanecm@mwmaint1002 ~/foundationwiki-sul]$ mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=foundationwiki --safe --auto --userlist users.txt # T205347, users.txt P17629
12:45 urbanecm: [urbanecm@mwmaint1002 ~/foundationwiki-sul]$ mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=foundationwiki --safe --auto --userlist users.txt # T205347, users.txt is at P17628
12:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
12:35 urbanecm: [urbanecm@mwmaint1002 ~/foundationwiki-sul]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=foundationwiki --userlist users.txt # T205347; users.txt is at P17627
12:27 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=foundationwiki --safe --auto --userlist users.txt # T205347, users.txt is P17626
12:25 XioNoX: prepend our AS on all es/knams uplinks (except NaWas) - T288505
12:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:17 urbanecm@deploy1002: Synchronized wmf-config/config/foundationwiki.yaml: 6f557db: Connect foundationwiki to SUL (T205347; 3/3) (duration: 01m 02s)
12:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:16 urbanecm@deploy1002: Synchronized dblists/fishbowl.dblist: 6f557db: Connect foundationwiki to SUL (T205347; 2/3) (duration: 01m 03s)
12:14 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 6f557db: Connect foundationwiki to SUL (T205347; 1/3) (duration: 01m 03s)
12:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6f557db: Connect foundationwiki to SUL (T205347) (duration: 01m 03s)
12:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:03 urbanecm: urbanecm@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/migratePass0.php --wiki=foundationwiki # T205347, with 717506 pulled to mwmaint1002
12:02 XioNoX: advertise esams prefix to NaWas - T288505
11:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:40 volans@cumin2002: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
11:39 volans@cumin2002: Updating IPMI password on 1 hosts - volans@cumin2002 - T283050
11:39 volans@cumin2002: START - Cookbook sre.hosts.ipmi-password-reset
11:37 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=1)
11:37 volans@cumin2002: START - Cookbook sre.hosts.ipmi-password-reset
11:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 77df86a: Add permissions to eliminators on viwiki (T294530) (duration: 01m 04s)
11:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:06 volans@cumin2002: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
11:06 volans@cumin2002: Updating IPMI password on 0 hosts - volans@cumin2002 - T283050
11:05 volans@cumin2002: START - Cookbook sre.hosts.ipmi-password-reset
11:05 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
11:05 volans@cumin2002: START - Cookbook sre.hosts.ipmi-password-reset
09:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:19 urbanecm: Wiki creation done. pwnwiki, amiwiki and lmowiktionary got created.
09:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 532f8e5: lmowiktionary: Create Appendix namespace (T291390) (duration: 01m 03s)
09:15 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 06s)
09:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating lmowiktionary (T291390) (duration: 01m 02s)
09:13 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=lmowiktionary --cluster=all # T291390
09:12 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating lmowiktionary (T291390) (duration: 01m 02s)
09:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating lmowiktionary (T291390) (duration: 01m 02s)
09:10 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating lmowiktionary (T291390)
09:09 urbanecm@deploy1002: Synchronized dblists: Creating lmowiktionary (T291390) (duration: 01m 02s)
09:08 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating lmowiktionary (T291390) (duration: 01m 02s)
09:07 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating lmowiktionary (T291390) (duration: 01m 02s)
09:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:52 urbanecm@deploy1002: Synchronized langlist: Creating amiwiki (T292414) (duration: 01m 02s)
08:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating amiwiki (T292414) (duration: 01m 02s)
08:50 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=amiwiki --cluster=all # T292414
08:50 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating amiwiki (T292414)
08:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:49 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=pwnwiki --cluster=all # T292415
08:49 urbanecm@deploy1002: Synchronized dblists: Creating amiwiki (T292414) (duration: 01m 02s)
08:48 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating amiwiki (T292414) (duration: 01m 02s)
08:47 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating amiwiki (T292414) (duration: 01m 02s)
08:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:43 urbanecm@deploy1002: Synchronized langlist: Creating pwnwiki (T292415) (duration: 01m 02s)
08:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating pwnwiki (T292415) (duration: 01m 03s)
08:40 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating pwnwiki (T292415)
08:39 urbanecm@deploy1002: Synchronized dblists: Creating pwnwiki (T292415) (duration: 01m 02s)
08:38 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating pwnwiki (T292415) (duration: 01m 03s)
08:37 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating pwnwiki (T292415) (duration: 01m 03s)
08:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:30 ema: A:cp start rolling varnish upgrades to 6.0.8-1wm2 T293879
08:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
08:20 ayounsi@cumin1001: START - Cookbook sre.network.cf
07:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5011.eqsin.wmnet
06:38 elukey: depool cp5011 and restart varnish-frontend (ABI errors while reloading after digicert changes)
06:17 XioNoX: Remove GRE tunnel between cr4-ulsfo and cr2-eqsin - T273308
05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions replicas from s6 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17622 and previous config saved to /var/cache/conftool/dbconfig/20211028-050052-marostegui.json

2021-10-27

23:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:40 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Allow upload by URL for Wikisources (T293205), and enable it on enwikisource for autoconfirmed (T294447) (duration: 01m 03s)
23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:28 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add mobile wordmark for Meetei (Manipuri) Wikipedia to config (T294189) (duration: 01m 02s)
23:27 catrope@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-mni.svg: Config: Add mobile wordmark for Meetei (Manipuri) Wikipedia to config (T294189) (duration: 01m 03s)
23:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:06 reedy@deploy1002: Synchronized php-1.38.0-wmf.6/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: T294489 (duration: 01m 15s)
21:42 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist wikipedia namespaceDupes.php --fix | tee namespacedupes-wikipedia-real.log # run namespaceDupes.php for all Wikipedias
21:38 urbanecm: run namespaceDupes.php for a bunch of Wikipedias
20:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:51 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: T294489 (duration: 01m 59s)
20:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2255.codfw.wmnet
20:47 mutante: mw2255 - scap pull, repooling - after DRAC firmware was upgraded - T283582
20:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet
19:53 bblack: cp5xxx: switching unified cert to digicert-2021
19:49 bblack: cp5007: switching unified cert to digicert-2021
19:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:36 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.6 refs T293947 (duration: 01m 47s)
19:34 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.6 refs T293947
19:28 bblack: cp5001: switching unified cert to digicert-2021
19:24 bblack: cp5xxx: disabling puppet ahead of digicert unified certificate update rollout
18:46 legoktm: installing python-swiftclient on mw1305 for debugging
18:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
18:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:23 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
18:22 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:15 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable Education Program namespaces in eswiki (T294365) (duration: 01m 04s)
18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Temporarily change the votewiki lang to fa (T292685) (duration: 01m 04s)
17:40 otto@deploy1002: Finished deploy [analytics/refinery@0d79e18] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d79e18] (duration: 06m 30s)
17:34 otto@deploy1002: Started deploy [analytics/refinery@0d79e18] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d79e18]
17:29 otto@deploy1002: Finished deploy [analytics/refinery@0d79e18] (thin): Regular analytics weekly train THIN [analytics/refinery@0d79e18] (duration: 00m 07s)
17:29 otto@deploy1002: Started deploy [analytics/refinery@0d79e18] (thin): Regular analytics weekly train THIN [analytics/refinery@0d79e18]
16:42 otto@deploy1002: Finished deploy [analytics/refinery@0d79e18]: Regular analytics weekly train [analytics/refinery@0d79e18] (duration: 20m 30s)
16:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:21 otto@deploy1002: Started deploy [analytics/refinery@0d79e18]: Regular analytics weekly train [analytics/refinery@0d79e18]
15:36 ejegg: updated payments-wiki from 6e810fb401 to 5b9fdd0fe1
15:28 volans: deployed new prefixes for drmrs in modules/network/data/data.yaml - T282787
15:12 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:07 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:03 volans@cumin2002: START - Cookbook sre.dns.netbox
14:56 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:51 volans@cumin2002: START - Cookbook sre.dns.netbox
14:00 marostegui: Replace m5-master so it points to dbproxy1017 - T288093
13:58 elukey: removed /var/run/confd-template/.inference*.err files from puppetmaster2001 (backup saved in /home/elukey just in case)
10:53 jbond: enable puppet fleet wide post gerrit:734937
10:43 jbond: disable puppet fleet wide to deploy a puppetmaster change gerrit:734937
10:43 jbond: disable puppet fleet wide to deploy a puppetmaster change
10:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:12 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.6/extensions/GrowthExperiments/: 305e97a, b9eaa20: GrowthExperiments backports (T293434, T294386) (duration: 01m 04s)
10:10 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/GrowthExperiments/: 305e97a, 667a4be: GrowthExperiments backports (T293434, T294386) (duration: 01m 04s)
10:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:02 urbanecm: [urbanecm@mwdebug1001 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki --dbshard=s2 --verbose # testing 734752
10:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:01 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki --dbshard=s2 --verbose # testing 734752
09:25 godog: another run of backfill on graphite1004 - T294355
09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist replicas from s6 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17615 and previous config saved to /var/cache/conftool/dbconfig/20211027-092043-marostegui.json
09:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:04 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:50 topranks: Enabling Telxius circuit from cr1-eqiad to asw1-b12-drmrs with homer.
07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Contributions replicas from s6.codfw T263127', diff saved to https://phabricator.wikimedia.org/P17614 and previous config saved to /var/cache/conftool/dbconfig/20211027-074935-marostegui.json
07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist replicas from s6.codfw T263127', diff saved to https://phabricator.wikimedia.org/P17613 and previous config saved to /var/cache/conftool/dbconfig/20211027-072546-marostegui.json
06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges and recentchangeslinked replicas from s6.codfw T263127', diff saved to https://phabricator.wikimedia.org/P17612 and previous config saved to /var/cache/conftool/dbconfig/20211027-060634-marostegui.json
05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager replicas from s6.codfw T263127', diff saved to https://phabricator.wikimedia.org/P17611 and previous config saved to /var/cache/conftool/dbconfig/20211027-053104-marostegui.json

2021-10-26

22:59 legoktm: uploaded python-logstash to buster-wikimedia for T294393
21:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
21:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:04 reedy@deploy1002: Synchronized php-1.38.0-wmf.5/tests/phpunit/includes/api/query/ApiQueryImageInfoTest.php: T293783 (duration: 01m 02s)
21:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:03 reedy@deploy1002: Synchronized php-1.38.0-wmf.6/tests/phpunit/includes/api/query/ApiQueryImageInfoTest.php: T293783 (duration: 01m 02s)
21:01 reedy@deploy1002: Synchronized php-1.38.0-wmf.6/includes/api/ApiQueryImageInfo.php: T293783 (duration: 01m 03s)
21:00 reedy@deploy1002: Synchronized php-1.38.0-wmf.5/includes/api/ApiQueryImageInfo.php: T293783 (duration: 01m 03s)
21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:51 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.6 refs T293947
19:48 eileen: civicrm revision changed from 733a8fceda to dba74c443b, config revision is eed79486d5
19:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:38 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.6 refs T293947 (duration: 25m 28s)
19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
19:13 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.6 refs T293947
18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
17:52 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS (duration: 01m 34s)
17:50 ryankemper@deploy1002: Started deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS
17:09 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS (duration: 02m 37s)
17:06 ryankemper@deploy1002: Started deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS
17:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS (duration: 1100m 51s)
16:25 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
16:25 cdanis@cumin1001: START - Cookbook sre.network.cf
16:24 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_wikidata_resubmit_changes_for_dispatch
16:23 mutante: mwmaint1002 - running puppet, created new mw periodic job from gerrit:732972 (T294031)
16:07 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:45 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
15:41 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
15:38 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
15:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
15:27 cdanis@cumin1001: START - Cookbook sre.network.cf
15:07 topranks: Running homer against cr3-esams to create new temp GRE tunnel to asw1-b12-drmrs
15:02 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
15:02 cdanis@cumin1001: START - Cookbook sre.network.cf
14:55 topranks: Adding static route on cr3-esams to asw1-b12-drmrs Telia link IP to allow GRE to be built.
13:50 elukey: ran "Capirca Host Definition" script on netbox - output https://netbox.wikimedia.org/extras/scripts/results/1787315/
13:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase: 7723cf7: RecentChangeFactory: Add missing rc_logid value (T293885) (duration: 01m 02s)
13:40 elukey: ran "Capirca Host Definition" script on netbox-next to get up-to-date aqs_group host definition - result https://netbox-next.wikimedia.org/extras/scripts/results/894348/
13:24 kart_: Updated cxserver to 2021-10-25-123807-production (T217747, T218217, T292421)
13:19 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
13:13 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
13:05 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
13:05 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.4 (duration: 31m 07s)
12:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:35 hashar: scap clean --delete 1.38.0-wmf.4 # T293947
12:32 hashar: Applied security patches to 1.38.0-wmf.6 # T293947
12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:31 hashar: scap prep 1.38.0-wmf.6 # T293947
12:16 jbond: upload cas_6.4.2-1+wmf10u3_amd64
12:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:51 urbanecm@deploy1002: Finished scap: c131f32: Add namespace translations for [ami] Amis and [pwn] Paiwan (T292414, T292415) (duration: 02m 25s)
11:49 urbanecm@deploy1002: Started scap: c131f32: Add namespace translations for [ami] Amis and [pwn] Paiwan (T292414, T292415)
11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:13 urbanecm@deploy1002: Synchronized logos/config.yaml: 575a6a6: Fix HD logo size in some wikis (T250731; 2/2) (duration: 00m 55s)
11:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: 575a6a6: Fix HD logo size in some wikis (T250731; 1/2) (duration: 00m 57s)
11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:46 jbond: upload cas_6.4.2-1+wmf10u2_amd64.deb
10:40 mvernon@cumin2002: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=swift
10:39 mvernon@cumin2002: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=swift-ro
10:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:07 oblivian@deploy1002: Synchronized tests/WmfConfigServicesTest.php: Switching back graphite to eqiad (duration: 00m 55s)
10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:06 oblivian@deploy1002: Synchronized wmf-config/ProductionServices.php: Switching back graphite to eqiad (duration: 01m 04s)
09:49 godog: bounce superset on an-tool1005 to pick up statsd changes - T247963
09:49 godog: bounce superset on an-tool1010 to pick up statsd changes - T247963
09:47 godog: bounce navtiming on webperf1001 to pick up statsd changes - T247963
09:40 godog: flip back write traffic to graphite1004 (all but mediawiki) - T247963
09:27 godog: move read traffic back to graphite1004 - T247963
08:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:33 ema: upload varnish_6.0.8-1wm2 to component/varnish6 on apt.wm.org T293879
08:31 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/GrowthExperiments/maintenance: 91316ed: Add purgeExpiredMentorStatus.php (T280307) (duration: 00m 56s)
08:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
07:07 effie: pool mw1319 and mw1312
07:05 effie: pool wtp1026.eqiad.wmnet
06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17606 and previous config saved to /var/cache/conftool/dbconfig/20211026-063647-root.json
06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17605 and previous config saved to /var/cache/conftool/dbconfig/20211026-062144-root.json
06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17604 and previous config saved to /var/cache/conftool/dbconfig/20211026-060640-root.json
05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17603 and previous config saved to /var/cache/conftool/dbconfig/20211026-055136-root.json
05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17602 and previous config saved to /var/cache/conftool/dbconfig/20211026-053633-root.json
05:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17601 and previous config saved to /var/cache/conftool/dbconfig/20211026-052129-root.json
02:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
01:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
01:24 krinkle@deploy1002: Synchronized wmf-config/logging.php: I0211e1c77 (duration: 00m 55s)
01:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-10-25

23:12 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Create alias for Appendix and Appendix_talk namespaces on mywiktionary (T291146) (duration: 00m 55s)
23:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:57 ryankemper: [wcqs] Downtimed `wcqs*` until roughly a week from now (while we set up oauth)
22:53 legoktm: uploaded PHP 7.4.25 to apt.wm.o (DSA-4992-1)
22:44 ryankemper@deploy1002: Started deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS
22:30 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS (duration: 03m 04s)
22:27 ryankemper@deploy1002: Started deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS
21:53 mutante: new project language "pwn" added - Paiwan is a native language of Taiwan, spoken by the Paiwan, a Taiwanese indigenous people. T292415
21:52 mutante: new project language "ami" added - Sowal no 'Amis is the Formosan language of the 'Amis (or Ami), an indigenous people living along the east coast of Taiwan. - T292414
21:50 mutante: authdns1001 (DNS) - sudo authdns-update, add new project language "ami" (Amis) for T292414 - edited langlist.tmpl which regenerates all project zones
21:40 mutante: authdns1001 (DNS) - sudo authdns-update, add new project language "pwn" (Paiwan) for T292415
19:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2255.codfw.wmnet with reason: DRAC upgrade
19:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2255.codfw.wmnet with reason: DRAC upgrade
19:47 mutante: mw2255 - depooled=inactive (incl "dsh groups"), shut down physically for T283582 - can be worked on anytime
19:45 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2255.codfw.wmnet
19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet
19:42 mutante: icinga - ACKing all unhandled CRIT alerts on hosts with "dev" or "test" in their name, regardless of notifications being disabled or not. just so that we get more signal than noise in actual unhandled CRITs in web UI
19:40 mutante: cumin2002 - sudo systemctl reset-failed to clear Icinga alert about failed but (now) non-existing service database-backups-snapshots.service, assuming it's a case of "only in active DC"
19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1112.eqiad.wmnet with reason: hardware fail
19:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1112.eqiad.wmnet with reason: hardware fail
19:07 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily move mw groups to db1123 T294295', diff saved to https://phabricator.wikimedia.org/P17597 and previous config saved to /var/cache/conftool/dbconfig/20211025-190717-kormat.json
19:06 mutante: db1112 - powercycling
19:04 legoktm@cumin1001: dbctl commit (dc=all): 'Depool db1112 (T294295)', diff saved to https://phabricator.wikimedia.org/P17596 and previous config saved to /var/cache/conftool/dbconfig/20211025-190436-legoktm.json
18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:40 jforrester@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/timeline/includes/Timeline.php: Backport: Input may be null when rendering a self-closing tag `<timeline />` (T294020) (duration: 00m 55s)
18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:24 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix some easy codestyle issues (duration: 00m 55s)
18:22 jforrester@deploy1002: Synchronized w/static.php: Config: Fix some easy codestyle issues (duration: 00m 54s)
18:19 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix array declaration of NS_USER_TALK abbreviation on ruwikiquote (T197058) (duration: 00m 55s)
18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:15 jforrester@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs: Drop legacy wgFlaggedRevsStatsAge config, no longer read (duration: 00m 55s)
18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:12 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Make reply tool available as opt-out on frwiki (T293687) (duration: 00m 56s)
17:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2253.codfw.wmnet
17:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
17:39 mutante: mw2253 - scap pull after hw maintenance is over
17:32 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
17:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
17:24 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:23 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
17:22 XioNoX: update core routers ACLs
17:20 mmandere@cumin2002: START - Cookbook sre.dns.netbox
16:49 XioNoX: update management routers ACLs
16:36 XioNoX: DNS - Add eqsin-ulsfo transport v6 prefix - T273308
16:31 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:28 mmandere@cumin2002: START - Cookbook sre.dns.netbox
16:25 accraze@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
16:25 mmandere@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:21 mmandere@cumin2002: START - Cookbook sre.dns.netbox
16:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:10 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2253.codfw.wmnet
16:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Empty wikibase disabled access entity types on Beta (T294159) (beta-only) (duration: 01m 47s)
16:04 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:01 mmandere@cumin2002: START - Cookbook sre.dns.netbox
15:57 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 52s)
15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:49 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 54s)
15:46 jbond: upgrade cas/idp to 6.4.2
14:56 mutante: mw2253 - shut down and downtimed for 2 days
14:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2253.codfw.wmnet with reason: DRAC upgrade
14:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2253.codfw.wmnet with reason: DRAC upgrade
14:49 mutante: depooling mw2253 for DRAC upgrade (T283582)
14:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
14:45 jbond: update cas package
14:31 marostegui: Deploy schema change on s3 codfw - T291719
12:04 ema: cp3062: upgrade varnish to 6.0.8-1wm2 T293879
11:57 ema: deployment-cache-text06: upgrade varnish to 6.0.8-1wm2 T293879
11:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:24 Lucas_WMDE: UTC morning backport+config window done
11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Remove dispatchLagToMaxLagFactor Wikibase setting (T292604) (duration: 00m 54s)
11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Remove wikibaseDispatchRedisLockManager config (T292604) (duration: 00m 54s)
11:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove wmg variables for dispatchChanges.php Wikibase settings (T292604) (duration: 00m 55s)
11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Remove dispatchChanges.php-related Wikibase settings (T292604) (duration: 00m 55s)
11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Remove dispatchViaJobs-related Wikibase settings (T291828) (duration: 00m 56s)
09:52 godog: bounce uwsgi graphite web on graphite2003 - T294220
09:52 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:48 volans@cumin1001: START - Cookbook sre.dns.netbox
09:43 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [BETA CLUSTER] Enable WikibaseLexeme Scribunto access (T294159) (merged on Friday, syncing now to avoid outdated files even if it’s just -labs.php) (duration: 00m 55s)
09:18 godog: bounce graphite-web on graphite2003 to test timeout bump - T294220
08:08 XioNoX: merge DNS changes to add drmrs
07:50 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:50 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
05:47 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,name=wtp1026.*
05:43 _joe_: pooling wtp1042 T294212
05:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1109.eqiad.wmnet with OS buster
05:01 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1109.eqiad.wmnet with OS buster
04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 (s8) for reimage T290868', diff saved to https://phabricator.wikimedia.org/P17590 and previous config saved to /var/cache/conftool/dbconfig/20211025-043028-marostegui.json

2021-10-23

16:40 dcausse: restarting blazegraph on wdqs1004 and wdqs1006 (free allocators alert)
15:45 urbanecm: Start server-side upload for 1 video file (T289781), testing whether T291137 is still an issue

2021-10-22

23:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:57 bblack: re-pooling eqiad in DNS
20:54 legoktm: <XioNoX> I disabled the interface on cr1, going to re-enable the active one on cr2
20:48 legoktm: bblack has temporarily depooled eqiad https://gerrit.wikimedia.org/r/733043
20:41 XioNoX: disable sessions to equinix eqiad IXP
19:17 urbanecm: Start server-side upload of 1 video file (T294134)
15:06 jbond: upload puppetboard_3.1.0-1_all.deb to bullseye-wikimedia
13:42 ema: deployment-cache-upload06: restart varnish-frontend, package got upgraded to 6.0.8 T294116
13:30 jbond: upload python3-pypuppetdb_2.4.0-1_all.deb to bullseye
10:46 jbond: upload cas_6.4.2-1+wmf10u1
10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
10:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
09:11 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/ResubmitChanges.php wikidatawiki --minimum-age $((60*60*12)) # T294029
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2025.codfw.wmnet with OS buster
08:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
08:27 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=(varnish-fe|ats-tls)
08:24 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3062.esams.wmnet,service=(varnish-fe|ats-tls)
08:23 ema: cp3062: test 0008-vsl_check_e_inval_assertion.patch https://gerrit.wikimedia.org/r/c/operations/debs/varnish4/+/732913/ T293879
08:00 ema: deployment-cache-text06: test 0008-vsl_check_e_inval_assertion.patch https://gerrit.wikimedia.org/r/c/operations/debs/varnish4/+/732913/ T293879
05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17580 and previous config saved to /var/cache/conftool/dbconfig/20211022-055403-root.json
05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17579 and previous config saved to /var/cache/conftool/dbconfig/20211022-053900-root.json
05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17578 and previous config saved to /var/cache/conftool/dbconfig/20211022-052356-root.json
05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17577 and previous config saved to /var/cache/conftool/dbconfig/20211022-050852-root.json
04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17576 and previous config saved to /var/cache/conftool/dbconfig/20211022-045349-root.json
04:46 marostegui_: Deploy schema change on s8 codfw - T291719
04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17575 and previous config saved to /var/cache/conftool/dbconfig/20211022-043845-root.json
02:59 ejegg: updated payments-wiki from 088a8cda1e to 6e810fb401

2021-10-21

23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:38 jforrester@deploy1002: Synchronized w/fatal-error.php: Config: build: Upgrade composer testing stack to latest as used Wikimedia-wide (duration: 00m 54s)
23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:37 jforrester@deploy1002: Synchronized w/static.php: Config: build: Upgrade composer testing stack to latest as used Wikimedia-wide (duration: 00m 54s)
23:36 jforrester@deploy1002: Synchronized multiversion/: Config: build: Upgrade composer testing stack to latest as used Wikimedia-wide (duration: 00m 55s)
23:34 jforrester@deploy1002: Synchronized docroot/noc/conf/index.php: Config: build: Upgrade composer testing stack to latest as used Wikimedia-wide (duration: 00m 54s)
23:33 jforrester@deploy1002: Synchronized wmf-config: Config: build: Upgrade composer testing stack to latest as used Wikimedia-wide (duration: 00m 55s)
23:32 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:25 thcipriani@deploy1002: Synchronized wmf-config: Config: CommonSettings: Drop legacy CentralAuth config flag, never read (T277932) (duration: 00m 55s)
23:18 thcipriani@deploy1002: Synchronized tests/multiversion/StaticSettingsTest.php: Config: Add new config names for CentralAuth denylist controls (T277932) (duration: 00m 55s)
23:15 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add new config names for CentralAuth denylist controls (T277932) (duration: 00m 55s)
23:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:42 mutante: T294038 [krb1001:~] $ sudo manage_principals.py create effeietsanders ... Principal successfully created. . .Successfully sent email
21:44 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS (duration: 02m 47s)
21:41 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS
20:54 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 13s)
20:53 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
20:53 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 35s)
20:52 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
19:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:42 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Update $wgTimelineFonts for new path to unifont in Shellbox container (T293050) (duration: 00m 55s)
19:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
19:35 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
19:31 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
19:23 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
19:10 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs (duration: 00m 23s)
19:09 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs
19:07 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: (no justification provided) (duration: 00m 08s)
19:07 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: (no justification provided)
18:53 urbanecm: Deploy security patch for T285116 (wmf.4, wmf.5)
18:53 mutante: dumpsdata1003 - sudo systemctl reset-failed to clear Icinga alert about failed cleanup_tmpdumps.service
17:55 mutante: that's a key for https://www.worldcat.org/whatis/default.jsp btw for those wondering
17:53 mutante: citoid - replaced "wskey" for worldcat in private repo as requested on T294010 (is in 4 places, 3 for deployment_server/k8s and one remnant for scb)
17:53 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
17:52 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
17:50 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
16:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
16:12 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
16:07 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: Remove dispatchViaJobs repo setting (T292604) (3/3) (duration: 00m 56s)
16:06 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: Remove dispatchViaJobs repo setting (T292604) (2/3) (duration: 00m 54s)
16:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:04 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: Remove dispatchViaJobs repo setting (T292604) (1/3) (duration: 00m 56s)
16:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
16:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:01 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604) (3/3) (duration: 00m 56s)
15:59 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604) (2/3) (duration: 00m 55s)
15:58 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604) (1/3) (duration: 00m 57s)
15:43 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:21 robh@cumin1001: START - Cookbook sre.dns.netbox
15:14 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: Remove dispatchViaJobsAllowedClients repo setting (T292604) (3/3) (duration: 00m 56s)
15:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:13 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: Remove dispatchViaJobsAllowedClients repo setting (T292604) (1/3) (duration: 00m 54s)
15:12 Lucas_WMDE: my next message accidentally says 1/3 again but it’s 2/3, sorry
15:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: Remove dispatchViaJobsAllowedClients repo setting (T292604) (1/3) (duration: 00m 56s)
15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:56 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
14:42 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/Wikibase.default.php: Backport: Enable dispatching via jobs by default (T291828) (duration: 00m 55s)
14:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/client/: Backport: Fix ExternalUserNames service wiring for local database (duration: 00m 57s)
14:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:33 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
14:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
14:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
13:56 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
13:55 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
13:49 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
13:49 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
13:34 volans: uploaded spicerack_1.0.6 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
13:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.5 refs T281169
12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: Schema change s3 T278619
12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 7 hosts with reason: Schema change s3 T278619
12:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 T278619
12:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 T278619
12:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s4 T278619
12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s4 T278619
12:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 T278619
12:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 T278619
12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 T278619
12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 T278619
11:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 T278619
11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 T278619
11:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 T278619
11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 T278619
11:13 Lucas_WMDE: UTC morning backport+config window done
11:10 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/ResubmitChanges.php wikidatawiki --minimum-age $((60*60*12)) # T294008
11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:07 jgiannelos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Configure event stream for map tiles state change (T289771) (duration: 01m 04s)
11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
10:48 ayounsi@cumin1001: START - Cookbook sre.network.cf
10:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
10:47 ayounsi@cumin1001: START - Cookbook sre.network.cf
10:14 jbond: merging refactor of P:base Gerrit:714975
09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
08:56 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 03s)
08:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=(varnish-fe|ats-tls)
08:26 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3062.esams.wmnet,service=(varnish-fe|ats-tls)
08:25 ema: cp3062: revert vsl_space experiment T293879
08:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite1004.eqiad.wmnet with OS bullseye
08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17563 and previous config saved to /var/cache/conftool/dbconfig/20211021-080330-root.json
07:56 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite1004.eqiad.wmnet with OS bullseye
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17562 and previous config saved to /var/cache/conftool/dbconfig/20211021-074826-root.json
07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17561 and previous config saved to /var/cache/conftool/dbconfig/20211021-073323-root.json
07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17560 and previous config saved to /var/cache/conftool/dbconfig/20211021-071819-root.json
07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17559 and previous config saved to /var/cache/conftool/dbconfig/20211021-070315-root.json
06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17558 and previous config saved to /var/cache/conftool/dbconfig/20211021-064812-root.json
06:35 elukey: `systemctl reload nginx` on cloudelastic100[5,6] to pick up the new TLS certificate and clear alerts - T293826
04:47 marostegui: Deploy schema change on s5 codfw - T291719
04:37 marostegui: Deploy schema change on s6 codfw - T291719
04:04 legoktm: restarted apache on lists1001 so it only uses new TLS cert (T293826)
03:29 eileen: civicrm revision changed from e889831012 to 733a8fceda, config revision is eed79486d5
00:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-10-20

23:56 thcipriani@deploy1002: Finished scap: Backport: Restore title to mobile skin without logo (T290525) (duration: 11m 41s)
23:44 thcipriani@deploy1002: Started scap: Backport: Restore title to mobile skin without logo (T290525)
23:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:29 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fawiki require login for creation of pages in the draft namespace T291018 (duration: 01m 02s)
23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:27 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: fawiki require login to edit main namespace T291018 (duration: 01m 04s)
22:13 dancy@deploy1002: Synchronized README: testing (4/4) (duration: 02m 52s)
22:00 dancy@deploy1002: Synchronized README: testing (3/4) (duration: 02m 57s)
21:54 dancy@deploy1002: Synchronized README: testing (2) (duration: 01m 02s)
21:52 dancy@deploy1002: Synchronized README: (no justification provided) (duration: 01m 03s)
21:50 dancy: Testing a series of one-file scap sync-file runs
21:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b9cf996: Promote Growth features out of darkmode on several wikis (T291826, T255037, T287878) (duration: 01m 04s)
21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:38 eileen: civicrm revision changed from 9b5e0d015b to e889831012, config revision is eed79486d5
20:25 legoktm: uploaded php7.4 on buster to apt.wm.o (T293449)
19:24 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations (duration: 00m 46s)
19:24 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations
19:09 mutante: disabling puppet on mw* for a minute to deploy a change
18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
18:31 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
18:30 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
18:24 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
17:28 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org - T293810
17:27 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org
17:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:01 razzi@deploy1002: Finished deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f] (duration: 23m 42s)
17:00 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/client: Update deprecated calls to ParserOutput in ShortDescHandler - T293860 (duration: 01m 03s)
16:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:53 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/LiquidThreads/pages/LqtDiscussionPager.php: Remove deprecated usage of setProperty - T293895 (duration: 01m 03s)
16:49 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/GeoCrumbs: Replace use of deprecated ParserOutput:getProperty() - T293894 (duration: 01m 09s)
16:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:37 razzi@deploy1002: Started deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f]
16:36 razzi: deploy refinery change for https://phabricator.wikimedia.org/T287084
16:13 jbond: upload cas_6.4.2-1_amd64.deb
15:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:39 volans@cumin2002: START - Cookbook sre.dns.netbox
14:57 moritzm: installing modsecurity-crs security updates on Buster
14:48 moritzm: installing xmlgraphics-commons security updates on Buster
14:46 moritzm: installing irssi security updates on Buster
14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
14:35 moritzm: installing commons-io security updates on Buster
14:27 ema: cp3062: test higher vsl_space values T293879
14:27 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:12 moritzm: installing ruby2.3 security updates
13:40 moritzm: installing apache2 security updates on buster
13:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:21 hashar@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.5 refs T281169 (duration: 01m 02s)
13:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.5 refs T281169
13:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 T277116
13:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 T277116
13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=ats-tls
13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=varnish-fe
12:51 ema: cp3062: bump vsl_space from 80M (default) to 512M T293879 - varnish restart needed
12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 14 hosts with reason: Schema change s1 T277116
12:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 14 hosts with reason: Schema change s1 T277116
12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:02 urbanecm@deploy1002: Finished scap: 802d3b7: e4f7f85: CreateAccountCampaign: Support for recurring donors (T293699) (duration: 25m 19s)
11:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2007.codfw.wmnet
11:40 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2007.codfw.wmnet
11:37 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
11:37 urbanecm@deploy1002: Started scap: 802d3b7: e4f7f85: CreateAccountCampaign: Support for recurring donors (T293699)
11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2005.codfw.wmnet
11:21 moritzm: installing ffmpeg security updates
11:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e520fc5: GrowthExperiments: Add campaign pattern for enwiki (T293699) (duration: 01m 22s)
11:11 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2005.codfw.wmnet
10:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 T277116
10:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 T277116
09:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 T277116
09:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 T277116
09:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 T277116
09:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 T277116
09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 T277116
09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 T277116
08:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 T277116
08:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 T277116
08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1118.eqiad.wmnet with OS buster
07:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1118.eqiad.wmnet with OS buster
06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 (s1) for reimage T290865', diff saved to https://phabricator.wikimedia.org/P17552 and previous config saved to /var/cache/conftool/dbconfig/20211020-064529-marostegui.json
06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1126.eqiad.wmnet with OS buster
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 (s1) after upgrade', diff saved to https://phabricator.wikimedia.org/P17551 and previous config saved to /var/cache/conftool/dbconfig/20211020-063926-marostegui.json
06:35 marostegui: Upgrade db1106
06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 (s1) for upgrade', diff saved to https://phabricator.wikimedia.org/P17550 and previous config saved to /var/cache/conftool/dbconfig/20211020-063431-marostegui.json
06:31 dcausse: restarting blazegraph on wdqs1012
06:28 elukey: reboot analytics1066 - OS showing CPU soft lockups, tons of defunct processes (including node manager) and high CPU usage
06:21 marostegui: Depool clouddb1013 for upgrade
06:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1126.eqiad.wmnet with OS buster
06:12 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17549 and previous config saved to /var/cache/conftool/dbconfig/20211020-061202-marostegui.json
06:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:05 XioNoX: put transport link between ulsfo and eqsin in service - T273308
05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS buster
05:26 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
04:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
04:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
04:40 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable $wgLocalHTTPProxy on group0 wikis (T288848) (duration: 01m 05s)
01:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
01:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:00 tgr: west coast evening deploys done

2021-10-19

23:59 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Reorder some wikis at wgExtraNamespaces and wmgVisualEditorAvailableNamespaces (T293846) (duration: 01m 02s)
23:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:47 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: ruwikiversity: Add 'portal' and 'faculty' namespaces (T293545) (duration: 01m 03s)
23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set the project namespace and sitename for Javanese Wikipedia and Wiktionary (T287437) (duration: 01m 02s)
23:23 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Create Portal and Portal talk namespace for shiwiki (T288909) (duration: 01m 03s)
23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:13 tgr@deploy1002: Synchronized static: Config: Repair the size of the logo of Kashmiri Wikipedia (T293342) (duration: 02m 14s)
21:34 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete | fixed Icinga alert: RECOVERY - Check for large files in client bucket on mwmaint1002 is OK: OK: T165885
21:32 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete
20:56 ejegg: updated payments-wiki from 0f48acea49 to 30e596903d
19:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.5 refs T281169
18:46 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/MediaSearch/: a84a675: 3231578: MediaSearch backports (T291392, T293335, T291392, T291622, T293554) (duration: 01m 03s)
18:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/: 694580a: c02e301: MediaSearch backports (T291392, T293335, T291392, T291622, T293554) (duration: 01m 03s)
18:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
18:30 foks: deleting 1 more email with deleteUserEmail.php
18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1476a2d93: dd8393c1a0: foundationwiki: Restrict sensitive namespaces to editor group (T205350) (duration: 01m 03s)
18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9a2893c: Enable topic subscriptions as a beta feature on all remaining projects (T287802) (duration: 01m 04s)
18:00 legoktm@deploy1002: Synchronized wmf-config/: Add framework for setting $wgLocalHTTPProxy (T288848) (2/2) (duration: 01m 06s)
17:59 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add framework for setting $wgLocalHTTPProxy (T288848) (1/2) (duration: 01m 05s)
17:57 foks: removing six email addresses on request (with deleteUserEmail.php)
17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1004.eqiad.wmnet with OS bullseye
17:25 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1004.eqiad.wmnet with OS bullseye
17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
16:48 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:41 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 T277118
16:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 T277118
16:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Schema change s1 T277118
16:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Schema change s1 T277118
16:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 T277118
16:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 T277118
16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 T277118
16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 T277118
15:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 T277118
15:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 T277118
15:40 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - remove now redundant stream setting - T277193 (duration: 01m 04s)
15:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 T277118
15:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 T277118
15:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 9 hosts with reason: Schema change s6 T277118
15:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 9 hosts with reason: Schema change s6 T277118
15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 T277118
15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 T277118
15:30 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
15:28 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
15:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
14:34 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:29 jbond: disable puppet on lvs, cp, authdns, mc, mw-be and wcqs while i merge G:662699
14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:11 hashar@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.5 refs T281169 (duration: 45m 13s)
13:52 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
13:45 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:26 hashar@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.5 refs T281169
13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17547 and previous config saved to /var/cache/conftool/dbconfig/20211019-131927-root.json
13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17546 and previous config saved to /var/cache/conftool/dbconfig/20211019-131651-root.json
13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17545 and previous config saved to /var/cache/conftool/dbconfig/20211019-130424-root.json
13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17544 and previous config saved to /var/cache/conftool/dbconfig/20211019-130147-root.json
12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17543 and previous config saved to /var/cache/conftool/dbconfig/20211019-124920-root.json
12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17542 and previous config saved to /var/cache/conftool/dbconfig/20211019-124644-root.json
12:40 moritzm: installing atftpd security updates
12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17541 and previous config saved to /var/cache/conftool/dbconfig/20211019-123416-root.json
12:34 marostegui: Upgrade dbstore1003
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17540 and previous config saved to /var/cache/conftool/dbconfig/20211019-123140-root.json
12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17539 and previous config saved to /var/cache/conftool/dbconfig/20211019-121913-root.json
12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17538 and previous config saved to /var/cache/conftool/dbconfig/20211019-121636-root.json
12:12 XioNoX: push anycast tuning to all Lumen and NTT transit links - T288843
12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1167 (s8) after upgrade', diff saved to https://phabricator.wikimedia.org/P17537 and previous config saved to /var/cache/conftool/dbconfig/20211019-120918-marostegui.json
12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17536 and previous config saved to /var/cache/conftool/dbconfig/20211019-120458-marostegui.json
12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17535 and previous config saved to /var/cache/conftool/dbconfig/20211019-120409-root.json
12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17534 and previous config saved to /var/cache/conftool/dbconfig/20211019-120348-root.json
12:01 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/WikibaseMediaInfo/: ec01257: Escape captions when writing stored data into js state (T293556) (duration: 00m 55s)
12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17533 and previous config saved to /var/cache/conftool/dbconfig/20211019-120132-root.json
12:00 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikibaseMediaInfo/: 79808a9: Escape captions when writing stored data into js state (T293556) (duration: 00m 56s)
12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17532 and previous config saved to /var/cache/conftool/dbconfig/20211019-120024-root.json
11:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:56 XioNoX: push anycast tuning to Tele2, Init7, DT transit links - T288843
11:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17531 and previous config saved to /var/cache/conftool/dbconfig/20211019-114844-root.json
11:46 marostegui: Upgrade db1105 (s1,s2)
11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) for upgrade', diff saved to https://phabricator.wikimedia.org/P17530 and previous config saved to /var/cache/conftool/dbconfig/20211019-114649-marostegui.json
11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17529 and previous config saved to /var/cache/conftool/dbconfig/20211019-114520-root.json
11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17527 and previous config saved to /var/cache/conftool/dbconfig/20211019-113340-root.json
11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17526 and previous config saved to /var/cache/conftool/dbconfig/20211019-113017-root.json
11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17525 and previous config saved to /var/cache/conftool/dbconfig/20211019-111837-root.json
11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17524 and previous config saved to /var/cache/conftool/dbconfig/20211019-111513-root.json
11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7c31b04: DPL: Explicitly note it is not possible to enable DPL on any more wikis (duration: 00m 55s)
11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17523 and previous config saved to /var/cache/conftool/dbconfig/20211019-110333-root.json
11:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17522 and previous config saved to /var/cache/conftool/dbconfig/20211019-110009-root.json
10:56 marostegui: Upgrade clouddb1021
10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
10:51 moritzm: failover master in ganeti-test to ganeti2026
10:50 godog: bounce superset on an-tool1005 to pick up statsd changes - T247963
10:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS stretch
10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17521 and previous config saved to /var/cache/conftool/dbconfig/20211019-104829-root.json
10:45 godog: bounce navtiming on webperf1001 to pick up statsd changes - T247963
10:45 godog: bounce superset on an-tool1010 to pick up statsd changes - T247963
10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17520 and previous config saved to /var/cache/conftool/dbconfig/20211019-104506-root.json
10:38 oblivian@deploy1002: Synchronized w/static.php: Config: static.php: Add support for /static/current rewrites (take 2) (T285232) (duration: 00m 55s)
10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
10:37 marostegui: Upgrade db1101 (s7,s8)
10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 (s7,s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17519 and previous config saved to /var/cache/conftool/dbconfig/20211019-103634-marostegui.json
10:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
10:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:22 oblivian@deploy1002: Synchronized tests/WmfConfigServicesTest.php: Config: ProductionServices: use graphite2003 for statsd (T247963) (duration: 00m 54s)
10:22 godog: flip mw statsd traffic with https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/731918 - T247963
10:21 oblivian@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: use graphite2003 for statsd (T247963) (duration: 00m 54s)
10:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:18 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS stretch
10:16 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
09:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
09:50 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
09:44 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.3 (duration: 01m 39s)
09:42 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.2 (duration: 16m 06s)
09:37 godog: move graphite/statsd writes to graphite2003 - T247963
09:34 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
09:27 hashar: scap clean --delete 1.38.0-wmf.2 && scap clean --delete 1.38.0-wmf.3 # T281169
09:27 hashar: Cloned and applied security patches for 1.38.0-wmf.5 # T281169
09:19 marostegui: Stop slave on db2112 T290865
09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 T281058
09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 T281058
09:03 XioNoX: push anycast tuning to all Telia transit links - T288843
08:50 godog: point graphite.discovery.wmnet to graphite2003 - T247963
08:40 XioNoX: push prep-work for anycast tuning to all sites - T288843
08:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s8 T281058
08:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s8 T281058
08:32 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php hrwiki --fix
08:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift
08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift-ro
08:03 XioNoX: push prep-work for anycast tuning in ulsfo (try 2) - T288843
08:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:24 ema: A:cp start rolling varnish upgrades to 6.0.8-1wm1 T292290
07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17517 and previous config saved to /var/cache/conftool/dbconfig/20211019-072111-root.json
07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17516 and previous config saved to /var/cache/conftool/dbconfig/20211019-071519-root.json
07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17515 and previous config saved to /var/cache/conftool/dbconfig/20211019-070607-root.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17514 and previous config saved to /var/cache/conftool/dbconfig/20211019-070016-root.json
06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17513 and previous config saved to /var/cache/conftool/dbconfig/20211019-065104-root.json
06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17512 and previous config saved to /var/cache/conftool/dbconfig/20211019-064512-root.json
06:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17511 and previous config saved to /var/cache/conftool/dbconfig/20211019-063559-root.json
06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17510 and previous config saved to /var/cache/conftool/dbconfig/20211019-063008-root.json
06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17509 and previous config saved to /var/cache/conftool/dbconfig/20211019-062054-root.json
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17508 and previous config saved to /var/cache/conftool/dbconfig/20211019-061505-root.json
06:06 marostegui: Upgrade dbstore1005
06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17507 and previous config saved to /var/cache/conftool/dbconfig/20211019-060551-root.json
06:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
06:03 marostegui: Upgrade db1184, db1178
06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178 for upgrade', diff saved to https://phabricator.wikimedia.org/P17506 and previous config saved to /var/cache/conftool/dbconfig/20211019-060123-marostegui.json
06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17505 and previous config saved to /var/cache/conftool/dbconfig/20211019-060001-root.json
05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184 for upgrade', diff saved to https://phabricator.wikimedia.org/P17504 and previous config saved to /var/cache/conftool/dbconfig/20211019-055429-marostegui.json
05:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
05:46 marostegui: Reimage db2112 (s1 codfw master) T290865
04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
02:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
02:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer

2021-10-18

23:40 hoo: Updated the Wikidata property suggester with data from the 2021-10-04 JSON dump (with pre-applied T132839 workarounds)
23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b654980: Create an alias for the Draft namespace on hrwiki (T291755) (duration: 00m 56s)
23:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:12 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=thwiktionary --fix # T291761
23:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: abe777d: Create Rhymes namespace for thwiktionary (T291761) (duration: 00m 57s)
23:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:56 legoktm@deploy1002: Synchronized php-1.38.0-wmf.4/includes/http/MWHttpRequest.php: Allow using a reverse proxy for local HTTP requests (T288848) (duration: 00m 56s)
22:06 maryum: deployed security patch for T293589
21:23 maryum: deployed security patch for T293556
21:05 mutante: mwmaint1002 - sudo -u www-data /usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscript extensions/TranslationNotifications/scripts/DigestEmailer.php --wiki mediawikiwiki | Fatal error: Uncaught Error: Class 'MediaWiki\MediaWikiServices' not found
20:58 mutante: mwmaint1002 - attempt to start mediawiki_job_translationnotifications-mediawikiwiki which was alerting as failed
20:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:46 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox
19:29 mutante: LDAP: removed non-existent user gerrit2 from group labsadminbots (T160122)
19:29 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/resources/store/state.js: ac7b4fc: Revert 727328 (T293554) (duration: 00m 56s)
19:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Convert $wgEventStreams to be an associative array - T277193 (duration: 00m 57s)
18:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:07 mutante: gerrit - removed tonina from wmde-mediawiki gerrit group (T293621)
17:51 mutante: puppet run on all bastion hosts via cumin
15:32 mvernon@cumin2002: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
15:32 mvernon@cumin2002: START - Cookbook sre.discovery.service-route
15:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 7 hosts with reason: Schema change s3 T281058
15:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 7 hosts with reason: Schema change s3 T281058
15:16 herron: reprepro copied anycast-healthchecker, python3-json-logger and python3-anycast-healthchecker from buster-wikimedia to bullseye-wikimedia T292196
15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 13 hosts with reason: Schema change s4 T281058
15:16 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 13 hosts with reason: Schema change s4 T281058
14:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 T281058
14:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 T281058
14:54 herron: rebuilt and uploaded kafkatee for bullseye T292196
14:50 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
14:36 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [beta] Rename $wgIPInfoGeoIP2Path to $wgIPInfoGeoIP2Prefix (T289361) (duration: 00m 56s)
14:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove wmg variables for dispatch via jobs (T291828) (2/2) (duration: 00m 56s)
13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove wmg variables for dispatch via jobs (T291828) (1/2) (duration: 00m 56s)
13:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Unconditionally enable Wikibase dispatching via jobs (T291828) (duration: 00m 56s)
13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2079.codfw.wmnet with OS buster
12:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:55 Lucas_WMDE: UTC morning backport window done
11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828) (2/2) (duration: 00m 56s)
11:54 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828) (1/2) (duration: 00m 56s)
11:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2079.codfw.wmnet with OS buster
11:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:49 marostegui: Reimage db2079 (codfw s8 master) T290868
11:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Set dispatchViaJobsAllowedClients to null everywhere (T291828) (duration: 00m 56s)
11:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: Make deduplication actually work for DispatchChangesJob (T291118) (duration: 00m 55s)
11:10 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/Hooks/RecentChangeSaveHookHandler.php: Backport: Create DispatchChangesJob without change id (T291118) (2/2) (duration: 00m 56s)
11:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:09 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: Create DispatchChangesJob without change id (T291118) (duration: 00m 56s)
11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:47 moritzm: copied wmf-certificates from buster-wikimedia to stretch-wikimedia in reprepro
10:38 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: Don't filter by change Id when dispatching to client wikis () (duration: 00m 59s)
09:48 moritzm: installing node-tar security updates on buster
09:39 vgutierrez: updating acme-chief to version 0.34 on acmechief instances - T292619
09:38 godog: sync metrics from graphite1004 to graphite2003 - T247963
09:13 moritzm: installing apr security updates on bullseye
08:57 godog: cleanup graphite metrics not modified for >= ~3yr (1024 days)
07:34 ema: cp3060 (text), cp3061 (upload): upgrade varnish to 6.0.8 T292290
07:34 elukey: depool + restart blazegraph on wdqs1013
07:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-10-16

03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
01:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)

2021-10-15

23:48 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
23:27 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
23:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
22:38 mutante: apt1001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
22:36 mutante: apt2001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
22:34 mutante: apt2001 - upgraded nginx
22:18 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
22:14 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
22:05 dpifke@deploy1002: Finished deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes (duration: 00m 05s)
22:05 dpifke@deploy1002: Started deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes
21:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
21:44 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
21:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
20:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
18:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
17:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
17:17 mutante: gitlab1001 - disabling puppet for debugging
17:05 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold - T283076
17:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
16:50 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold
16:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
16:44 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
14:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:30 elukey: start topic rebalancing for kafka main-eqiad (long maintenance, it will last a couple of days)
13:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:21 vgutierrez: updating acme-chief to version 0.34 on acmechief-test instances - T292619
13:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
13:14 vgutierrez: upload acme-chief 0.34 to apt.wikimedia.org (buster) - T292619
11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
11:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2007.codfw.wmnet
11:45 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:33 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2007.codfw.wmnet
11:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
09:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
09:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
08:58 jelto: jelto@gitlab1001:~$ sudo disable-puppet "disable puppet on gitlab1001 to test 728380 on GitLab replica - T283076"
07:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
06:20 urbanecm: Start server-side upload for 1 video file
02:14 ryankemper: T288231 `wdqs2006` data transfer complete and all tests passing on the host. All of `codfw wdqs-internal` is on the new streaming updater
00:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
00:07 brennen: end of UTC late backport & config training window

2021-10-14

23:59 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: Change Kashmiri Wikipedia logo (T293342) (duration: 00m 55s)
23:58 cjming@deploy1002: Synchronized logos/config.yaml: Config: Change Kashmiri Wikipedia logo (T293342) (duration: 00m 55s)
23:56 cjming@deploy1002: Synchronized static/images/project-logos: Config: Change Kashmiri Wikipedia logo (T293342) (duration: 00m 56s)
23:49 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: Change Kashmiri Wiktionary logo (T293373) (duration: 00m 55s)
23:48 cjming@deploy1002: Synchronized logos/config.yaml: Config: Change Kashmiri Wiktionary logo (T293373) (duration: 00m 55s)
23:46 cjming@deploy1002: Synchronized static/images/project-logos: Config: Change Kashmiri Wiktionary logo (T293373) (duration: 00m 56s)
23:43 ejegg: updated payments-wiki from 19d18c1852 to 0f48acea49
23:34 cjming@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikimediaEvents/includes/VectorPrefDiffInstrumentation.php: Backport: Change VectorPrefDiffInstrumentation stream name to `mediawiki.skin_diff` (T289622) (duration: 00m 56s)
23:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: allow sysops to add and remove users to other groups on ptwikivoyage (T292806) (duration: 00m 56s)
23:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814
23:11 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add americanantiquarian.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T292918) (duration: 00m 57s)
23:11 mutante: mw1452 - re-pooled, scap pull
23:09 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
22:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
22:35 ryankemper: T288231 Ran puppet on `wdqs2006`, now back to the cookbook run
22:33 ryankemper: T288231 Forgot about running puppet-agent on `wdqs2006`; aborted cookbook run
22:33 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
22:33 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
22:32 ryankemper: T288231 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/730795; proceeding to data-transfer on `wdqs2006`: `sudo rm -fv /srv/wdqs/data_loaded` on `wdqs2006` followed by `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "streaming updater cutover for wdqs2005" --blazegraph_instance blazegraph --task-id T288231`
22:31 mutante: depooling mw1452 for testing
22:28 ryankemper: T288231 `ryankemper@wdqs2005:~$ sudo pool`: transfer completed successfully; tests passing on host (used `ssh -L 9999:localhost:80 wdqs2005.codfw.wmnet` to establish tunnel)
22:23 dpifke@deploy1002: Finished deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream T291898 (duration: 00m 05s)
22:23 dpifke@deploy1002: Started deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream T291898
22:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814
22:07 eileen: civicrm revision changed from 018d3b19fe to 9b5e0d015b, config revision is 781d6a1b1f
21:34 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:25 robh@cumin1001: START - Cookbook sre.dns.netbox
21:10 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
19:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.4 refs T281168
19:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
19:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
18:53 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
18:53 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=dagwiki --fix
18:47 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=frwiktionary --logwiki=metawiki 'TURK FASTER' 'ARTHUR MORGAN'
18:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'George Dum Fulton' 'George Fulton' # T293403
18:41 urbanecm: UTC evening B&C done
18:40 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/extension.json: 6da3523: Fix assessment quickview labels (T292596) (duration: 01m 03s)
18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c8dffef: Create Salima namespace for dagwiki (T289911) (duration: 01m 04s)
18:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
18:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0bccd4b: Add $wgSitename and $wgMetaNamespace for kswiki and kswiktionary (T289752, T289767) (duration: 01m 04s)
18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 262e588: Enable Growth mentor dashboard backend on all wikis (T278920) (duration: 01m 05s)
18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 41baa8c: Add new mediawiki.skin_diff event logging stream (T289622) (duration: 01m 05s)
18:03 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
18:02 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
18:01 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
17:54 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
17:52 rzl: repooled mw1452 (with `sudo pool` so no auto log from conftool)
17:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
17:45 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw1452.eqiad.wmnet
17:42 rzl: depool mw1452 for training
17:32 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
17:31 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
17:29 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
16:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
16:44 ryankemper: T288231 Manually killed dangling `pigz` / `nc` processes on `wdqs2008` (and `wdqs2005` implicitly). Should be in the right state to restart the `data-transfer` cookbook again
16:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
16:37 elukey: drop kubeflow-kfserving* docker images from deneb
16:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
16:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
16:33 moritzm: installing node-ansi-regex security updates
16:28 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere (duration: 02m 24s)
16:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere
16:24 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: Check that the timestamp key/value is set to avoid undefined offset (T293300) (duration: 01m 04s)
16:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad (duration: 02m 41s)
16:14 mbsantos@deploy1002: Started deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad
16:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
16:07 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
16:07 ryankemper: T288231 About to ctrl+c out of ongoing data transfer because puppet run following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/730794 restarted blazegraph; we'll manually disable updater and kick off the transfer again
16:04 ryankemper: T288231 `ryankemper@wdqs2005:~$ sudo run-puppet-agent --force`
15:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
15:54 ryankemper: T288231 `ryankemper@wdqs2008:~$ sudo depool`
15:52 ryankemper: T288231 `ryankemper@wdqs2005:~$ sudo depool`
15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
15:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
15:13 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
15:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/VisualEditor/includes/VisualEditorHooks.php: Backport: Fix value of 'namespacesWithSubpages' in wgVisualEditorConfig (T293310) (duration: 01m 04s)
15:02 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: Check that the timestamp key/value is set to avoid undefined offset (T293300) (duration: 01m 03s)
15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
14:59 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
14:53 kormat: upgrading orchestrator.wm.o to 3.2.6-1 T275784
14:49 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apt
14:43 jbond: migrate apt.w.o to a dns active/passive discovery address (cc moritzm)
14:23 moritzm: installing krb5 security updates on KDCs
14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
14:10 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: b35adfc: Deploy Growth wikis to 4 wikis in dark mode (T291826; 2/2) (duration: 01m 03s)
14:07 urbanecm: Run extensions/GrowthExperiments/initWikiConfig.php for ganwiki, iuwiki, tgwiki (T291826)
14:07 urbanecm: Create growthexperiments DB tables for ganwiki, iuwiki, tgwiki (T291826)
14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
14:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b35adfc: Deploy Growth wikis to 4 wikis in dark mode (T291826; 1/2) (duration: 01m 04s)
14:03 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: 82d0a4b: Enable VE by default on 4 more wikis (T290614) (duration: 01m 05s)
13:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
13:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:14 kormat: uploaded orchestrator 3.2.6-1 packages to apt.wm.o (buster) T275784
12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
12:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
12:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
12:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
12:19 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Untangle “dispatch via jobs” settings in Wikibase.php (T291828) (no-op) (duration: 01m 04s)
12:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set wmgWikibaseDispatchViaJobsPruneChangesTableInJobEnabled for wikidatawiki (T291828) (no-op) (duration: 01m 05s)
11:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
11:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
10:38 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
10:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/: 1f33fc3, e0ea1b8, cba2ac9: GrowthExperiments backports (T290609) (duration: 01m 05s)
10:33 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: 465b564, a8cc98b, 6e95c48: GrowthExperiments backports (T290609) (duration: 01m 06s)
10:32 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
09:20 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
09:20 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
09:18 volans@deploy1002: Finished deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1 (duration: 00m 50s)
09:17 volans@deploy1002: Started deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1
09:04 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 45s)
09:03 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
09:02 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 23s)
09:02 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
08:52 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
08:52 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
08:51 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
08:51 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
08:22 volans: rolling out debmonitor-client upgrade to 0.3.1 across the fleet
07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
07:24 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
07:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
07:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:37 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
01:52 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
01:50 foks: changing user email for "Region of Peel Archives"
01:41 ejegg: updated payments-wiki from b329d2dea2 to 19d18c1852
01:35 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
01:31 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .

2021-10-13

23:37 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
23:36 eileen: civicrm revision changed from 946dfb6c5a to 018d3b19fe, config revision is 85277466ed
23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Create an alias for the project namespace on kswiki (T291740) (duration: 01m 05s)
22:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
22:01 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Collection/includes/Specials/SpecialCollection.php: Backport: Api: Avoid trying to access undefined offset in a user's collection (T293261) (duration: 01m 04s)
21:50 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection: Backport: Api: Avoid trying to access undefined offset in a user's collection (T293261) (duration: 01m 04s)
21:47 foks: removing 8 files for legal compliance
21:03 foks: removing 2 files for legal compliance
21:00 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
20:50 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
20:49 brennen@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/Api/ApiGetBookCreatorBoxContent.php: Backport: Fall back to main page if given title is invalid (T293299) (duration: 01m 04s)
20:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
20:40 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
20:31 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
20:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS stretch
20:04 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
20:03 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1021.eqiad.wmnet with OS stretch
20:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
19:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
19:16 mutante: gitlab2001 - status before was that "gitlab-ctl status" showed components "gitlab-workhorse" and "postgres-exporter" as "down". this was either pre-broken or caused by the restore process. after manually 'gitlab-ctl start gitlab-workhorse' all of the components are in "run" and https://gitlab-replica.wikimedia.org is up (T285867)
19:08 mutante: gitlab2001 - started workhorse which was for some reason marked as down after restore command ran
19:08 mutante: [gitlab2001:~] $ sudo /usr/bin/gitlab-ctl start gitlab-workhorse
19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.4 refs T281168 (duration: 01m 03s)
19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.4 refs T281168
19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8787986: Create Translation namespace for viwikisource (T290691) (duration: 01m 04s)
18:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 06fd0f2: add extendedconfimed for autoreview group on ptwiki (T292912) (duration: 01m 04s)
18:37 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript initSiteStats.php --wiki=ptwiki --update
18:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=ptwiki extendedconfirmed
18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0bb2b38: Set autoconfirmedextended and confirmedextended for ptwiki (T292915) (duration: 01m 04s)
18:16 urbanecm@deploy1002: Synchronized static/images/project-logos: 694bc23: Remove an old dawiki temporary logo (duration: 01m 04s)
18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 224e2a3: Add NS_MAIN back to wgExtraSignatureNamespaces for mediawikiwiki (T291630) (duration: 01m 05s)
18:12 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
18:12 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: 1b96f54: Update logo for liwiktionary (T291479) (duration: 01m 14s)
18:10 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
18:10 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
18:09 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
18:09 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
18:08 volans: uploaded debmonitor-client_0.3.1 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
17:14 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/maintenance/initWikiConfig.php: dd7a331: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES (T293219) (duration: 01m 04s)
17:13 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/maintenance/initWikiConfig.php: 5c27154: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES (T293219) (duration: 01m 15s)
16:57 mutante: stat1008 - short on disk space, mostly used in /tmp, high CPU usage by R process, sent a message about it to all shell users via wall
16:50 mutante: stat1008 - apt-get clean - freed 1.3 GB disk space - was alerting in Icinga because / was 97% full
16:37 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
16:37 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
16:23 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
16:23 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
15:29 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
15:28 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
15:26 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
15:26 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
15:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
15:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
15:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
15:12 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
15:12 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
15:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
15:04 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
15:03 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
15:03 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
15:01 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
15:01 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
15:01 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
14:59 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
14:59 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
14:59 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
14:57 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
14:56 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
14:54 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
14:54 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
14:52 ema: repool cp4021, further testing can be performed on sretest1001 T201317
14:51 volans: restarting ircecho.service on alert1001 to get back icinga-wm without the underscore
14:50 elukey: restart pybal on lvs1015 (low-traffic primary) to pick up new config for inference.discovery.wmnet - T289835
14:48 moritzm: reverted to clean package state on deneb
14:44 elukey@puppetmaster1001: conftool action : ge; selector: cluster=ml_serve,service=inference
14:36 elukey: restart pybal on lvs1016 (low-traffic secondary) to pick up new config for inference.discovery.wmnet - T289835
14:27 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
14:27 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
14:25 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
14:25 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
14:21 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
14:21 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
14:20 moritzm: temporarily downgrade sphinx packages on deneb to 1.7.9-1~bpo9+1 to build a Ganeti 2.16 stretch backport with delicate toolchain needs
14:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
14:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
13:59 XioNoX: push prep-work for anycast tuning in ulsfo - T288843
13:38 jayme: imported helm-diff_3.1.3-2 to buster-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/helm-diff/+/730509)
13:37 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
13:34 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
12:13 Lucas_WMDE: UTC morning backport+config window done
12:12 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/includes: Backport: Add Link: Do not log "no suggestion found" errors in production log (T291251) (duration: 01m 04s)
12:11 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='T255037' # after applying 730512 at mwmaint1002 to work around T293219 # T255037
12:11 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/modules: Backport: Suggested Edits: Update local config.presets when topics/difficulty presets change (T292536) (duration: 01m 07s)
11:56 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: 38a019d: itwiki: Deploy Growth features in dark mode (T255037) (duration: 01m 04s)
11:55 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=mediawikiwiki "Growth/Communities/How to introduce yourself as a mentor" "Growth/Communities/How to configure the mentors' list" "Martin Urbanec (WMF)" --reason 'phab:T293184' # T293184
11:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 38a019d: Deploy Growth features in dark mode (T255037; 2/3) (duration: 01m 04s)
11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 38a019d: itwiki: Deploy Growth features in dark mode (T255037; 1/3) (duration: 01m 05s)
11:50 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='T255037' # T255037
11:49 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=itwiki growthexperiments # T255037
11:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: Instantiate ItemId for SiteLinkConflictLookup results (T293104) (duration: 01m 07s)
11:43 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Wikibase/repo/: Backport: Instantiate ItemId for SiteLinkConflictLookup results (T293104) (duration: 01m 18s)
11:33 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
11:19 ema: pool cp4021 after reimage T201317
11:05 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
10:15 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:09 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Add more types of QuickSurveys on beta cluster (T292459) (duration: 01m 53s)
10:06 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
09:22 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
08:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:21 elukey: run kafka preferred-replica-election on kafka-main1001 to rebalance partition leaders - T288825
08:15 godog: bounce graphite on graphite1004 to apply new config
07:33 elukey: increase kafka topic partition size of the top 4 high traffic topics of main-eqiad as described in https://phabricator.wikimedia.org/T288825#7422726
07:13 XioNoX: provision new eqsin-ulsfo link - T273308
06:26 elukey: `kafka topics --alter --topic {eqiad,codfw}.change-prop.transcludes.resource-change --partitions 3` on kafka-main2001 - T288825
00:38 ejegg: updated payments-wiki from 030b11da1a to b329d2dea2

2021-10-12

23:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
23:16 urbanecm: UTC late B&C window done
23:15 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 59c31d9: Change logo in astwiki (T292742) (duration: 01m 04s)
23:12 urbanecm@deploy1002: Synchronized static/images/project-logos/: 59c31d9: Change logo in astwiki (T292742) (duration: 02m 09s)
23:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
22:53 urbanecm: [urbanecm@labweb1001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=labswiki Jamesmontalvo3 #
22:51 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
20:21 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
19:31 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.1 (duration: 04m 02s)
19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.4 refs T281168
18:47 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.4 refs T281168 (duration: 45m 36s)
18:12 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
18:01 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.4 refs T281168
17:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:56 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/CentralNotice: Backport: gerrit:730141 (duration: 00m 59s)
17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:46 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
17:43 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
17:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
17:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:32 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SyntaxHighlight_GeSHi/includes/ResourceLoaderPygmentsModule.php: Backport: Include generated styles before Mediawiki overrides (T292736) (duration: 00m 57s)
17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:23 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes/actions/pagers/HistoryPager.php: Backport: Fix history page iteration in backwards mode (T292791) (duration: 00m 57s)
17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:16 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/includes/actions/pagers/HistoryPager.php: Backport: Fix history page iteration in backwards mode (T292791) (duration: 00m 57s)
17:12 moritzm: installing rsync bugfix updates
17:09 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:56 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:55 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
16:53 moritzm: failed over ganeti master for test cluster to ganeti2025
16:50 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:48 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
16:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:30 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts testvm2009.codfw.wmnet
16:30 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
16:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:26 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
16:26 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes: Backport: Pre-format comments for non-local files too (T292570) (duration: 01m 15s)
16:17 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
16:16 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2009.codfw.wmnet
16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
16:09 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: Fix wrong var being passed (T289950 T293102) (duration: 00m 57s)
16:00 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
15:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:58 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: Fix wrong var being passed (T289950 T293102) (duration: 02m 13s)
15:57 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2009.codfw.wmnet
15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:51 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
15:49 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
15:48 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
15:48 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
15:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for analytics1069.eqiad.wmnet
15:41 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for analytics1069.eqiad.wmnet
15:02 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:50 volans@cumin2002: START - Cookbook sre.dns.netbox
13:49 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
13:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
13:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
13:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:14 godog: add 50G to prometheus/k8s in eqiad
13:13 otto@deploy1002: Synchronized wmf-config/CommonSettings.php: Enable x_client_ip_forwarding_enabled for eventgate-analytics and eventgate-analytics-external - T288853 (duration: 00m 56s)
13:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power T291732
13:11 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power T291732
13:05 volans: upgraded spicerack to 1.0.5 on cumin hosts
12:25 volans: uploaded spicerack_1.0.5 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
12:15 elukey: `kafka topics --alter --topic codfw.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - T288825
12:15 elukey: `kafka topics --alter --topic eqiad.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - T288825
12:10 elukey: `kafka topics --alter --topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - T288825
12:09 elukey: `kafka topics --alter --topic eqiad.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - T288825
11:58 elukey: `kafka topics --alter --topic codfw.resource-purge --partitions 5` on kafka-main2001 - T288825
11:49 elukey: `kafka topics --alter --topic eqiad.resource-purge --partitions 5` on kafka-main2001 - T288825
11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
11:34 urbanecm: UTC morning B&C window done
11:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 860ea09: Remove NS_MAIN from wgExtraSignatureNamespaces on most special wikis (T291630) (duration: 00m 57s)
11:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:06 urbanecm@deploy1002: Synchronized w/static.php: e77ae17: static.php: correctly report a bad request (duration: 00m 57s)
11:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2003.codfw.wmnet
10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
10:30 ema: apply https://gerrit.wikimedia.org/r/726912 to all A:cp nodes T288106
10:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4028.ulsfo.wmnet,service=ats-be
10:23 ema: depool/repool ats-be on cp4028 to verify updates to /etc/varnish/directors.frontend.vcl on cp4027 keep on working fine T288106
10:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
10:22 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4028.ulsfo.wmnet,service=ats-be
10:16 ema: cp4027: enable and run puppet to test https://gerrit.wikimedia.org/r/726912 T288106
10:12 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2025.codfw.wmnet with OS buster
09:16 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17456 and previous config saved to /var/cache/conftool/dbconfig/20211012-091614-kormat.json
09:01 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17455 and previous config saved to /var/cache/conftool/dbconfig/20211012-090111-kormat.json
08:46 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17454 and previous config saved to /var/cache/conftool/dbconfig/20211012-084607-kormat.json
08:31 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17453 and previous config saved to /var/cache/conftool/dbconfig/20211012-083103-kormat.json
08:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: 17dc3aa, e0ca905, c0f4f4e: GrowthExperiments backports (T292224, T290609, T290609) (duration: 00m 59s)
07:40 elukey: run kafka preferred-replica-election on kafka-main2001 to rebalance partition leaders after the last topic moves - T288825
07:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
07:22 moritzm: installing RT security updates
04:43 eileen: civicrm revision changed from 96090e4bd2 to 946dfb6c5a, config revision is 85277466ed
03:56 kart_: cxserver: Remove Matxin Key from Production (T292635)
03:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
03:48 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
03:45 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
02:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:11 eileen: civicrm revision changed from 598b59b0ee to 96090e4bd2, config revision is 85277466ed

2021-10-11

21:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
20:58 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
17:08 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - T288825
15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
15:31 jgleeson: smashpig updated from 3607b16f83 to dd3a81c7c2
14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
14:36 Emperor: start restoring weight to ms-be2045 T290881
13:42 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - T288825
12:53 moritzm: install apache security updates on buster
12:49 topranks: Setting up BGP peering to AS12552 (GlobalConnect Group) at AMS-IX on cr2-esams
12:45 ema: cp4027: upgrade varnish to 6.0.8 T292290
12:04 moritzm: install apache security updates on bullseye
10:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
09:50 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
09:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
09:37 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - T288825
09:13 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
09:09 elukey: force kafka preferred-replica-election on kafka-main2001 after the first 50 topic partitions moves - T288825
09:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
09:01 godog: bounce swift-object-replicator on ms-be2036
08:52 godog: bounce statsite on graphite1004 to apply unit config changes
08:48 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
08:41 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet
08:38 moritzm: updated bullseye d-i image for Bullseye 11.1 point release T292844
08:38 moritzm: updated buster d-i image for Buster 10.11 point release T292838
08:26 godog: swift eqiad-prod: final weight to ms-be10[64-67] - T290546
08:25 moritzm: updated buster d-i image for Buster 10.11 point release T292838
08:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet
08:06 godog: bounce uwsgi on graphite hosts to bump request size limit - T292877
07:58 volans: migrating physical hosts DHCP to the new reimage process - T269855
07:57 elukey: start kafka topics rebalancing for main-codfw (long running maintenance) - T288825

2021-10-09

05:01 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
04:28 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
01:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814
00:46 mutante: ms-be2045 - started systemd-timedated which had been killed by something
00:28 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814
00:24 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-unfreeze (exit_code=99)
00:23 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-unfreeze
00:13 ryankemper: T292814 Write queue stuck at 133 events in partition 1 of topic `codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite`, will try again at another time
00:12 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814

2021-10-08

23:16 legoktm: sudo cumin -b 10 C:mediawiki::packages 'apt-get purge lilypond-data -y'
23:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814
21:38 mutante: mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress
21:34 mutante: disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key)
21:30 legoktm: running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent'
20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
20:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
20:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
20:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
20:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
19:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
19:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
19:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
19:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
18:15 cstone: civicrm revision changed from 5cb7d487cb to 598b59b0ee
16:19 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=enwiki --force # to measure performance on a large wiki
15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
15:29 jelto: enable puppet on gitlab1001 again for T283076
14:05 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:01 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:49 Amir1: wikiadmin@10.64.16.85(wikidatawiki)> delete from wb_changes_subscription where cs_subscriber_id in ('testcommonswiki', 'mowiki');
09:39 Emperor: installing stress on ms-be2045 given recent h/w issues T290881
08:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force
07:43 Emperor: reboot ms-be2045 T290881
07:41 gehel: manually resuming the data reloads on wdqs1009 and wdqs2008
06:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
06:42 ayounsi@cumin1001: START - Cookbook sre.network.cf
06:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
06:28 ayounsi@cumin2002: START - Cookbook sre.network.cf
05:35 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814
04:56 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
04:32 ryankemper: T292814 Beginning rolling restart of `cloudelastic`: `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic restart" --nodes-per-run 1 --start-datetime 2021-10-08T03:53:49 --task-id T292814` on `ryankemper@cumin1001` tmux `elastic`
04:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814
04:29 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
04:23 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@8f57a56]: 0.3.89 (duration: 08m 22s)
04:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814
04:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814
04:18 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
04:17 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
04:15 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.89` on canary `wdqs1003`; proceeding to rest of fleet
04:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@8f57a56]: 0.3.89
04:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.89`. Pre-deploy tests passing on canary `wdqs1003`
03:58 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814
03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814
02:04 Krinkle: krinkle@deploy1002$ echo 'https://en.wikipedia.org/static/images/project-logos/jvwiktionary.png' | mwscript purgeList.php , ref T287425, T292810
00:07 tgr_: deploy window over
00:05 tgr@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments: Backport: Mentee overview: Make UncachedMenteeOverviewDataProvider::getBlocksForUsers faster (T290609) (duration: 00m 56s)

2021-10-07

23:43 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Config: Change Javanese Wiktionary logo (T287425) part 3/3 (duration: 00m 55s)
23:41 thcipriani@deploy1002: Synchronized logos/config.yaml: Config: Change Javanese Wiktionary logo (T287425) part 2/3 (duration: 00m 55s)
23:40 thcipriani@deploy1002: Synchronized static/images/project-logos: Config: Change Javanese Wiktionary logo (T287425) part 1/3 (duration: 00m 56s)
23:30 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Adding and use wordmark in trwikiquote (T286133) Part 2/2 (duration: 00m 56s)
23:28 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikiquote-wordmark-tr.svg: Config: Adding and use wordmark in trwikiquote (T286133) Part 1/2 (duration: 00m 57s)
21:35 urbanecm: Password reset for SUL User:LA2-bot (T292793)
20:43 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.3
20:37 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.2 refs T281167
20:35 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
20:35 cmooney@cumin1001: START - Cookbook sre.network.cf
20:23 krinkle@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Gadgets/: I7c858b8c4bc (duration: 00m 56s)
20:01 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Echo/: 8a7ff05: Revert "Use namespaced CentralAuthSessionProvider" (duration: 00m 57s)
19:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/CentralAuth/: c01c2e4: Revert "Namespace session providers" (duration: 00m 57s)
19:44 urbanecm: Backporting https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/727489, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Echo/+/727487 in an unsafe way -- exceptions at testwikis expected, wmf.3 is not deployed elsewhere, so this should be ok
19:37 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert all wikis to 1.38.0-wmf.2 (T281167)
19:33 brennen: 1.38.0-wmf.3 train (T281167): variously blocked, rolling back to testwikis for safe deploy of backports
19:14 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.38.0-wmf.2
19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.3 refs T281167
19:03 brennen: 1.38.0-wmf.3 train (T281167): unblocked, rolling to all wikis
18:50 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=test2wiki
18:46 sukhe: running authdns-update for T292537
18:29 urbanecm: Morning B&C window done
18:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4a946c0: Deploy Growth mentor dashboard to pilot wikis (T278920) (duration: 01m 04s)
18:23 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 87e3001: Deploy Growth features to test2wiki (duration: 01m 03s)
18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 87e3001: Deploy Growth features to test2wiki (duration: 01m 04s)
18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 31770f2: shwiki: Deploy Growth features to newcomers (T278240) (duration: 01m 04s)
18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 33526df: Stream config changes for android_daily_stats schema (T286000) (duration: 01m 06s)
18:10 ejegg: updated payments-wiki from 6d3560d083 to 030b11da1a
18:07 arnoldokoth: gitlab2001 re-image complete (T283076)
17:30 mutante: rebooting gitlab2001.wikimedia.org
16:56 arnoldokoth: downtiming gitlab2001 for re-imaging (T283076)
16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2001.wikimedia.org with reason: reimage
16:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2001.wikimedia.org with reason: reimage
16:32 hnowlan: roll restarting maps cassandra instances for java updates
16:19 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
16:19 ayounsi@cumin2002: START - Cookbook sre.network.cf
16:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=99)
16:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
16:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=99)
16:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
15:07 hashar@deploy1002: Finished deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit1001 (duration: 00m 08s)
15:07 hashar@deploy1002: Started deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit1001
14:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit2001 (duration: 00m 10s)
14:49 hashar@deploy1002: Started deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit2001
14:48 hashar: Upgrading Gerrit replica to 3.3.6 # T290236
14:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:56 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:46 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:30 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:29 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:29 hashar: restarting CI Jenkins for git plugin update
13:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
13:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:14 hashar: Upgraded CI Jenkins on contint2001
13:14 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
13:13 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
13:10 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
13:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
13:06 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
13:05 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:05 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
12:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
12:40 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
12:16 moritzm: installing testvm2005
11:59 moritzm: installing openssl security updates for stretch (buster/bullseye already fixed)
11:52 Lucas_WMDE: EU backport+config window (aka UTC morning) done
11:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Content and Section Translation to Kurdish WP (T290238) (duration: 01m 04s)
11:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:44 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/WikidataPageBanner/includes/WikidataPageBannerFunctions.php: Backport: Change PropertyId to NumericPropertyId (T289125, T292667) (duration: 01m 05s)
11:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:10 jbond: update puppet stdlib gerrit:726872
09:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2004.codfw.wmnet
09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host ms-be2045.codfw.wmnet
09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2005.codfw.wmnet
09:19 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2004.codfw.wmnet
09:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2005.codfw.wmnet
08:49 mvernon@cumin2002: START - Cookbook sre.experimental.reimage for host ms-be2045.codfw.wmnet
08:36 moritzm: imported jenkins 2.303.2 to thirdparty/ci component for buster-wikimedia
07:57 Emperor: re-enabling puppet on ms-be2045 after hw work T290881
07:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
07:39 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
07:38 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
07:37 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
07:34 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
07:33 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
07:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
07:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
06:21 ryankemper: [Elastic] Restart of `relforge` complete
06:05 ryankemper: [Elastic] Cluster in green status, proceeding to next and final node => `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad-small-alpha.service && sudo systemctl restart elasticsearch_6@relforge-eqiad.service`
05:53 ryankemper: [Elastic] `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad-small-alpha.service && sudo systemctl restart elasticsearch_6@relforge-eqiad.service`
05:48 ryankemper: [Elastic] Performing rolling restarts of `relforge`. `relforge1003` is the master so I'll restart `relforge1004` first to minimize disruption
03:00 ejegg: updated payments-wiki from 23d0ffac66 to 6d3560d083
02:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:28 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: enable Parsoid API everywhere (duration: 01m 04s)
02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:11 mutante: [grafana2001:~] $ sudo systemctl start rsync-var-lib-grafana because of "PROBLEM - Check systemd state on grafana2001 is CRITICAL: CRITICAL - degraded" because of some race condition where a file vanished during sync

2021-10-06

23:57 mutante: releases2002 - rm /srv/org/wikimedia/reprepro/conf/distributions - contains only jessie-mediawiki - see 725670 and EOL of MediaWiki 1.31
23:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:21 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Adding and use wordmark in ckbwiki (T288368) (duration: 01m 04s)
23:20 jforrester@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ckb.svg: Config: Adding and use wordmark in ckbwiki (T288368) (duration: 01m 04s)
23:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:16 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable NewUserMessage for ptwikivoyage (T290820) (duration: 01m 05s)
22:30 mutante: re-enabling puppet on mw*, an-worker* after deploying gerrit:726954. no issue this time
22:23 mutante: temp. disabling puppet on an-worker*, mw*
20:50 mutante: global puppet failure - revert is merged, puppet run will recover on next run everywhere. partially forcing with cumin, partially letting it recover naturally
20:43 mutante: [cumin1001:~] $ sudo cumin -b 5 -p 95 'mw2*' 'run-puppet-agent -q --failed-only'
19:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:05 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.3 refs T281167 (duration: 01m 03s)
19:04 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.3 refs T281167
19:01 brennen: 1.38.0-wmf.3 train (T281167): still unblocked after triage meeting, rolling to group1
18:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:44 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Revert disabling static mapframes on eswiki (duration: 01m 14s)
18:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:31 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eswiki: Disable static mapframes (T291736) (duration: 01m 17s)
18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:22 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: viwikibooks: Set $wgRestrictDisplayTitle to false (T289837) (duration: 01m 21s)
17:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:53 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:47 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3 refs T281167
16:47 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:43 brennen: 1.38.0-wmf.3 train (T281167): unblocked, rolling to group0
16:41 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:35 brennen@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Scribunto/includes/engines/LuaCommon/LanguageLibrary.php: Backport: Replace deprecated ParserOptions::getUser with ::getUserIdentity (T292589) (duration: 01m 04s)
16:35 jynus: stopping db1127 for hw maintenance T292366
16:31 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: hw maintenance
16:31 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: hw maintenance
16:28 brennen@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Scribunto/includes/engines/LuaCommon/LanguageLibrary.php: Backport: Replace deprecated ParserOptions::getUser with ::getUserIdentity (T292589) (duration: 01m 10s)
16:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:01 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
15:45 brennen: 1.38.0-wmf.3 train (T281167): proceeding to deploy backports for T292589
15:37 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
15:35 volans: installed spicerack 1.0.4 on cumin2002
12:50 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:48 volans: uploaded spicerack_1.0.4 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
12:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2004.codfw.wmnet
12:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
12:18 effie: pool mw1455 mw1422
12:17 urbanecm: wikiadmin@10.64.0.164(viwiki)> delete from growthexperiments_mentee_data; # cleanup after disabling mentor dashboard backend
12:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2004.codfw.wmnet
12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1aa67d4: viwiki: Disable mentor dashboard backend (T278920) (duration: 01m 06s)
12:02 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2003.codfw.wmnet
11:55 XioNoX: esams - Advertise 185.15.59.0/24 instead of 185.15.58.0/23 - T288505 - T283050
11:46 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2003.codfw.wmnet
10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
10:50 jelto: disable puppet on gitlab1001 to test puppetized code on GitLab replica - T283076
10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
10:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
10:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:04 urbanecm@deploy1002: Synchronized wmf-config/: 0163373: Delete gettingstarted-with-category-suggestions dblist (T235752; 2/2) (duration: 01m 05s)
10:01 urbanecm@deploy1002: Synchronized dblists/: 0163373: Delete gettingstarted-with-category-suggestions dblist (T235752; 1/2) (duration: 01m 04s)
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
09:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
09:19 jbond: update ipaddress6 fact - https://gerrit.wikimedia.org/r/c/operations/puppet/+/726625
09:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:13 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: Don't fail job if subscribed wiki is unknown (T292446 T292440) (duration: 01m 15s)
09:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:29 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
08:21 XioNoX: add ROAs for 185.15.58.0/24 and 185.15.59.0/24 - T288505 - T283050
08:04 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
07:56 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php plwikinews --fix # T291344
07:56 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php plwikinews # T291344
07:55 urbanecm: mwdebug1001: scap pull (T291344 fix done)
07:51 urbanecm: Staging at mwdebug1001 for T291344
05:53 kart_: Updated cxserver to use nodejs12 (T290754)
05:47 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
05:39 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
05:36 Amir1: start of mwscript extensions/Wikibase/repo/maintenance/pruneChanges.php --wiki wikidatawiki --number-of-days=2
05:31 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
04:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
04:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
04:29 ryankemper: [WDQS] `wdqs1012` is back up after restarting blazegraph (blazegraph was locked up)
04:27 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (attempting to bring downed `wdqs1012` back into health)
04:25 ryankemper: [WDQS] Repooling eqiad hosts following the brief outage from earlier: `wdqs1004`, `wdqs1006`, `wdqs1007`
03:19 eileen: civicrm revision changed from b6f5f71c18 to 82efd2e195, config revision is f4c57d4733
03:11 tstarling@deploy1002: Synchronized php-1.38.0-wmf.3/includes/CommentFormatter/RowCommentIterator.php: fix UBN T292590 (duration: 01m 04s)
01:39 legoktm: legoktm@mwmaint1002:~$ echo "https://en.wikiversity.org/static/images/mobile/copyright/wikiversity.svg" |mwscript purgeList.php
01:17 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/GlobalUserPage/includes/GlobalUserPage.php: Bump GlobalUserPage::PARSED_CACHE_VERSION for media DOM changes (duration: 01m 03s)
01:12 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GlobalUserPage/includes/GlobalUserPage.php: Bump GlobalUserPage::PARSED_CACHE_VERSION for media DOM changes (duration: 01m 17s)
00:59 arlolra@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable legacy media dom on metawiki (duration: 01m 05s)
00:37 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/includes/resourceloader/ResourceLoaderSkinModule.php: Add a separate config for content.media.less 2/2 (duration: 01m 03s)
00:35 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/includes/DefaultSettings.php: Add a separate config for content.media.less 1/2 (duration: 01m 03s)
00:32 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/includes/resourceloader/ResourceLoaderSkinModule.php: Add a separate config for content.media.less 2/2 (duration: 01m 03s)
00:29 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/includes/DefaultSettings.php: Add a separate config for content.media.less 1/2 (duration: 01m 04s)
00:16 mutante: puppetmasters: rm /etc/logrotate.d/geoipupdate && systemctl start logrotate && puppet agent -tv
00:14 mutante: puppetmaster2002 - rm /etc/logrotate.d/geoipupdate (not managed by puppet anymore but not removed, caused duplicate logrotate config, made logrotate service fail), start logrotate
00:08 cstone: civicrm revision changed from 34d3c3aae8 to b6f5f71c18
00:01 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add WN as an alias to project namespace in Polish Wikinews (T291344) (duration: 01m 04s)

2021-10-05

23:54 tgr@deploy1002: Synchronized static/images/mobile/copyright/wikiversity.svg: Config: Wikiversity Logo Update for 2017 Logo Version (T292109) (duration: 01m 03s)
23:47 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Adding and use wordmark in azwiki (T284877) (duration: 01m 04s)
23:44 tgr@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-az.svg: Config: Adding and use wordmark in azwiki (T284877) (duration: 01m 23s)
23:16 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add image_suggestion_interaction event stream (duration: 01m 12s)
23:02 legoktm: deleting old stretch docker images from the registry for T292485
22:24 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.2
22:20 brennen: 1.38.0-wmf.3 (T281167) rolling back to testwikis for the day; will revisit in US-morning
20:47 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3 refs T281167
20:44 brennen@deploy1002: Synchronized php-1.38.0-wmf.3/includes/page: Backport: Pre-format comments for non-local files too (T292570) (duration: 01m 04s)
20:18 mutante: puppetmaster1003 et al - converting maxmind geoip database fetching from cron to timers
20:06 mutante: cumin 'puppetmaster*' "disable-puppet 'T288844 - T273673 - gerrit:721595 - ${USER}'"
19:30 mutante: restoring /home/amire80 from and to mwmaint2002 via Bacula bconsole (T292573)
19:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.2
19:03 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3 refs T281167
18:26 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.23 (duration: 01m 57s)
18:23 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.21 (duration: 04m 20s)
18:21 brennen: 1.38.0-wmf.3 (T281167): pruning old branches, starting with 1.37.0-wmf.21, proceeding to 1.37.0-wmf.23 if time allows
18:11 ppchelko@deploy1002: Synchronized wmf-config: Remove mb_strtoupper overrides for HHVM T219279 Php72ToUpper.php removal (duration: 01m 06s)
18:04 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove mb_strtoupper overrides for HHVM T219279 CS.php (duration: 01m 06s)
17:55 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.3 refs T281167 (duration: 45m 59s)
17:12 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
17:09 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.3 refs T281167
17:03 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
17:02 btullis@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
17:02 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
16:56 brennen: successfully applied security patches for 1.38.0-wmf.3 train (T281167)
16:47 brennen: coordinated with deployment backup and starting train prep for 1.38.0-wmf.3 (T281167), branched at 6527949
15:57 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetboard2002.codfw.wmnet: Renew puppet certificate - jbond@cumin2002
15:57 jbond@cumin2002: START - Cookbook sre.puppet.renew-cert for puppetboard2002.codfw.wmnet: Renew puppet certificate - jbond@cumin2002
15:38 jbond: reimage puppetboard2002
15:15 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetboard1002.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
15:15 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for puppetboard1002.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
15:10 moritzm: imported routinator 0.10.1-1bullseye to thirdparty/routinator for bullseye-wikimedia T292503
14:58 jbond: reimage puppetboard1002
14:40 effie: depool mw1455 and mw1422
14:30 Pchelolo: run foreachwiki uppercaseTitlesForUnicodeTransition.php --charmap current_to_php7_overrides.php T219279
13:51 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor - Drop REL1_31, start REL1_37 (duration: 00m 57s)
13:46 Pchelolo: run renameInvalidUsernames.php --wiki loginwiki --list /tmp/rename_users_for_uppercase_all.txt T219279
13:39 elukey@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - elukey@cumin1001
13:39 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - elukey@cumin1001
13:23 ppchelko@deploy1002: Synchronized php-1.38.0-wmf.2/maintenance/uppercaseTitlesForUnicodeTransition.php: Backport uppercaseTitlesForUnicodeTransition.php maintenance script improvements T219279 (duration: 00m 58s)
12:53 ema: upload varnish 6.0.8-1wm1 to apt.wikimedia.org T292290
12:43 elukey: import AMD ROCm 4.2 to buster-wikimedia's thirdparty/amd-rocm42 - T287267
12:24 ema: deployment-cache-text06: upgrade varnish to 6.0.8-1wm1 T292290
11:58 hnowlan: reverted restbase2023 to use CN=hostname certificate due to loading errors
11:57 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
11:57 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
11:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
11:28 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
11:17 hnowlan_: disabling puppet on cassandra nodes for rollout of 724061 - defaulting to cn=fqdn certificates
11:15 effie: upgrade scap to 4.0.2 - T291095
11:12 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: 0452499: Enable local uploads for tcywiki (T166763) (duration: 00m 59s)
10:11 vgutierrez: update acme-chief to version 0.32 on acmechief hosts - T290249
10:09 vgutierrez: update acme-chief to version 0.32 on acmechief-test hosts - T290249
10:06 vgutierrez: upload acme-chief 0.32 to apt.wm.o (buster) - T290249
09:46 hnowlan_: generated cassandra certificate using FQDN for restbase2023
09:09 topranks: updating routinator on rpki2001 (T291543)
08:59 dcausse: depool and restart blazegraph on wdqs1007
08:51 moritzm: installing openssl security updates for stretch (buster/bullseye already fixed)
07:58 moritzm: installing apache security updates
07:57 elukey: upgrade GPU drivers (AMD ROCm 4.3.1) on an-worker1[096-101]
07:27 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
07:26 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
07:26 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1004.wmnet
06:38 elukey: reboot an-worker1096 after installing new GPU drivers
04:20 eileen: civicrm revision changed from d74e9aa0a1 to 34d3c3aae8, config revision is cae09f7691

2021-10-04

23:30 foks: resetting some emails used for abuse by a globally-banned user
23:19 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
23:18 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 75645c9: Add explicit config for licensing/copyright message overrides (T284097) (duration: 00m 59s)
23:05 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images
22:54 mutante: puppetmaster2001 - rm /etc/logrotate.d/geoipupdate_ipinfo and geoipupdate_ipinfo ; running puppet, starting logrotate service
18:13 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:51 bblack: rolling restart of haproxy for DoTLS on dns300[12],authdns1001,authdns2001 to recycle connections
15:24 vgutierrez: pool cp5006
15:17 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
15:16 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
14:50 phuedx: phuedx@mwmaint1002:~$ mwscript extensions/SecurePoll/cli/purgeDecryptionKeys.php --wiki=votewiki --before="20210101000000"
14:46 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
14:46 effie: uploading scap 4.0.2 - T291095
14:45 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
14:39 brennen: gitlab: upgrade to 14.3.2 (note there was an additional patch release on 2021-10-01) complete (T292256)
14:25 Amir1: cleaning up wb_changes_subscription rows from closed wikis (T292440)
14:24 brennen: gitlab: downtime for upgrade to 14.3.1
14:19 elukey: import AMD ROCm 4.3.1 packages in buster-wikimedia's thirdparty/amd-rocm431 - T287267
14:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:13 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Explicitly enable dispatching and pruning for wikidata (T48643) (duration: 00m 58s)
14:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikimedia.org/T292256
14:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikimedia.org/T292256
14:01 ladsgroup@deploy1002: Synchronized wmf-config: Config: Enable dispatching via jobs everywhere (T48643) (duration: 01m 00s)
12:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:56 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable dispatching for wikidatawiki and commonswiki (T292088) (duration: 01m 00s)
12:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Ganeti tests
12:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Ganeti tests
12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Ganeti tests
12:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Ganeti tests
12:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:55 urbanecm: EU B&C window done
11:55 urbanecm@deploy1002: Synchronized multiversion/MWWikiversions.php: 508cf5c: Let DB expressions intersect DB lists (T290609) (duration: 00m 58s)
11:50 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a855078: dewiki, nlwiki: Bump Growth features to 80% (T288420, T285254) (duration: 00m 58s)
11:46 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: 5728376: Update T250887 mitigations (duration: 00m 58s)
11:44 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b0a96be: Undeploy GettingStarted V: Remove now-obsolete logging channels (T235752) (duration: 00m 59s)
11:42 urbanecm@deploy1002: Synchronized wmf-config/extension-list: 9709bcf: Undeploy GettingStarted IV: Dont build i18n (T235752) (duration: 00m 58s)
11:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d60f332: Undeploy getting started III: Dont set wmgUseGettingStarted, now ignored (T235752) (duration: 00m 58s)
11:37 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 9eaf960: Undeploy GettingStarted II: Dont load regardless of config (T235752) (duration: 00m 58s)
11:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1c7405a: Undeploy GettingStarted I: Disable on all wikis (T235752) (duration: 00m 58s)
11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove deprecated SectionTranslationTargetLanguage config (T290302) (duration: 00m 58s)
11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add wikisource-bot.toolforge.org to Commons copy upload list (T292213) (duration: 00m 59s)
11:16 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add IA-Upload tool domains to Commons wgCopyUploadsDomains (T287241) (duration: 00m 59s)
11:12 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
11:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
11:07 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
11:04 effie: depool wtp1026 for tests
11:04 effie: pool wtp1025
10:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:13 akosiaris: hbal -L -G row_C -X on ganeti01.svc.eqiad.wmnet
08:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3] (eqiad): Increase mirrored traffic to 100% for eqiad (duration: 00m 54s)
08:58 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@071f7c3] (eqiad): Increase mirrored traffic to 100% for eqiad
07:37 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@38f3adc] (duration: 06m 14s)
07:31 joal@deploy1002: Started deploy [analytics/refinery@38f3adc] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@38f3adc]
07:30 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc] (thin): Hotfix analytics deploy THIN [analytics/refinery@38f3adc] (duration: 00m 06s)
07:30 joal@deploy1002: Started deploy [analytics/refinery@38f3adc] (thin): Hotfix analytics deploy THIN [analytics/refinery@38f3adc]
07:29 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc]: Hotfix analytics deploy [analytics/refinery@38f3adc] (duration: 19m 18s)
07:19 dcausse: restarting blazegraph on wdqs2001 & wdqs2004 (allocators burning too quickly)
07:18 elukey: depool + restart blazegraph + restart updater for wdqs1006
07:18 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs1006.wmnet
07:18 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs1004.wmnet
07:10 joal@deploy1002: Started deploy [analytics/refinery@38f3adc]: Hotfix analytics deploy [analytics/refinery@38f3adc]
07:02 godog: swift eqiad-prod: add weight to ms-be10[64-67] - T290546
06:44 elukey: depool + restart blazegraph + restart updater on wdqs1004
05:50 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
05:49 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
05:47 ladsgroup@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .

2021-10-03

14:45 _joe_: restarting acmechief on acmechief1001
12:55 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1127, bad ram', diff saved to https://phabricator.wikimedia.org/P17414 and previous config saved to /var/cache/conftool/dbconfig/20211003-125530-kormat.json
08:24 elukey: powercycle cp5006 (unresponsive to ssh, remote tty available but not able to login as root, no prometheus metrics in hours)
08:23 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet

2021-10-02

17:28 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:10 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .

2021-10-01

23:19 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
22:27 mutante: puppetmaster2001 - systemctl reset-failed
22:16 mutante: puppetmaster2001 systemctl disable geoip_update_ipinfo.timer
22:15 mutante: puppetmaster2001 - sudo /usr/local/bin/geoipupdate_job after adding new shell command and timer - successfully downloaded enterprise database for T288844
21:56 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
21:44 mutante: puppetmasters - temp. disabling puppet one more time, now for a different deploy, to fetch an additional MaxMind database - T288844
21:19 mutante: puppetmaster2001 - puppet removed cron sync_volatile and cron sync_ca - starting and verifying new timers: 'systemctl status sync-puppet-volatile', 'systemctl status sync-puppet-ca' T273673
21:12 mutante: puppetmaster1002, puppetmaster1003, puppetmaster2002, puppetmaster2003: re-enabled puppet, they are backends. backends don't have the sync cron/job/timer, so noop as well, just like 1004/1005/2004/2005. this just leaves the actual change on 2001 - T273673
21:07 mutante: puppetmaster1004, puppetmaster1005, puppetmaster2004, puppetmaster2005: re-enabled puppet, they are "insetup" role
21:06 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend (duration: 00m 54s)
21:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend
21:05 mutante: puppetmaster1001 - re-enabled puppet, noop as expected, the passive host pulls from the active one, so only 2001 has the cron/job/timer
21:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:01 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Revert "Have PdfHandler use Shellbox on Commons for 10% of requests" (duration: 00m 59s)
20:58 mutante: temp disabling puppet on puppetmasters - deploying gerrit:724115 (gerrit:723310) T273673
18:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE
18:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE
18:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE
18:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE
18:07 robh@cumin1001: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host an-db1001.eqiad.wmnet
18:05 robh@cumin1001: START - Cookbook sre.experimental.reimage for host an-db1001.eqiad.wmnet
17:58 effie: depool mw1025, mw1319, mw1312 for test
16:20 dancy: testing upcoming Scap 4.0.2 release on beta
14:04 bblack: C:envoyproxy (appservers and others): restarting envoyproxy
14:04 bblack: C:envoyproxy (appservers and others): ca-certificates updated via cumin to workaround T292291 issues
13:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
13:45 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:23 bblack: manually trying LE expired root workaround on mwdebug1001 with puppet disabled ...
13:12 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
13:11 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
13:11 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
13:10 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
11:42 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
11:11 jynus: manually migrating some vms out of ganeti1009 to avoid excessive memory pressure
10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17413 and previous config saved to /var/cache/conftool/dbconfig/20211001-105849-root.json
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17412 and previous config saved to /var/cache/conftool/dbconfig/20211001-105735-root.json
10:43 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad (duration: 00m 49s)
10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17411 and previous config saved to /var/cache/conftool/dbconfig/20211001-104345-root.json
10:43 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17410 and previous config saved to /var/cache/conftool/dbconfig/20211001-104232-root.json
10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17409 and previous config saved to /var/cache/conftool/dbconfig/20211001-102841-root.json
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17408 and previous config saved to /var/cache/conftool/dbconfig/20211001-102728-root.json
10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17407 and previous config saved to /var/cache/conftool/dbconfig/20211001-101338-root.json
10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17406 and previous config saved to /var/cache/conftool/dbconfig/20211001-101224-root.json
10:00 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@c123ab9] (eqiad): Increase mirrored traffic to 80% for eqiad (duration: 00m 51s)
09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@c123ab9] (eqiad): Increase mirrored traffic to 80% for eqiad
09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17405 and previous config saved to /var/cache/conftool/dbconfig/20211001-095834-root.json
09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17404 and previous config saved to /var/cache/conftool/dbconfig/20211001-095720-root.json
09:55 marostegui: Upgrade db1164 and db1177
09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177 and db1164 for upgrade', diff saved to https://phabricator.wikimedia.org/P17403 and previous config saved to /var/cache/conftool/dbconfig/20211001-095433-marostegui.json
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17402 and previous config saved to /var/cache/conftool/dbconfig/20211001-094913-root.json
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17401 and previous config saved to /var/cache/conftool/dbconfig/20211001-094902-root.json
09:38 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force # to get an idea about timing for T290609, runs in a tmux session under my account
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17400 and previous config saved to /var/cache/conftool/dbconfig/20211001-093410-root.json
09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17399 and previous config saved to /var/cache/conftool/dbconfig/20211001-093358-root.json
09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17398 and previous config saved to /var/cache/conftool/dbconfig/20211001-091906-root.json
09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17397 and previous config saved to /var/cache/conftool/dbconfig/20211001-091854-root.json
09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17396 and previous config saved to /var/cache/conftool/dbconfig/20211001-090402-root.json
09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17395 and previous config saved to /var/cache/conftool/dbconfig/20211001-090351-root.json
09:02 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
09:00 _joe_: restarting pybal low-traffic in eqiad to pick up the drop of proxyfetch to kubernetes services
08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17394 and previous config saved to /var/cache/conftool/dbconfig/20211001-084859-root.json
08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17393 and previous config saved to /var/cache/conftool/dbconfig/20211001-084847-root.json
08:44 marostegui: Upgrade db1135 and db1172
08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172 for upgrade', diff saved to https://phabricator.wikimedia.org/P17392 and previous config saved to /var/cache/conftool/dbconfig/20211001-084435-marostegui.json
08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for upgrade', diff saved to https://phabricator.wikimedia.org/P17391 and previous config saved to /var/cache/conftool/dbconfig/20211001-084411-marostegui.json
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080 T290868', diff saved to https://phabricator.wikimedia.org/P17390 and previous config saved to /var/cache/conftool/dbconfig/20211001-084345-marostegui.json
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
08:15 _joe_: restarting pybal in codfw to pick up config changes
08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on testvm[2001,2003].codfw.wmnet with reason: Ganeti tests
08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on testvm[2001,2003].codfw.wmnet with reason: Ganeti tests
06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17388 and previous config saved to /var/cache/conftool/dbconfig/20211001-062846-root.json
06:27 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17387 and previous config saved to /var/cache/conftool/dbconfig/20211001-062453-root.json
06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17386 and previous config saved to /var/cache/conftool/dbconfig/20211001-061342-root.json
06:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17385 and previous config saved to /var/cache/conftool/dbconfig/20211001-060949-root.json
05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17384 and previous config saved to /var/cache/conftool/dbconfig/20211001-055838-root.json
05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17383 and previous config saved to /var/cache/conftool/dbconfig/20211001-055445-root.json
05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17382 and previous config saved to /var/cache/conftool/dbconfig/20211001-054335-root.json
05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17381 and previous config saved to /var/cache/conftool/dbconfig/20211001-053942-root.json
05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17380 and previous config saved to /var/cache/conftool/dbconfig/20211001-052831-root.json
05:26 marostegui: Upgrade db1114
05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for upgrade', diff saved to https://phabricator.wikimedia.org/P17379 and previous config saved to /var/cache/conftool/dbconfig/20211001-052509-marostegui.json
05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17378 and previous config saved to /var/cache/conftool/dbconfig/20211001-052438-root.json
05:22 marostegui: Upgrade db1119
05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for upgrade', diff saved to https://phabricator.wikimedia.org/P17377 and previous config saved to /var/cache/conftool/dbconfig/20211001-052133-marostegui.json
04:00 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Have PdfHandler use Shellbox on Commons for 10% of requests (T289228) (duration: 00m 59s)
04:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
03:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
03:24 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
03:15 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .

2021-09-30

23:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:51 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Put a https protocol into values (duration: 01m 00s)
23:48 dpifke@deploy1002: Finished deploy [statsv/statsv@afeff42]: Deploy statsv with Kafka TLS support (not yet enabled) T290131 (duration: 00m 05s)
23:48 dpifke@deploy1002: Started deploy [statsv/statsv@afeff42]: Deploy statsv with Kafka TLS support (not yet enabled) T290131
23:41 dpifke@deploy1002: Finished deploy [performance/coal@1be49f8]: Deploy Coal with Kafka TLS support (not yet enabled) T290131 (duration: 01m 07s)
23:40 dpifke@deploy1002: Started deploy [performance/coal@1be49f8]: Deploy Coal with Kafka TLS support (not yet enabled) T290131
23:39 dpifke@deploy1002: Finished deploy [performance/navtiming@29264fb]: Deploy Navtiming with Kafka TLS support (not yet enabled) T290131 (duration: 00m 05s)
23:39 dpifke@deploy1002: Started deploy [performance/navtiming@29264fb]: Deploy Navtiming with Kafka TLS support (not yet enabled) T290131
23:34 ejegg: updated Fundraising CiviCRM from d4da344274 to d74e9aa0a1
22:09 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
22:07 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
22:06 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
21:53 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
21:06 eileen: civicrm revision changed from 2ecb8f0bcd to d4da344274, config revision is 77cb7ec866
20:54 ryankemper: [WCQS] `ryankemper@wcqs1003:~$ sudo pool` (merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/725110 to unbreak readiness probe)
20:54 topranks: Routinator on rpki1001 upgraded to 0.10.0 and working again after force refresh.
20:49 brennen: gitlab1001: upgrade to 14.2.5 complete
20:32 brennen: gitlab2001, gitlab1001: downtime for upgrades to 14.2.5
20:18 ryankemper: [WCQS] `ryankemper@wcqs1003:~$ sudo depool` (not sure why pybal can't depool it, the other 2 servers are pooled)
19:51 topranks: Updating routinator on rpki1001 T291543
19:39 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
19:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:37 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.2 refs T281166
19:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:07 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/MobileFrontend: Backport: Fix search within pages alignment (T292107) (duration: 01m 09s)
19:05 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/EventBus/includes/EventBus.php: Backport: Guard against undefined index notice when setting x-client-ip (T288853) (duration: 01m 09s)
19:04 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/EventBus/includes/EventBus.php: Backport: Guard against undefined index notice when setting x-client-ip (T288853) (duration: 01m 09s)
18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:58 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.2/skins/Vector/resources/skins.vector.styles.legacy/components/MenuDropdown.less: Backport: Restore original more menu padding in legacy Vector (T289163) (duration: 01m 08s)
18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:43 thcipriani@deploy1002: Scap failed!: Call to mwscript eval.php stderr: not empty
18:42 moritzm: imported gitlab 14.2.5 to thirdparty/gitlab T292219
18:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:38 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Use Wikimania's logo in a new vector (T286405) Part III (duration: 01m 07s)
18:37 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikimania-wordmark.svg: Config: Use Wikimania's logo in a new vector (T286405) Part II (duration: 01m 07s)
18:35 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikimania.svg: Config: Use Wikimania's logo in a new vector (T286405) part I (duration: 01m 07s)
18:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:31 thcipriani@deploy1002: Synchronized wmf-config: Config: Enable sticky header on beta cluster (T289721) (duration: 01m 08s)
18:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
18:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:27 otto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thorium.eqiad.wmnet
18:22 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
18:20 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable legacy media dom on a few more wikis (T51097) (duration: 01m 08s)
18:07 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
17:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
17:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
17:49 otto@cumin1001: START - Cookbook sre.hosts.decommission for hosts thorium.eqiad.wmnet
17:42 bstorm: updating packages for thirdparty/kubeadm-k8s-1-20 and thirdparty/kubeadm-k8s-1-19 in stretch-wikimedia on apt1001 T292131
17:09 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad (duration: 00m 55s)
17:08 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad
17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad (duration: 00m 08s)
17:02 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad
17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
17:00 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad (duration: 00m 11s)
17:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad
16:49 sukhe: restart dnsdist.service on doh[1001-1002,2001-2002,3001-3002,4001-4002,5001-5002].wikimedia.org
16:43 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a4d22]: Increase mirrored traffic to 10% (duration: 02m 33s)
16:40 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a4d22]: Increase mirrored traffic to 10%
16:38 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10% (duration: 00m 40s)
16:37 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10%
16:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:32 hnowlan: Ran `GRANT pg_monitor TO prometheus` for maps in eqiad and codfw to fix empty prometheus connection metrics
16:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10% (duration: 00m 16s)
16:30 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10%
16:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:11 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable jQuery migrate in metawiki (T280944) (duration: 01m 09s)
16:08 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable dispatching via job to 10 prod wikis (duration: 01m 09s)
15:55 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
14:36 elukey: drop /etc/helmfile-defaults/private/backup_old_paths from deploy1002 (old data not needed anymore)
14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17374 and previous config saved to /var/cache/conftool/dbconfig/20210930-143325-root.json
14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17373 and previous config saved to /var/cache/conftool/dbconfig/20210930-143044-root.json
14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17372 and previous config saved to /var/cache/conftool/dbconfig/20210930-141822-root.json
14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17370 and previous config saved to /var/cache/conftool/dbconfig/20210930-141540-root.json
14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17369 and previous config saved to /var/cache/conftool/dbconfig/20210930-140318-root.json
14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17368 and previous config saved to /var/cache/conftool/dbconfig/20210930-140037-root.json
13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17367 and previous config saved to /var/cache/conftool/dbconfig/20210930-134815-root.json
13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17366 and previous config saved to /var/cache/conftool/dbconfig/20210930-134533-root.json
13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
13:40 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
13:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:37 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:36 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17365 and previous config saved to /var/cache/conftool/dbconfig/20210930-133311-root.json
13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17364 and previous config saved to /var/cache/conftool/dbconfig/20210930-133029-root.json
13:29 marostegui: Upgrade db1111
13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for upgrade', diff saved to https://phabricator.wikimedia.org/P17363 and previous config saved to /var/cache/conftool/dbconfig/20210930-132831-marostegui.json
13:27 marostegui: Upgrade db1134
13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 (s1) for upgrade', diff saved to https://phabricator.wikimedia.org/P17362 and previous config saved to /var/cache/conftool/dbconfig/20210930-132700-marostegui.json
13:26 marostegui: Upgrade db1133
13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
13:02 urbanecm: Start server-side upload for 2 video files (T292096, T291492)
13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17361 and previous config saved to /var/cache/conftool/dbconfig/20210930-130116-root.json
13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17360 and previous config saved to /var/cache/conftool/dbconfig/20210930-130109-root.json
12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17359 and previous config saved to /var/cache/conftool/dbconfig/20210930-124612-root.json
12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17358 and previous config saved to /var/cache/conftool/dbconfig/20210930-124606-root.json
12:31 Reedy: downloading files for T290900 in screen on mwmaint1002
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17357 and previous config saved to /var/cache/conftool/dbconfig/20210930-123109-root.json
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17356 and previous config saved to /var/cache/conftool/dbconfig/20210930-123101-root.json
12:18 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 17s)
12:18 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
12:17 moritzm: adapted MX records to point to both mx1001.wikimedia.org and mx2001.wikimedia.org with equal weights T286911
12:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 16s)
12:16 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17355 and previous config saved to /var/cache/conftool/dbconfig/20210930-121605-root.json
12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17354 and previous config saved to /var/cache/conftool/dbconfig/20210930-121558-root.json
12:14 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 15s)
12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
12:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 15s)
12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
12:11 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 10s)
12:10 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
12:10 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 01s)
12:10 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
12:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17353 and previous config saved to /var/cache/conftool/dbconfig/20210930-120102-root.json
12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17352 and previous config saved to /var/cache/conftool/dbconfig/20210930-120054-root.json
12:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:58 hnowlan: imported wikidiff2_1.13.0-1/php-wikidiff2_1.13.0-1_amd64.deb to buster-wikimedia component/php72
11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1 and s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17351 and previous config saved to /var/cache/conftool/dbconfig/20210930-115631-marostegui.json
11:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:47 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 03s)
11:47 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
11:47 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 01s)
11:47 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
11:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:46 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 01s)
11:46 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
11:44 effie: downgrading scap to 3.17.1-1 on maps* hosts - T291990
11:43 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Make reply tool available as opt-out almost everywhere (phase 3) (T288485) (duration: 01m 07s)
11:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:35 kartik@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/DiscussionTools: Backport: Add a link to preferences within the Reply and New Discussion Tools (T291002) (duration: 01m 08s)
11:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:30 kartik@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/DiscussionTools: Backport: Add a link to preferences within the Reply and New Discussion Tools (T291002) (duration: 01m 09s)
11:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:14 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SectionTranslation in Igbo, Hausa, Yoruba Wikipedias (T290175) (duration: 01m 08s)
11:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:13 akosiaris: upgrade znuny to 6.0.37
10:06 godog: test bounce logstash on logstash1023
08:21 moritzm: installing nettle security updates on stretch
08:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2003.codfw.wmnet
07:49 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
07:31 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 06s)
07:31 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
07:03 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
06:58 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
06:56 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
06:48 marostegui: Deploy schema change on s8 codfw (lag will show up) T270620
06:01 marostegui: Deploy schema change on s1 codfw (lag will show up) T270620
05:53 marostegui: Deploy schema change on s3 codfw (lag will show up) T270620
05:52 marostegui: Deploy schema change on s7 codfw (lag will show up) T270620
05:47 marostegui: Deploy schema change on s5 codfw (lag will show up) T270620
05:45 marostegui: Deploy schema change on s4 codfw (lag will show up) T270620
05:45 marostegui: Deploy schema change on s2 codfw (lag will show up) T270620

2021-09-29

23:20 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
23:05 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
23:02 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
22:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:57 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/includes/Timeline.php: Catch TimelineException from fixMap() (T292126) (duration: 01m 07s)
21:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:37 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/includes/Timeline.php: Bump Timeline::CACHE_VERSION (duration: 01m 08s)
20:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:22 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.2 refs T281166 (duration: 01m 08s)
20:21 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.2 refs T281166
20:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:16 jhuneidi@deploy1002: Finished scap: Fix pywikibot feature detection (duration: 13m 38s)
20:02 jhuneidi@deploy1002: Started scap: Fix pywikibot feature detection
19:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:06 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/scripts/renderTimeline.sh: Fix passing temp directory to EasyTimeline.pl (duration: 01m 07s)
18:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:52 dancy@deploy1002: Synchronized php-1.38.0-wmf.2/skins/MinervaNeue/resources/skins.minerva.base.styles/ui.less: Backport: Search header should be vertically centered, not top aligned(take 2) (T292071) (duration: 01m 08s)
17:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:14 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fully enable change dispatching via jobs on test wikis, Part I (duration: 01m 09s)
17:13 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Fully enable change dispatching via jobs on test wikis, Part I (duration: 01m 07s)
16:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host thumbor2006.codfw.wmnet
16:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:43 akosiaris: start hbal -L -G row_B -X on ganeti01.svc.codfw.wmnet . Rows C and D are fine
16:42 akosiaris: start hbal -L -G row_A -X on ganeti01.svc.codfw.wmnet
16:40 akosiaris: migrate kubemaster2001 off ganeti2007 and to ganeti2008 due to memory starvation on ganeti2007
16:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:34 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:25 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/WikimediaBadges/: Backport: Handle missing items in WikibaseClientSiteLinksForItemHandler (T291953) (duration: 01m 08s)
16:24 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/WikimediaBadges/: Backport: Handle missing items in WikibaseClientSiteLinksForItemHandler (T291953) (duration: 01m 10s)
15:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host thumbor2006.codfw.wmnet
15:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:45 Amir1: disabled cron dispatching for mediawikiwiki
15:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable change dispatching via jobs in wikidatawiki (T48643) (duration: 01m 08s)
15:44 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
15:39 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
15:39 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Wikibase/client: Backport: Track time until dispatched recent changes are inserted (T291962) (duration: 01m 10s)
15:24 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host thumbor2006.codfw.wmnet
15:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
15:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
14:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
14:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
14:17 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
14:08 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
14:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
14:04 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
14:01 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
13:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
13:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
13:34 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
13:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
13:11 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
13:11 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
13:09 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
13:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
13:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
12:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:48 Lucas_WMDE: EU backport+config window done
11:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/skins/MinervaNeue/skinStyles/mobile.startup/Overlay.less: Backport: Revert "Search header should be vertically centered, not top aligned." (T292030) (duration: 01m 07s)
11:43 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Wikibase/repo/includes/Store/Sql/SqlSiteLinkConflictLookup.php: Backport: Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup (T291377) (duration: 01m 07s)
11:43 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
11:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:26 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable line numbering on all namespaces (pilot wikis) (T280027) (duration: 01m 09s)
11:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:20 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/DiscussionTools/modules/dt.ui.ReplyWidget.js: Backport: Fix almost all errors codes being logged as `http-0` (T290514) (duration: 01m 09s)
11:17 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/DiscussionTools/modules/dt.ui.ReplyWidget.js: Backport: Fix almost all errors codes being logged as `http-0` (T290514) (duration: 01m 09s)
11:16 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
11:15 volans@cumin2002: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1001.eqiad.wmnet
10:35 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
10:34 volans@cumin2002: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1001.eqiad.wmnet
10:24 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
10:02 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: testing latest change
10:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: testing latest change
09:54 godog: bounce mtail on centrallog* - T246470
09:47 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
09:40 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 11s)
09:39 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
08:58 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
08:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
08:22 ema: fleet-wide rm /etc/rsyslog.d/00-abort-unclean-config.conf && systemctl restart rsyslog
07:51 godog: fail sdg on be2036 - T291988
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081 T290868', diff saved to https://phabricator.wikimedia.org/P17345 and previous config saved to /var/cache/conftool/dbconfig/20210929-072520-marostegui.json
07:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:15 marostegui: Deploy schema change on s8 codfw (lag will show up) T283499
06:10 ryankemper: T289517 Ran puppet across query_service fleet `sudo cumin -b 6 'P{w*qs*}' 'sudo run-puppet-agent'`
06:09 ryankemper: T289517 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/720746 (fix dcat-ap loading)
05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2103 T290865', diff saved to https://phabricator.wikimedia.org/P17344 and previous config saved to /var/cache/conftool/dbconfig/20210929-055645-marostegui.json
04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081 T290868', diff saved to https://phabricator.wikimedia.org/P17342 and previous config saved to /var/cache/conftool/dbconfig/20210929-045033-marostegui.json
03:18 eileen: civicrm revision changed from a0bc324a61 to 2ecb8f0bcd, config revision is 77cb7ec866
03:01 eileen: civicrm revision changed from 1b7bae4033 to a0bc324a61, config revision is 77cb7ec866
03:00 eileen: civicrm revision changed from a480bf03c9 to 1b7bae4033, config revision is 77cb7ec866
02:36 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have PdfHandler/PagedTiffHandler use Shellbox on all wikis but Commons (duration: 01m 07s)
00:52 eileen: civicrm revision changed from a1929b3dfd to a480bf03c9, config revision is 77cb7ec866
00:27 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have SyntaxHighlight use Shellbox on all wikis (duration: 01m 18s)
00:21 ryankemper: T280001 `ryankemper@authdns1001:~$ sudo -i authdns-update` following merge of https://gerrit.wikimedia.org/r/c/operations/dns/+/724538
00:19 ryankemper: T280001 Okay now we're clear to proceed to https://wikitech.wikimedia.org/wiki/LVS#For_active/active_services; merging https://gerrit.wikimedia.org/r/c/operations/dns/+/724538
00:15 ryankemper: T280001 `ryankemper@cumin1001:~$ sudo cumin 'A:icinga or A:dns-auth' run-puppet-agent` per https://wikitech.wikimedia.org/wiki/LVS#Make_the_service_page,_add_discovery_resources
00:14 ryankemper: T280001 Moving wcqs state from `monitoring_setup` to `production`; merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/724536

2021-09-28

23:53 ryankemper: T280001 New icinga checks are green, will proceed to next step of moving wcqs state from `monitoring_setup` -> `production`
23:49 ryankemper: T280001 New icinga alerts showing up as expected following wcqs state change to `monitoring_setup`: `LVS wcqs codfw port 443/tcp - Wikimedia Commons Query Service IPv4` and `LVS wcqs eqiad port 443/tcp - Wikimedia Commons Query Service IPv4`
23:45 ryankemper: T280001 Changing wcqs state from `lvs_setup` to `monitoring_setup`: `ryankemper@cumin1001:~$ sudo cumin 'A:icinga' 'run-puppet-agent'`
23:14 ryankemper: T282117 `error: plugin_geoip: Invalid resource name 'disc-wcqs' detected from zonefile lookup` We must be missing a line, reverting change to fix
23:14 ryankemper: T282117 `ryankemper@authdns1001:~$ sudo -i authdns-update` following merge of https://gerrit.wikimedia.org/r/724520
23:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2419.codfw.wmnet with reason: REIMAGE
23:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2419.codfw.wmnet with reason: REIMAGE
22:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2418.codfw.wmnet with reason: REIMAGE
22:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2418.codfw.wmnet with reason: REIMAGE
22:41 legoktm@deploy1002: Finished scap: Fix erroneous en-gb translations in 1.38.0-wmf.1 (T291717) (duration: 17m 43s)
22:25 eileen: civicrm revision changed from b8f756b60e to a1929b3dfd, config revision is 77cb7ec866
22:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2417.codfw.wmnet with reason: REIMAGE
22:23 legoktm@deploy1002: Started scap: Fix erroneous en-gb translations in 1.38.0-wmf.1 (T291717)
22:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2417.codfw.wmnet with reason: REIMAGE
22:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2416.codfw.wmnet with reason: REIMAGE
22:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2416.codfw.wmnet with reason: REIMAGE
22:15 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wcqs
21:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2415.codfw.wmnet with reason: REIMAGE
21:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2415.codfw.wmnet with reason: REIMAGE
21:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2414.codfw.wmnet with reason: REIMAGE
21:49 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2414.codfw.wmnet with reason: REIMAGE
21:22 ryankemper: T280247 Puppet run complete on all of `cp-text`, trafficserver backend work is done
21:22 pt1979@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host thumbor2005.codfw.wmnet
21:19 bd808: bd808@mwmaint1002 echo "https://toolhub.wikimedia.org/static/js/chunk-vendors.js" | mwscript purgeList.php
21:17 topranks: Configure cr2-esams for NaWas BGP peering to gateway-1 IPv6 and gateway-2 (T288505)
21:11 topranks: Configure cr2-esams for NaWas BGP peering to gateway-1 IPv4 (T288505)
21:10 ryankemper: T280247 `ryankemper@cumin1001:~$ sudo cumin -b 5 'A:cp-text' 'sudo run-puppet-agent --force'`
21:09 ryankemper: T280247 `ryankemper@cp1075:~$ sudo grep commons-query /etc/trafficserver/remap.config` shows `map http://commons-query.wikimedia.org https://wcqs.discovery.wmnet`; proceeding to rest of fleet in batches of 5
21:08 pt1979@cumin1001: START - Cookbook sre.experimental.reimage for host thumbor2005.codfw.wmnet
21:07 ryankemper: T280247 Running on single cp-text host: `ryankemper@cp1075:~$ sudo run-puppet-agent --force`
21:05 ryankemper: T280247 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/720078
21:03 ryankemper: T280247 `ryankemper@cumin1001:~$ sudo cumin 'A:cp-text' 'sudo disable-puppet "Add trafficserver backend mapping for commons-query.wikimedia.org - T280247"'`
21:02 legoktm: legoktm@deploy1002:~$ echo "https://toolhub.wikimedia.org/" | mwscript purgeList.php
20:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
20:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
20:51 ryankemper: T280247 Puppet successfully ran on all `w*qs*` hosts; GUI working as before for WDQS, and WCQS seems fine as well. Deploy succeeded without any hitches
20:49 legoktm: re-enabling and running puppet on A:cp-text: sudo cumin -b 5 A:cp-text 'enable-puppet --force && run-puppet-agent'
20:49 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
20:49 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
20:41 legoktm: disabling puppet on A:cp-text in preparation for adding toolhub
20:38 ryankemper: T280247 `ryankemper@cumin1001:~$ sudo cumin -b 5 'P{w*qs*}' 'sudo run-puppet-agent --force'`; 25 hosts total so will take 5 iterations
20:37 ryankemper: T280247 Test queries on `wdqs1003` passed (tunneled into `wdqs1003`), proceeding to rest of fleet
20:37 ryankemper: T280247 Ran on wdqs canary `wdqs1003`: `ryankemper@wdqs1003:~$ sudo run-puppet-agent --force`
20:33 ryankemper: T280247 Running on single wcqs host: `ryankemper@wcqs1001:~$ sudo run-puppet-agent --force`
20:33 ryankemper: T280247 `ryankemper@cumin1001` -> `sudo cumin 'P{w*qs*}' 'sudo disable-puppet "Make query_service nginx proxy to GUI microsite - T280247"'`
20:33 topranks: Adding IPv6 address to NaWas sub-interface on cr2-esams (AMS-IX) - T288505
19:48 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.2 refs T281166
19:35 legoktm@deploy1002: Synchronized private/PrivateSettings.php: Use IPUtils instead of removed IP class (T292010) (duration: 01m 09s)
19:27 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.38.0-wmf.1"
19:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.2 refs T281166
19:05 legoktm@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=toolhub
19:04 legoktm: adding toolhub to discovery DNS (T280881)
19:00 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 20s)
19:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
18:54 ryankemper: T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/721600 (add wcqs scap dsh groups), running puppet on scap::dsh hosts: `ryankemper@cumin1001:~$ sudo cumin 'P:scap::dsh' 'sudo run-puppet-agent'`
18:45 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.2 refs T281166 (duration: 49m 27s)
18:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host thumbor2005.codfw.wmnet
18:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster1005.eqiad.wmnet with reason: REIMAGE
18:18 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 08s)
18:18 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
18:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1005.eqiad.wmnet with reason: REIMAGE
18:14 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: REIMAGE
18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: REIMAGE
18:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
18:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
18:01 pt1979@cumin1001: START - Cookbook sre.experimental.reimage for host thumbor2005.codfw.wmnet
18:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
17:57 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
17:57 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
17:55 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.2 refs T281166
17:50 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host mw2413.codfw.wmnet
17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:46 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
17:46 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
17:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox
17:44 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
17:36 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 11s)
17:36 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
17:35 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 17s)
17:35 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host mw2413.codfw.wmnet
17:35 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
17:32 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 06s)
17:32 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
17:29 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 02m 43s)
17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 24s)
17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
17:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host mw2413.codfw.wmnet
17:14 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@1f90e6f]: tegola: hard code threshold because deployment fails (duration: 00m 18s)
17:13 mbsantos@deploy1002: Started deploy [kartotherian/deploy@1f90e6f]: tegola: hard code threshold because deployment fails
17:09 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3e52e0a]: tegola: use global config var for load tests (duration: 00m 11s)
17:09 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3e52e0a]: tegola: use global config var for load tests
17:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
17:04 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
17:04 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host mw2413.codfw.wmnet
17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host mw2412.codfw.wmnet
16:46 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host mw2412.codfw.wmnet
16:39 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:38 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
16:28 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3e52e0a]: tegola: use global config var for load tests (duration: 00m 14s)
16:28 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3e52e0a]: tegola: use global config var for load tests
16:27 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:19 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@f35571e] (eqiad): tegola: mirror kartotherian/eqiad traffic to codfw/tegola (duration: 00m 18s)
16:19 mbsantos@deploy1002: Started deploy [kartotherian/deploy@f35571e] (eqiad): tegola: mirror kartotherian/eqiad traffic to codfw/tegola
16:16 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:13 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:12 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
16:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host mw2412.codfw.wmnet
16:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
16:07 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
15:53 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host mw2412.codfw.wmnet
15:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:39 _joe_: restarting pybal on lvs2010
15:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
14:51 _joe_: restarting pybals in codfw again
14:41 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
14:39 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
14:38 marostegui: Remove flaggedimages from s5 T290340
14:36 _joe_: restarting pybal on lvs2009
14:34 _joe_: restarting pybal on lvs1015
14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
14:32 _joe_: restarting pybal on lvs2010
14:32 arturo: add packages for buster-wikimedia|thirdparty/kubeadm-k8s-1-20 (T280402)
14:31 _joe_: restarting pybal on lvs1016
13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080 T290868', diff saved to https://phabricator.wikimedia.org/P17339 and previous config saved to /var/cache/conftool/dbconfig/20210928-134030-marostegui.json
13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103 T290865', diff saved to https://phabricator.wikimedia.org/P17337 and previous config saved to /var/cache/conftool/dbconfig/20210928-134012-marostegui.json
13:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on centrallog2002.codfw.wmnet with reason: REIMAGE
13:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog2002.codfw.wmnet with reason: REIMAGE
13:36 marostegui@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host db2103.codfw.wmnet
13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
13:33 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
13:33 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
13:30 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
13:03 marostegui@cumin1001: START - Cookbook sre.experimental.reimage for host db2103.codfw.wmnet
13:01 btullis@deploy1002: Finished deploy [analytics/refinery@380d165] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@380d165] (duration: 07m 02s)
12:54 btullis@deploy1002: Started deploy [analytics/refinery@380d165] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@380d165]
12:54 btullis@deploy1002: Finished deploy [analytics/refinery@380d165] (thin): Regular analytics weekly train THIN [analytics/refinery@380d165] (duration: 00m 07s)
12:53 btullis@deploy1002: Started deploy [analytics/refinery@380d165] (thin): Regular analytics weekly train THIN [analytics/refinery@380d165]
12:53 btullis@deploy1002: Finished deploy [analytics/refinery@380d165]: Regular analytics weekly train [analytics/refinery@380d165] (duration: 17m 42s)
12:35 btullis@deploy1002: Started deploy [analytics/refinery@380d165]: Regular analytics weekly train [analytics/refinery@380d165]
12:29 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
12:27 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
12:11 urbanecm: [urbanecm@wtp1026 ~]$ sudo -i /usr/local/sbin/restart-php7.2-fpm
12:10 Lucas_WMDE: lucaswerkmeister-wmde@wtp1026:~$ sudo -u mwdeploy /usr/local/sbin/restart-php7.2-fpm # attempt to solve a recurrence of T290120, but it failed
12:06 marostegui: Remove flaggedimages from s7 T290340
12:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:57 Lucas_WMDE: EU backport+config window done
11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/Wikibase/repo/includes/Store/Sql/SqlSiteLinkConflictLookup.php: Backport: Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup (T291377) (duration: 00m 57s)
11:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
11:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:29 marostegui: Deploy schema change on s3 codfw (lag will show up) T283499
11:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add support for SectionTranslationTargetLanguages (T290302, T290175) (duration: 00m 57s)
11:29 arturo: cleanup unused repo component buster-wikimedia|thirdparty/kubeadm-k8s-1-18 (T280402)
11:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:27 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
11:25 marostegui: Deploy schema change on s6 codfw (lag will show up) T283499
11:12 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
11:09 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable new dispatch via job approach on testwikidata and testwiki (T291610) (duration: 00m 57s)
11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
11:07 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
11:05 effie: downgrading scap to 3.17.1 on deploy1002 - T291095
11:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
10:53 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
10:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:46 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
10:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
10:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
10:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
10:29 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
10:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
10:16 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
10:10 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
10:08 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
10:01 marostegui: Deploy schema change on s5 codfw (lag will show up) T283499
10:00 marostegui: Deploy schema change on s7 codfw (lag will show up) T283499
09:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
09:50 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
09:48 _joe_: removing old builds from compiler1002.puppet-diffs.eqiad1.wikimedia.cloud
09:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
09:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
09:42 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
09:37 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
09:27 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
09:26 marostegui: Deploy schema change on s4 codfw (lag will show up) T283499
09:23 marostegui: Deploy schema change on s2 codfw (lag will show up) T283499
09:00 marostegui@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host db2080.codfw.wmnet
08:57 effie: upgrade scap on eqiad and codfw - T291095
08:30 marostegui@cumin1001: START - Cookbook sre.experimental.reimage for host db2080.codfw.wmnet
08:17 volans: uploaded spicerack_1.0.3 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
07:38 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
07:21 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
07:14 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
06:54 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
06:52 volans@cumin2002: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1002.eqiad.wmnet
06:52 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
06:42 volans: installed spicerack 1.0.2 on cumin2002
05:10 marostegui: Remove flaggedimages from s6 T290340
02:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
01:26 eileen: civicrm revision changed from ef5367bffc to b8f756b60e, config revision is 77cb7ec866

2021-09-27

23:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:40 krinkle@deploy1002: Synchronized docroot/wikipedia.org/speed-tests/: I82f072 (duration: 00m 59s)
23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1891d28: Deploy Growth features to 100% of newcomers of small wikis (T291876) (duration: 00m 57s)
22:58 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
22:57 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
22:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:34 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have SyntaxHighlight use Shellbox on group1 wikis too (T289227) (duration: 00m 57s)
22:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:27 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Have PdfHandler use Shellbox service on group0 wikis (T289228) (2/2) (duration: 00m 56s)
22:26 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have PdfHandler use Shellbox service on group0 wikis (T289228) (1/2) (duration: 00m 57s)
22:25 legoktm@deploy1002: sync-file aborted: Have PdfHandler use Shellbox service on group0 wikis (T289228) (duration: 00m 00s)
22:23 maryum: deployed security patch for T291696
22:14 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Have PagedTiffHandler use Shellbox service on group0 wikis (T289228) (2/2) (duration: 00m 58s)
22:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:13 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have PagedTiffHandler use Shellbox service on group0 wikis (T289228) (1/2) (duration: 00m 57s)
22:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:41 tzatziki: re-running `extensions/SecurePoll/cli/wm-scripts/makeGlobalVoterList.php` for MCDC elections (in a screen this time) (https://phabricator.wikimedia.org/T291668)
21:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:36 mutante: puppetmaster2001 - systemctl disable sync-puppet-ca, systemctl unmask sync-puppet-ca, rm /usr/lib/systemd/system/sync-puppet-ca.*, systemctl stop sync-puppet-ca.timer
21:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:33 tzatziki: running `extensions/SecurePoll/cli/wm-scripts/makeGlobalVoterList.php` for MCDC elections
21:29 mutante: puppetmaster2001 - rm /usr/lib/systemd/system/sync-puppet-ca.*
21:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox
21:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:24 mutante: puppetmaster2001 systemctl reset-failed
21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:20 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Set $wgTimelineFonts and send all Timeline generation to Shellbox (T289226) (2/2) (duration: 00m 56s)
21:18 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Set $wgTimelineFonts and send all Timeline generation to Shellbox (T289226) (1/2) (duration: 00m 56s)
21:16 mutante: puppetmaster2001 - /usr/bin/rsync -avz --delete puppetmaster1001.eqiad.wmnet::puppet_ca /var/lib/puppet/server/ssl/ca
21:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:01 legoktm@deploy1002: Synchronized docroot/: Configure Timeline like most other extensions (4/3) (duration: 00m 56s)
20:59 legoktm@deploy1002: Synchronized wmf-config/: Configure Timeline like most other extensions (3/3) (duration: 00m 57s)
20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:56 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Configure Timeline like most other extensions (2/3) (duration: 00m 56s)
20:50 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Configure Timeline like most other extensions (1/3) (duration: 00m 58s)
20:42 mutante: [puppetmaster2001:~] $ sudo systemctl start sync-puppet-volatile
20:28 brennen: gitlab1001: done with user renames, restarting gitlab to apply session duration value after a reconfiguration
20:06 brennen: gitlab1001: ~1hr downtime to attempt migration of usernames to shell uid (T288392)
20:00 mutante: ms-be2036 - remove commented out swift-drive-audit cron
19:55 eileen: civicrm revision changed from 18228490ae to ef5367bffc, config revision is 77cb7ec866
19:32 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
19:32 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
19:28 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
19:28 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
19:24 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
19:24 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
19:22 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
19:22 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
19:20 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
19:16 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
19:16 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
19:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1082.eqiad.wmnet with reason: REIMAGE
19:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1081.eqiad.wmnet with reason: REIMAGE
19:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:14 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1080.eqiad.wmnet with reason: REIMAGE
19:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1083.eqiad.wmnet with reason: REIMAGE
19:13 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
19:13 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
19:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1079.eqiad.wmnet with reason: REIMAGE
19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1083.eqiad.wmnet with reason: REIMAGE
19:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
19:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
19:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1082.eqiad.wmnet with reason: REIMAGE
19:10 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1078.eqiad.wmnet with reason: REIMAGE
19:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1081.eqiad.wmnet with reason: REIMAGE
19:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1080.eqiad.wmnet with reason: REIMAGE
19:08 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
19:08 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
19:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1079.eqiad.wmnet with reason: REIMAGE
19:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1078.eqiad.wmnet with reason: REIMAGE
19:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1077.eqiad.wmnet with reason: REIMAGE
19:05 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
19:05 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1076.eqiad.wmnet with reason: REIMAGE
19:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1077.eqiad.wmnet with reason: REIMAGE
19:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1075.eqiad.wmnet with reason: REIMAGE
19:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1076.eqiad.wmnet with reason: REIMAGE
18:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1075.eqiad.wmnet with reason: REIMAGE
18:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1074.eqiad.wmnet with reason: REIMAGE
18:56 otto@deploy1002: Synchronized wmf-config/CommonSettings.php: REVERT: Enable x_client_ip_forwarding_enabled for eventgate-analytics and eventgate-analytics-external - T288853 (duration: 00m 56s)
18:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1073.eqiad.wmnet with reason: REIMAGE
18:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1074.eqiad.wmnet with reason: REIMAGE
18:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1073.eqiad.wmnet with reason: REIMAGE
18:52 otto@deploy1002: scap failed: average error rate on 6/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
18:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1072.eqiad.wmnet with reason: REIMAGE
18:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1069.eqiad.wmnet with reason: REIMAGE
18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1072.eqiad.wmnet with reason: REIMAGE
18:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1069.eqiad.wmnet with reason: REIMAGE
18:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1071.eqiad.wmnet with reason: REIMAGE
18:42 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
18:41 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1070.eqiad.wmnet with reason: REIMAGE
18:41 Amir1: Deployed patch for T284419 second time
18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1071.eqiad.wmnet with reason: REIMAGE
18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1070.eqiad.wmnet with reason: REIMAGE
18:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1068.eqiad.wmnet with reason: REIMAGE
18:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1068.eqiad.wmnet with reason: REIMAGE
18:32 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.1/includes/changetags/ChangeTags.php: b1f4b4e: ChangeTags: Set interface flag when parsing tag names (T291776) (duration: 00m 56s)
18:30 cmjohnson1: updating firmware on sessionstore1003
18:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:16 Amir1: Deployed patch for T284419
18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2cb6f47: Growth: Promote 208 wikis out of dark mode (T290582) (duration: 00m 56s)
17:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:46 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.1/includes/Title.php: Backport: Expand local URLs to absolute URLs in ParserOutput (T263581), Part IV (duration: 00m 56s)
17:44 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.1/includes/parser/ParserCache.php: Backport: Expand local URLs to absolute URLs in ParserOutput (T263581), Part III (duration: 00m 56s)
17:43 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.1/includes/parser/ParserOutput.php: Backport: Expand local URLs to absolute URLs in ParserOutput (T263581), Part II (duration: 00m 57s)
17:42 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.1/includes/page/Article.php: Backport: Expand local URLs to absolute URLs in ParserOutput (T263581), Part I (duration: 00m 59s)
17:39 volans: uploaded spicerack_1.0.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
17:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:27 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica2006.wikimedia.org
17:26 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica2006.wikimedia.org
17:26 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica2005.wikimedia.org
17:26 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica200*.wikimedia.org
17:25 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica1004.wikimedia.org
17:25 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica1003.wikimedia.org
17:24 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica*
17:24 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-ro*.eqiad.wmnet
16:18 otto@puppetmaster1001: conftool action : set/ttl=300; selector: dnsdisc=eventgate-logging-external
16:16 otto@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=codfw
16:14 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
16:14 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
16:12 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
16:10 ottomata: reverting eventgate-logging-external chart change in codfw - T291504
16:08 urbanecm: [urbanecm@mwmaint1002 ~]$ scap pull # T291836
16:01 urbanecm: Livehack debugging at mwmaint1002 for T291836
14:41 urbanecm: /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php --statsd # measuring time backports saved
14:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:38 otto@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-logging-external,name=codfw
14:36 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/: 08f1e73: 3b154db: GrowthExperiments backports (T290609, T291658) (duration: 00m 58s)
14:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:33 volker-e@deploy1002: Finished deploy [design/style-guide@9b3b0fb]: Deploy design/style-guide: 9b3b0fb “Apps”: Fix typos and unify orthography (#491) (duration: 00m 06s)
14:33 volker-e@deploy1002: Started deploy [design/style-guide@9b3b0fb]: Deploy design/style-guide: 9b3b0fb “Apps”: Fix typos and unify orthography (#491)
14:30 otto@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=codfw
14:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:14 otto@deploy1002: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=codfw
14:11 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
14:11 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
13:59 otto@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-logging-external,name=codfw
13:58 ottomata: beginning re-deploy of eventgate-logging-external - https://phabricator.wikimedia.org/T291504#7380252
13:57 otto@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=eventgate-logging-external
13:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
13:48 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
13:36 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@04d2df4]: tegola: use eqiad discovery endpoint (duration: 00m 15s)
13:35 mbsantos@deploy1002: Started deploy [kartotherian/deploy@04d2df4]: tegola: use eqiad discovery endpoint
11:45 marostegui: Upgrade es4 in codfw to 10.4.21
11:43 marostegui: Turn off es2021 for onsite maintenance T290327
11:09 volans: re-enabled puppet on install hosts after deployment of g/723996 - T221388
11:02 volans: disabling puppet on install hosts to deploy 723996 - T221388
10:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
10:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
10:02 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
09:55 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
09:53 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
09:51 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
09:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2001.wikimedia.org
09:44 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
09:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
09:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2001.wikimedia.org
09:38 marostegui: Optimize table commonswiki.image on codfw (s4 will show lag) - T288273
09:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1004.wikimedia.org
09:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
09:36 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
09:34 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on ldap-replica2006.wikimedia.org with reason: reboot - T291813
09:33 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on ldap-replica2006.wikimedia.org with reason: reboot - T291813
09:31 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on ldap-replica2005.wikimedia.org with reason: reboot - T291813
09:30 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on ldap-replica2005.wikimedia.org with reason: reboot - T291813
09:30 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
09:29 moritzm: systemctl reset-failed networking T273026
09:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
09:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on ldap-replica1004.wikimedia.org with reason: reboot - T291813
09:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on ldap-replica1004.wikimedia.org with reason: reboot - T291813
09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx1001.wikimedia.org
09:24 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
09:23 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
09:22 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thanos-fe1001.eqiad.wmnet
09:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx1001.wikimedia.org
09:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on ldap-replica1003.wikimedia.org with reason: reboot - T291813
09:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on ldap-replica1003.wikimedia.org with reason: reboot - T291813
09:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
09:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
09:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on people1003.eqiad.wmnet with reason: reboot - T291813
09:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on people1003.eqiad.wmnet with reason: reboot - T291813
09:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on people2002.codfw.wmnet with reason: reboot - T291813
09:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on people2002.codfw.wmnet with reason: reboot - T291813
09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
08:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
08:35 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
08:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
08:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
08:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
08:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
07:18 godog: swift eqiad-prod: add weight to ms-be10[64-67] - T290546
07:07 marostegui: Remove flaggedimages from s3 T290340
06:13 effie: rolling restart php-fpm in eqiad - T291052
06:07 effie: upgrade php7.2 in eqiad - T291052
05:56 marostegui: Drop labswiki from m5 T167973
05:28 marostegui: Remove flaggedimages from s2 T290340

2021-09-26

14:51 volker-e@deploy1002: Finished deploy [design/style-guide@aac0ae9]: Deploy design/style-guide: aac0ae9 “Apps”: Fix image path (#490) (duration: 00m 06s)
14:51 volker-e@deploy1002: Started deploy [design/style-guide@aac0ae9]: Deploy design/style-guide: aac0ae9 “Apps”: Fix image path (#490)
03:16 legoktm: killed queries on db1099
03:14 legoktm: killing queries on db1105

2021-09-25

02:00 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
01:27 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
01:24 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .

2021-09-24

20:00 volker-e@deploy1002: Finished deploy [design/style-guide@362c6b1]: Deploy design/style-guide: 362c6b1 “Components”: Fix index link (#489) (duration: 00m 06s)
20:00 volker-e@deploy1002: Started deploy [design/style-guide@362c6b1]: Deploy design/style-guide: 362c6b1 “Components”: Fix index link (#489)
19:33 volker-e@deploy1002: Finished deploy [design/style-guide@6585e79]: Deploy design/style-guide: 6585e79 “Apps”: Add Apps x Design System section (#487) (duration: 00m 07s)
19:33 volker-e@deploy1002: Started deploy [design/style-guide@6585e79]: Deploy design/style-guide: 6585e79 “Apps”: Add Apps x Design System section (#487)
19:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:57 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/includes/MovePage.php: MovePage: don't create a recent change for a redirect (T291677) (duration: 00m 57s)
18:54 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/PageTriage/: Revert "Remove deprecated date.js library" (T291675) (duration: 00m 59s)
18:53 legoktm@deploy1002: sync-file aborted: (no justification provided) (duration: 00m 00s)
18:13 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
18:12 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
17:20 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
17:02 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
16:35 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
15:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001
15:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001
15:52 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. - elukey@cumin1001
15:46 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. - elukey@cumin1001
15:23 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001
15:17 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001
15:09 elukey: sudo cumin -m async -b2 "c:profile::analytics::cluster::hdfs_mount" "umount /mnt/hdfs" "mount /mnt/hdfs" - T288625
14:32 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
14:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:03 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
13:31 Amir1: start of rebuilding metadata of images in commons to make them use json
13:24 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
11:58 effie: upgrading scap on canaries - T291095
11:39 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=tegola-vector-tiles
11:32 effie: uploading scap-4.0.0 to buster-wikimedia and stretch-wikimedia
11:17 effie: restart pybal in low traffic load balancers
10:44 jynus: corrupting and fixing image metadata on testwiki before running script on commons T290462
10:16 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
10:11 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
09:39 jynus: upgrade and restart db2099
09:32 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
09:29 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
09:25 marostegui: Rename flaggedimages on db1096(ruwiki) and db1098(arwiki) T290340
09:25 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
09:09 jynus: upgrade and restart db2139, db2101
09:03 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
08:35 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
08:22 jynus: upgrade and restart db2098 T290868
08:20 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx2002.wikimedia.org
07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts mx2002.wikimedia.org
07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx1002.wikimedia.org
07:34 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
07:17 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
07:11 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts mx1002.wikimedia.org
07:01 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
07:01 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
07:00 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
06:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
06:53 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
06:44 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
06:41 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001
06:30 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001
06:26 elukey: restart archiva on archiva1002 to pick up new openjdk upgrades
06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17324 and previous config saved to /var/cache/conftool/dbconfig/20210924-061105-root.json
05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17323 and previous config saved to /var/cache/conftool/dbconfig/20210924-055601-root.json
05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17322 and previous config saved to /var/cache/conftool/dbconfig/20210924-054057-root.json
05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17321 and previous config saved to /var/cache/conftool/dbconfig/20210924-052554-root.json
05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17320 and previous config saved to /var/cache/conftool/dbconfig/20210924-051050-root.json
05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177 T291584', diff saved to https://phabricator.wikimedia.org/P17319 and previous config saved to /var/cache/conftool/dbconfig/20210924-050739-marostegui.json
01:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
01:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
01:16 krinkle@deploy1002: Synchronized wmf-config/profiler.php: I25f4b70b9d4b (duration: 00m 57s)
00:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:39 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/resources/src/mediawiki.searchSuggest/searchSuggest.js: Hiding fallback button depends on HTML order (T291272) (duration: 00m 57s)
00:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-09-23

23:38 foks: running wm-scripts/mcdc2021/populateEditCount.php on each wiki (s1 thru s8 simultaneously) https://phabricator.wikimedia.org/T291668
22:58 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
22:58 foks: creating `mcdc2021_edits` table on each wiki for elections voterlist https://phabricator.wikimedia.org/T291668
22:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:33 reedy@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/SecurePoll/cli/wm-scripts/: T291668 (duration: 00m 57s)
22:27 ryankemper: T280001 `ryankemper@cumin1001:~$ sudo cumin 'P{puppetmaster*}' 'sudo rm -fv /var/run/confd-template/.wcqs*'` complete, forcing recheck
22:27 ryankemper: T280001 The pooling of the `wcqs*` hosts has gotten `/srv/config-master/pybal/${DC}/wcqs` to render, but we need to clear away the stale error files to get rid of the associated warnings `Stale template error files present for '/srv/config-master/pybal/${DC}/wcqs'` => `sudo rm -fv /var/run/confd-template/.wcqs*`
22:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:18 ryankemper: T280001 `ryankemper@puppetmaster1001:/srv$ sudo confctl select 'name=wcqs.*' set/pooled=yes:weight=10`
22:17 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=wcqs.*
22:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:13 ryankemper: T280001 [codfw] `root@lvs2010:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443` and `root@lvs2009:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443`
22:13 ryankemper: T280001 [eqiad] `root@lvs1016:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443` and `root@lvs1015:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443`
22:06 ryankemper: T280001 Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'`
22:06 ryankemper: T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015`
22:05 ryankemper: T280001 [Cleanup required] `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` (erroneous)
22:05 ryankemper: T280001 [Sanity check] `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected
22:04 ryankemper: T280001 Restarted pybal on low-traffic backups: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2010*,lvs1016*}' 'sudo systemctl restart pybal'`
22:03 ryankemper: T280001 Restarting pybal on low-traffic backups `lvs2010` and `lvs1016`...
22:03 ryankemper: T280001 Ran puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`
22:00 ryankemper: T280001 Running puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`...
21:59 ryankemper: T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/723315, ran puppet agent on `wcqs*` to fix `local lo:LVS destination IPs`
21:59 ryankemper: T280001 Swapped the netbox IPAM addresses back, after erroneously swapping them earlier. `sre.dns.netbox` cookbook run complete as well
21:57 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:53 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
21:43 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
21:43 foks: altering some rows in the `securepoll_elections` table on metawiki
21:36 ryankemper: T280001 `sre.dns.netbox` run complete, netbox IP mixup *should* be resolved
21:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:27 ryankemper: T280001 `ryankemper@cumin1001:~$ sudo -i cookbook sre.dns.netbox -t T280001 'Fix swapped wcqs.svc.[eqiad,codfw].wmnet'` in progress (note: no `sudo authdns-update` will be necessary because that's just for `operations/dns` repo changes; we only need to run the netbox cookbook)
21:24 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
21:23 ryankemper: T280001 Swapped IPs of https://netbox.wikimedia.org/ipam/ip-addresses/9062/ and https://netbox.wikimedia.org/ipam/ip-addresses/9063; this should fix the issue where eqiad and codfw were swapped in netbox (my error). Still need to run the netbox cookbook and possibly a manual `sudo authdns-update`
21:19 ryankemper: The pybal side of the changes looks good, but I made a mistake when assigning IPs in netbox; `wcqs.svc.eqiad.wmnet` is routing to where codfw should go and vice versa. Fixing...
21:05 ryankemper: T280001 Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'`
21:04 ryankemper: T280001 Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`...
21:04 ryankemper: T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015`
21:00 ryankemper: T280001 Sanity check of `sudo ipvsadm -L -n` on low-traffic backups `lvs2010` and `lvs1016` looks good, proceeding
21:00 ryankemper: T280001 `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected
20:58 brennen: canceling backport training window for 2021-09-23
20:54 ryankemper: T280001 Restarted pybal on backup low-traffic hosts: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2010*,lvs1016*}' 'sudo systemctl restart pybal'`
20:53 ryankemper: T280001 Restarting pybal on backup low-traffic hosts `lvs2010` and `lvs1016`...
20:53 ryankemper: T280001 Ran puppet on all lvs hosts => `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`
20:47 ryankemper: T280001 Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/723254 to proceed with `lvs_setup` state change; will be restarting low-traffic lvs hosts shortly
20:04 dduvall: 1.38.0-wmf.1 promoted to all wikis. no new errors or rising rates (T281165)
20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:50 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.1
19:40 kostajh: UTC morning backport window done
19:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:39 kharlan@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: Suggested Edits: Update editor preference for tasks that shouldn't open the editor by default (T291020) (duration: 01m 05s)
19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:02 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I3323ce (duration: 01m 07s)
18:58 ryankemper: T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/721089 to see if it resolves the `confd` error that popped up
18:57 krinkle@deploy1002: Synchronized wmf-config/logging.php: I2cd81a (duration: 01m 05s)
18:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:31 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
17:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:06 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
17:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:59 volans: uploaded spicerack_1.0.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
16:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:38 ryankemper: T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/713959, running puppet on `*w*qs*` (i.e. wcqs and wdqs)
16:13 elukey: reboot an-worker1096 to see if megacli status for a new disk changes - T290805
16:09 brennen: gitlab1001: reverting gitlab cas: uid instead of CN; add nickname_key for T288392, as existing user logins are broken.
15:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ echo 'https://query.wikidata.org/querybuilder/' | mwscript purgeList.php # T285761
15:54 brennen: gitlab1001: brief downtime to apply gitlab cas: uid instead of CN; add nickname_key for T288392
15:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:09 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:09 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
14:58 reedy@deploy1002: Synchronized wmf-config/reverse-proxy-staging.php: T291643 (duration: 01m 05s)
14:19 moritzm: removed routers filter for mx1001, reimage to bullseye complete T286911
14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
14:14 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
13:53 effie: upgrade php7.2 on codfw - T291052
13:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
13:36 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
13:34 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
13:34 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
13:28 marostegui: Deploy schema change on s8 codfw wikidatawiki.wb_changes T291584
13:27 moritzm: reimaging mx1001 to bullseye T286911
13:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: reimage
13:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: reimage
13:23 jbond: merge refactor of resolv.conf puppet class - (gerrit 717241)
13:14 marostegui: Deploy schema change on s4 {commonswiki,testcommonswiki}.wb_changes T291584
13:11 marostegui: Deploy schema change on s3 testwikidatawiki.wb_changes T291584
13:09 elukey: update pcc facts (after change in puppetdb's fact filter list, to allow partitions for analytics)
11:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:19 marostegui: Upgrade db2081 db2082 db2083 db2084 db2091 db2152 T290868
11:16 kostajh: UTC morning backport and config deploys done
11:15 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Place new dewiki accounts in control group (T288420) (duration: 01m 06s)
11:10 jynus: restart and upgrade db2141 T290865
10:55 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
10:53 moritzm: mx1001 filtered on the routers for forthcoming reimage to bullseye T286911
10:52 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
10:51 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
10:50 marostegui: Upgrade db2102 db2116 db2130 db2145 db2146
10:47 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
10:27 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
09:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
09:55 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
09:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
09:52 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
09:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
09:40 moritzm: reinstalling mx2002 (test server) to validate bullseye installs are fixed
09:31 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
09:30 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
09:29 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
08:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:04 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Have SyntaxHighlight use Shellbox service on group0 wikis (2/2) (T289227) (duration: 01m 05s)
08:02 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have SyntaxHighlight use Shellbox service on group0 wikis (1/2) (T289227) (duration: 01m 06s)
08:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:54 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename $wmgUseGeSHi to $wmgUseSyntaxHighlight (3/3) (duration: 01m 05s)
07:52 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Rename $wmgUseGeSHi to $wmgUseSyntaxHighlight (2/3) (duration: 01m 05s)
07:49 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename $wmgUseGeSHi to $wmgUseSyntaxHighlight (1/3) (duration: 01m 06s)
07:10 tgr: running `mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=$WIKI --search-index --db-table --statsd` for growthexperiments.dblist wikis
07:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
06:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
06:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
06:56 marostegui: Upgrade db2116
06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
06:53 marostegui: Upgrade db2085, db2088 and db2092
05:24 marostegui: Optimize ruwiki.logging on codfw T286102
02:55 eileen: civicrm revision changed from 14658445a2 to 18228490ae, config revision is 77cb7ec866
02:06 RoanKattouw: Deployed patch for T291600
01:05 eileen: tools revision changed from 1d67c52c12 to d90f4c91ee
00:35 catrope@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/MediaSearch/: Use text() instead of parse() for MediaSearch UI messages (T291590) (duration: 01m 08s)
00:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
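
Note on the T280001 entries of this day (20:47 to 21:04): they describe a staged pybal restart, with the backup low-traffic hosts restarted first, a sanity check and a 120s wait, and only then the primaries. The ordering can be sketched roughly as below. This is an illustrative sketch only, not the tooling that was used: `staged_restart`, `restart`, and `healthy` are hypothetical names, and the real restart command and checks (`ipvsadm`, Icinga) are stood in for by injected callables.

```python
import time
from typing import Callable, List, Sequence


def staged_restart(
    backups: Sequence[str],
    primaries: Sequence[str],
    restart: Callable[[str], None],   # hypothetical: restarts the service on one host
    healthy: Callable[[str], bool],   # hypothetical: stands in for the manual sanity checks
    settle_seconds: float = 120,      # the log mentions waiting 120s before proceeding
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart backups first; touch primaries only if every backup looks healthy."""
    restarted: List[str] = []

    # Step 1: restart the backup hosts.
    for host in backups:
        restart(host)
        restarted.append(host)

    # Step 2: let things settle, then verify the backups before going further.
    sleep(settle_seconds)
    if not all(healthy(host) for host in backups):
        raise RuntimeError("backup hosts failed the sanity check; primaries left untouched")

    # Step 3: restart the primary hosts.
    for host in primaries:
        restart(host)
        restarted.append(host)

    return restarted
```

The point of the ordering is that a bad change shows up on the backups while the primaries are still serving traffic untouched.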

2021-09-22

22:51 mutante: mx2001 - re-enabled puppet
20:48 ryankemper: [WDQS] After puppet-merging, running puppet on `miscweb*`, and doing a `ryankemper@mwmaint1002:~$ echo 'https://query.wikidata.org/querybuilder' | mwscript purgeList.php`, https://query.wikidata.org/querybuilder is working properly again
20:39 ryankemper: [WDQS] Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/722958/ which should (hopefully) resolve an issue where https://query.wikidata.org/querybuilder gives a 404, whereas https://query.wikidata.org/querybuilder/ works (due to the trailing slash avoiding the rewrite regex)
20:38 ryankemper: `[WCQS]` `wcqs1001.eqiad.wmnet` is reachable again following the powercycle
20:20 ryankemper: `[WCQS]` Ran `racadm>>racadm serveraction powercycle` on `wcqs1001.mgmt.eqiad.wmnet`
20:18 ryankemper: `[WCQS]` `wcqs1001` is ssh unreachable (https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=wcqs1001&service=SSH), will try restarting from mgmt console
19:29 dduvall: 1.38.0-wmf.1 promoted to group1. no new errors or rising error rates (T281165)
19:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:20 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.1 (duration: 01m 11s)
19:18 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.1
19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:11 dduvall@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/CentralAuth: Backport: Avoid $wgUser deprecation warnings (T291515) (duration: 01m 06s)
19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:32 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/modules/help/ext.growthExperiments.PostEditPanel.js: Post-edit Panel: Set task.pageviews to null rather than undefined (T291510) (duration: 01m 05s)
18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:12 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: logging: send DuplicateParse bucket to Logstash (duration: 01m 05s)
18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:06 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add new Shellboxes (duration: 01m 16s)
18:03 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
17:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:38 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
17:38 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/includes/api/: Restore deprecated API token methods (3/3) (duration: 01m 07s)
17:36 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/autoload.php: Restore deprecated API token methods (2/3) (duration: 01m 05s)
17:34 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/includes/api/ApiTokens.php: Restore deprecated API token methods (1/3) (duration: 01m 05s)
16:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
16:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:53 volans@cumin1001: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1002.eqiad.wmnet
16:50 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove wmgFileBlacklist (duration: 01m 06s)
16:49 joal@deploy1002: Finished deploy [analytics/refinery@04aae46] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04aae46] (duration: 06m 17s)
16:48 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wmgProhibitedFileExtensions (duration: 01m 05s)
16:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:45 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add wmgProhibitedFileExtensions (duration: 01m 07s)
16:43 joal@deploy1002: Started deploy [analytics/refinery@04aae46] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04aae46]
16:41 mutante: [netmon1002:~] $ sudo systemctl start rancid-differ
16:41 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename wgShortPagesNamespaceBlacklist to wgShortPagesNamespaceExclusions (duration: 01m 05s)
16:40 mutante: [netmon1002:~] $ sudo systemctl start rancid-clean-logs
16:39 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Rename wgEnableUserEmailBlacklist to wgEnableUserEmailMuteList (duration: 01m 05s)
16:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:37 joal@deploy1002: Finished deploy [analytics/refinery@04aae46] (thin): Regular analytics weekly train THIN [analytics/refinery@04aae46] (duration: 00m 07s)
16:37 joal@deploy1002: Started deploy [analytics/refinery@04aae46] (thin): Regular analytics weekly train THIN [analytics/refinery@04aae46]
16:36 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
16:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:35 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wgMimeTypeExclusions and set wgProhibitedFileExtensions not wgFileBlacklist (duration: 01m 05s)
16:32 joal@deploy1002: Finished deploy [analytics/refinery@04aae46]: Regular analytics weekly train [analytics/refinery@04aae46] (duration: 18m 19s)
16:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:14 joal@deploy1002: Started deploy [analytics/refinery@04aae46]: Regular analytics weekly train [analytics/refinery@04aae46]
16:13 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set jQuery migrate to false everywhere except metawiki (T280944) (duration: 01m 56s)
16:08 volans@cumin1001: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1002.eqiad.wmnet
15:57 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
15:56 joal@deploy1002: Finished deploy [analytics/refinery@b2ca54f] (hadoop-test): Bugfix analytics deploy TEST [analytics/refinery@b2ca54f] (duration: 06m 17s)
15:52 moritzm: removed the router filters for mx1001 due to an issue with the mx1001 reinstall T286911
15:49 joal@deploy1002: Started deploy [analytics/refinery@b2ca54f] (hadoop-test): Bugfix analytics deploy TEST [analytics/refinery@b2ca54f]
15:49 joal@deploy1002: Finished deploy [analytics/refinery@b2ca54f] (thin): Bugfix analytics deploy THIN [analytics/refinery@b2ca54f] (duration: 00m 07s)
15:49 joal@deploy1002: Started deploy [analytics/refinery@b2ca54f] (thin): Bugfix analytics deploy THIN [analytics/refinery@b2ca54f]
15:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@7ed9c3b]: Revert "change tegola uri to test single production node" (duration: 00m 15s)
15:15 mbsantos@deploy1002: Started deploy [kartotherian/deploy@7ed9c3b]: Revert "change tegola uri to test single production node"
15:02 moritzm: re-installing mx1001 with bullseye T286911
14:47 volans: upgraded spicerack to 1.0.0 on cumin hosts
14:14 volans: uploaded spicerack_1.0.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
13:39 herron: flushed mx1001 mail queue to mx2001 T286911
13:26 moritzm: mx1001 filtered on the routers for forthcoming reimage to bullseye T286911
13:23 joal@deploy1002: Finished deploy [analytics/refinery@b2ca54f]: Bugfix analytics deploy [analytics/refinery@b2ca54f] (duration: 18m 25s)
13:09 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3293ce1]: tegola: increase mirrored requests to 10% (duration: 00m 14s)
13:09 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3293ce1]: tegola: increase mirrored requests to 10%
13:04 joal@deploy1002: Started deploy [analytics/refinery@b2ca54f]: Bugfix analytics deploy [analytics/refinery@b2ca54f]
12:56 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5617839]: tegola: increase mirrored requests to 5% (duration: 00m 15s)
12:55 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5617839]: tegola: increase mirrored requests to 5%
12:46 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@8765218]: change tegola uri to test single production node (duration: 00m 14s)
12:46 mbsantos@deploy1002: Started deploy [kartotherian/deploy@8765218]: change tegola uri to test single production node
11:46 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
11:38 jbond: enable puppet fleet-wide post puppetdb restart
11:33 jbond: disable puppet fleet-wide to perform puppetdb restart
11:11 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
10:50 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
10:31 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
10:20 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
09:51 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
09:38 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
08:46 effie: upgrade php7.2 on api-canaries and restart service - T291052
06:02 elukey: update pcc facts
05:48 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-syntaxhighlight
05:48 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-timeline
05:47 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-media
05:31 legoktm: restarting pybal on lvs2009
05:27 legoktm: restarting pybal on lvs2010
05:23 legoktm: restarting pybal on lvs1015
05:17 legoktm: restarting pybal on lvs1016
05:12 legoktm: sudo cumin 'O:lvs::balancer' 'run-puppet-agent'
04:48 legoktm: ran authdns-update for adding new shellbox svc entries https://gerrit.wikimedia.org/r/721908
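
Note on the 20:39 entry of this day: it attributes the 404 on `/querybuilder` (but not on `/querybuilder/`) to the trailing slash avoiding a rewrite regex. The actual rule is not quoted in the log, so the pattern below is purely hypothetical. It only sketches how a rule anchored on a single path segment can catch the slash-less URL while leaving the slash-terminated one alone.

```python
import re

# Hypothetical rewrite pattern, NOT the production rule: it matches a path
# made of exactly one segment with no trailing slash.
SINGLE_SEGMENT = re.compile(r"/[A-Za-z0-9_-]+")


def is_rewritten(path: str) -> bool:
    """Return True if the hypothetical rewrite rule would apply to this path."""
    return SINGLE_SEGMENT.fullmatch(path) is not None


# The slash-less form is caught by the rule; the trailing-slash form is not.
print(is_rewritten("/querybuilder"))   # True
print(is_rewritten("/querybuilder/"))  # False
```

Under that assumption, a request caught by the rule would be sent to a different backend or location than intended, which would be consistent with the 404 described in the entry.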

2021-09-21

23:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:56 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
22:29 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
21:58 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
21:16 cstone: payments-wiki revision is 23d0ffac66
19:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:54 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable 'DuplicateParse' logging bucket (duration: 01m 07s)
19:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:10 ryankemper: T280001 `sre.dns.netbox` completed successfully
19:06 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.1
19:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:57 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
18:56 ryankemper: T280001 Running `sudo -i cookbook sre.dns.netbox -t T280001 'Added wcqs.svc.[eqiad,codfw].wmnet'` per final step of https://wikitech.wikimedia.org/wiki/LVS#DNS_changes_(svc_zone_only)...
18:53 ryankemper: T280001 `for i in 0 1 2 ; do dig @ns${i}.wikimedia.org -t any wcqs.svc.[eqiad,codfw].wmnet ; done` looks as expected
18:48 ryankemper: T280001 `OK - authdns-update successful on all nodes!`
18:45 ryankemper: T280001 `ryankemper@authdns1001:~$ sudo authdns-update`
18:44 ryankemper: T280001 Merging https://gerrit.wikimedia.org/r/c/operations/dns/+/713929; will follow steps in https://wikitech.wikimedia.org/wiki/DNS#Changing_records_in_a_zonefile post-merge
17:56 cstone: payments-wiki revision is 23d0ffac66
17:49 dduvall: 1.38.0-wmf.1 deployed to testwikis (T281165)
17:48 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
17:48 dduvall@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.1 (duration: 35m 44s)
17:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:39 elukey: update pcc facts
17:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:35 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
17:27 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
17:12 dduvall@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.1
17:08 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
17:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:51 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
16:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:33 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
16:14 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
15:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:46 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
15:39 elukey: update pcc facts
15:26 effie: upgrade php7.2 on app-canaries and restart service - T291052
15:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove s10 from codfw T167973', diff saved to https://phabricator.wikimedia.org/P17307 and previous config saved to /var/cache/conftool/dbconfig/20210921-150958-marostegui.json
14:35 XioNoX: re-enable AMS-IX peering sessions - T291407
14:17 XioNoX: temporarily downpref Telia-Deutsche Telekom to not saturate Telia transit - T291407
13:52 XioNoX: disable AMS-IX peering sessions for maintenance - T291407
13:48 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
13:48 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
13:41 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
13:41 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
13:37 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
13:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
13:18 effie: upgrading php on wtp* servers to 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf2 && rolling service restart - T291052
13:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
12:01 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2025.codfw.wmnet
11:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
11:55 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:46 elukey@cumin1001: START - Cookbook sre.dns.netbox
11:45 jgiannelos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Configure event stream for map tile state change - 3b01ef587 (duration: 00m 57s)
11:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
11:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
10:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
10:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
10:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
09:59 _joe_: rebuilding openjdk8* image, ruby, nodejs-slim for T291458
09:46 _joe_: deneb:~# docker-registryctl delete-tags docker-registry.wikimedia.org/fluentd T291458
09:44 _joe_: deleting images for graphoid, T291458
05:16 kart_: Upgraded cxserver to 2021-09-16-130208-production
05:12 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
05:03 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
04:58 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
02:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:16 tgr: Evening deploys done
00:16 tgr@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addlink/AddLinkArticleTarget.js: Backport: AddLink: Skip over headings in phrase matching (T291361) (duration: 00m 57s)
00:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-09-20

23:31 ejegg: updated fundraising CiviCRM from e6bf81d99c to 14658445a2
23:29 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
23:22 mutante: LDAP - added georginaburnett-wmde to NDA group (T291391, T273780)
23:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:21 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
22:14 mutante: wdqs1004 - depool
22:10 mutante: wdqs1004 - service wdqs-updater restart
22:06 mutante: wdqs1004 - HTTP/1.1 503 Service Unavailable - systemctl restart wdqs-blazegraph
22:05 foks: changing user email for MIskander (WMF)@collabwiki
21:41 mutante: ms-fe1005 - systemctl start swift_dispersion_stats.service (gerrit:719285)
21:30 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:45 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Revert "Disable jQuery Migrate on group1" (T291410) (duration: 00m 56s)
17:02 legoktm: repooled codfw (traffic/caches) 1 week after DC switchover
16:41 effie: upgrading php on wtp[1025-1029] to 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf2 - T291052
16:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17305 and previous config saved to /var/cache/conftool/dbconfig/20210920-144844-root.json
14:42 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17304 and previous config saved to /var/cache/conftool/dbconfig/20210920-143340-root.json
14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17303 and previous config saved to /var/cache/conftool/dbconfig/20210920-141836-root.json
14:11 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17302 and previous config saved to /var/cache/conftool/dbconfig/20210920-140333-root.json
13:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
13:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
13:45 moritzm: restarting apache on Logstash ELK5 cluster to pick up GNUTLS update T283165
13:39 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
13:20 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
13:13 damilare: updated payments-wiki from f9cbf95a12 to 23d0ffac66
12:59 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
12:58 marostegui: Drop ct_tag_id_log key from db1144:3314 T277416
12:54 moritzm: installing gnutls28 updates for stretch with backport for forthcoming Let's Encrypt issuance chain update (T283165)
12:42 marostegui: Add ct_tag_id_log key to db1144:3314 T277416
11:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:48 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
11:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
11:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:31 urbanecm@deploy1002: Finished scap: b9031bc: Mentor dashboard: Mentor tools (T280307) (duration: 11m 44s)
11:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:20 urbanecm@deploy1002: Started scap: b9031bc: Mentor dashboard: Mentor tools (T280307)
11:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable jQuery Migrate on group1 (T280944) (duration: 00m 56s)
11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b518d8b: Mentor dashboard: Enable beta mode at testwiki (T281534) (duration: 00m 55s)
11:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/: b9031bc: Mentor dashboard: Mentor tools (T280307; 5) (duration: 00m 56s)
11:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/ServiceWiring.php: b9031bc: Mentor dashboard: Mentor tools (T280307; 4) (duration: 00m 56s)
11:09 hnowlan: roll restarting restbase service in codfw
11:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/MentorDashboard/Modules/MentorTools.php: b9031bc: Mentor dashboard: Mentor tools (T280307; 2) (duration: 00m 55s)
11:07 urbanecm@deploy1002: sync-file aborted: b9031bc: Mentor dashboard: Mentor tools (T280307; 1) (duration: 00m 00s)
11:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/MentorDashboard/MentorTools/MentorStatusManager.php: b9031bc: Mentor dashboard: Mentor tools (T280307; 1) (duration: 00m 57s)
11:05 hnowlan: roll restarting restbase service in eqiad for openssl updates
10:45 hnowlan: roll restarting kartotherian and tilerator on maps2*
10:41 hnowlan: roll restarting kartotherian and tilerator on maps1*
10:36 jynus: rolling restart bacula & minio daemons on backup hosts
09:59 moritzm: restarting apache2 on thorium
09:48 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Remove s10 from eqiad T167973', diff saved to https://phabricator.wikimedia.org/P17300 and previous config saved to /var/cache/conftool/dbconfig/20210920-094739-marostegui.json
09:10 moritzm: installing openssl1.0 updates for stretch with backport for forthcoming Let's Encrypt issuance chain update (T283165)
08:35 moritzm: updating clamav on ticket.wikimedia.org/otrs1001 to 0.103.3
08:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:49 moritzm: uploaded maps-deduped-tilelist 0.0.3~deb10u1 to buster-wikimedia/main T290982
07:48 moritzm: uploaded maps-deduped-tilelist 0.0.3~deb10u1 to buster-wikimedia/main
07:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:43 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:35 marostegui: Stop db1168 and db2129 in sync T167973
07:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:34 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: af9d6e4: Revert "Add throttle rule for Czech wiki course" (duration: 00m 56s)
07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 T167973', diff saved to https://phabricator.wikimedia.org/P17299 and previous config saved to /var/cache/conftool/dbconfig/20210920-073256-marostegui.json
07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 T167973', diff saved to https://phabricator.wikimedia.org/P17298 and previous config saved to /var/cache/conftool/dbconfig/20210920-073206-marostegui.json
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 T167973', diff saved to https://phabricator.wikimedia.org/P17297 and previous config saved to /var/cache/conftool/dbconfig/20210920-073141-marostegui.json
07:31 moritzm: uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf2 to apt.wikimedia.org (component/php7.2 for buster-wikimedia) T291052
07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8c1d665: enwiki: Bump Growth features to 25% (mentorship limited to 20% of those users) (T290927) (duration: 00m 57s)
07:20 urbanecm: Revert undeployed config patch (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/721959); not even pulled to deployment, so assuming it never hit prod (T289771)
06:00 marostegui: Upgrade db2071, db2072, db2094

2021-09-18

01:47 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 00m 57s)
01:01 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 01m 03s)

2021-09-17

21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:19 legoktm@cumin1001: START - Cookbook sre.dns.netbox
19:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
17:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
17:02 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
16:48 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
16:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
14:49 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
13:06 moritzm: installing 4.9.272 kernels on stretch hosts (no reboots yet)
11:28 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
11:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:37 milimetric@deploy1002: Finished deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency (duration: 00m 07s)
09:37 milimetric@deploy1002: Started deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency
09:36 milimetric@deploy1002: Finished deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist (duration: 17m 43s)
09:19 milimetric@deploy1002: Started deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist
08:00 jayme: restarting php-fpm on wtp1037 and wtp1030
02:28 ryankemper: T290330 [Remove WDQS codfw ~hourly restarts] Successfully rolled out to rest of fleet `sudo cumin 'C:query_service::crontasks' 'sudo run-puppet-agent --force && sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer'`
02:22 ryankemper: T290330 [Remove WDQS codfw ~hourly restarts] `wdqs2001` and `wdqs2004` look fine after running `sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer` to clean up dangling timer
01:55 ryankemper: T290330 [Remove WDQS codfw ~hourly restarts] Testing on arbitrary codfw host: `ryankemper@wdqs2001:~$ sudo run-puppet-agent`
01:48 ryankemper: T290330 [Remove WDQS codfw ~hourly restarts] `sudo cumin 'C:query_service::crontasks' 'sudo disable-puppet "Stop doing wdqs codfw ~hourly restarts - T290330"'`
00:04 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
00:01 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .

2021-09-16

23:58 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
23:51 ryankemper: T273673 All looks good, re-enabling puppet and running on rest of fleet: `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo run-puppet-agent --force'`
23:44 ryankemper: T273673 The associated crons are gone and I see the new systemd timers for both gc-cleanup and the hot threads logger
23:39 ryankemper: T273673 Testing elasticsearch cron->systemd timer-job changes on canary instance `ryankemper@elastic1064:~$ sudo run-puppet-agent --force`
23:37 ryankemper: T273673 Disabling puppet on elasticsearch hosts `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo disable-puppet "https://gerrit.wikimedia.org/r/c/operations/puppet/+/721413 - T273673"'`
23:21 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
23:21 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
23:19 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
23:18 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
23:18 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
23:17 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
23:17 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
23:16 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
22:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:38 legoktm@deploy1002: Finished scap: i18n for restoring deprecated token APIs (duration: 15m 30s)
22:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:23 legoktm@deploy1002: Started scap: i18n for restoring deprecated token APIs
22:21 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/includes/api/: Restore deprecated token APIs (3/3) (duration: 00m 56s)
22:19 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/autoload.php: Restore deprecated token APIs (2/3) (duration: 00m 56s)
22:16 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/includes/api/ApiTokens.php: Restore deprecated token APIs (1/3) (duration: 00m 56s)
21:22 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE
21:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE
21:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:49 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set jQuery migrate to false for wikibooks and Commons (T280944) (duration: 00m 56s)
19:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.23
18:55 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:50 robh@cumin1001: START - Cookbook sre.dns.netbox
18:49 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:46 dzahn@cumin1001: START - Cookbook sre.dns.netbox
18:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addlink/AddLinkArticleTarget.js: bb8cba1: Use growthexperiments-structuredtask-no-suggestions-found-dialog-button in outdated suggestions dialog (2/2) (duration: 01m 06s)
18:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/extension.json: bb8cba1: Use growthexperiments-structuredtask-no-suggestions-found-dialog-button in outdated suggestions dialog (1/2) (duration: 01m 07s)
17:54 volans: turn off LLDP agent on NIC (both ports) on ms-be105[1-9],ms-be205[2-6] - T290984
17:31 volans: turn off LLDP agent on NIC (both ports) on ms-be2051 - T290984
17:09 jynus: deployed extra grants for admin user on s6 primary
16:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-test-coord1002.eqiad.wmnet
16:17 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-test-coord1002.eqiad.wmnet
16:04 marostegui: Disconnect s6 master from m5 master (noting the replication position) T167973
16:04 marostegui: Disconnect s6 master from m5 master (noting the replication position)
15:52 bd808: marostegui is awesome and made wikitech better today. :)
15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set wikitech on read-only for maintenance T287454', diff saved to https://phabricator.wikimedia.org/P17283 and previous config saved to /var/cache/conftool/dbconfig/20210916-150444-marostegui.json
15:03 marostegui: Set wikitech on read-only (from now on all SAL changes will fail) T167973
14:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
14:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
14:53 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
14:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
14:35 mutante: reimaging mwmaint2002 to buster (T267607, T245757)
14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
14:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
14:12 mutante: switching https://noc.wikimedia.org from codfw to eqiad (T287539, T267607)
13:44 sukhe: homer: running for Gerrit: 721018: set up BGP peering to durum hosts in {eqiad,codfw,esams,ulsfo,eqsin}
13:25 effie: pool mw1422 mw1455
13:24 effie: pool mw1422 mw1455
13:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:12 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.23 (duration: 01m 04s)
13:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.23
12:08 marostegui: Deploy schema change on s2 codfw (lag will show up) T290057
12:00 mbsantos: start OSM re-import script in maps2009 (depooled)
11:51 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php: 529f86c: UncachedMenteeOverviewDataProvider: Do not fatal with zero mentees (T291088) (duration: 01m 04s)
11:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php: 9e0f6f8: UncachedMenteeOverviewDataProvider: Do not fatal with zero mentees (T291088) (duration: 01m 04s)
11:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/AbuseFilter/: Fixing incorrect deployment of 01e4450 for T291123. This is supposed to be a no-op. (duration: 01m 05s)
11:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:41 urbanecm: [urbanecm@deploy1002 /srv/mediawiki-staging/php-1.37.0-wmf.23 (wmf/1.37.0-wmf.23 * u+2-2)]$ git rebase && git submodule update extensions/AbuseFilter/ # fixing an incorrect deployment that happened in T291123
11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:41 urbanecm: [urbanecm@deploy1002 /srv/mediawiki-staging/php-1.37.0-wmf.23/extensions/AbuseFilter (wmf/1.37.0-wmf.23 u=)]$ git co 0d2bc7c # reset repo to expected state, fixing incorrect deploy of a backport in T291123
11:34 moritzm: installing 4.9.272 kernels on stretch hosts (no reboots yet)
11:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
11:21 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
11:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Add new WikimediaBadges config (T232927) (2/2) (duration: 01m 05s)
11:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add new WikimediaBadges config (T232927) (1/2) (duration: 01m 05s)
11:03 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
11:03 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
10:59 hashar@deploy1002: Synchronized php-1.37.0-wmf.21/includes/language/Message.php: Message: Remove deprecated format property - T146416 T291124 (duration: 01m 06s)
10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:21 topranks: Changing default gateway on mw1422 to use VRRP backup (cr2), to determine if tail drops from switches to cr1 are the cause of TCP retransmissions.
10:14 effie: depool mw1455 for network testing
10:11 effie: depool mw1422 for network testing
10:01 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
10:01 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
10:00 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
10:00 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2002.wikimedia.org with reason: reimage
09:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2002.wikimedia.org with reason: reimage
09:10 moritzm: in-place re-installation of mx2002.wikimedia.org (test VM) to test the new installer key support in the sre.puppet.renew-cert cookbook
08:04 moritzm: upgrading scandium to PHP 7.2 backport of patch for enhanced DOM replaceChild/removeChild performance T291052
07:48 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
05:35 marostegui: Optimize dewiki.logging in codfw T287344

2021-09-15

23:02 legoktm: upgrading lists1001 to use postorius 1.3.5
22:51 legoktm: uploaded new mailmanclient/postorius packages to apt1001
22:38 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
22:03 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
22:03 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
22:03 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
22:02 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@902529b]: 0.3.85 (duration: 06m 59s)
21:56 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.85` on canary `wdqs1003`; proceeding to rest of fleet
21:55 ryankemper@deploy1002: Started deploy [wdqs/wdqs@902529b]: 0.3.85
21:55 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.85`. Pre-deploy tests passing on canary `wdqs1003`
21:42 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@f3473d9]: Reference files deployed by puppet through query_service paths instead of wdqs (duration: 02m 07s)
21:40 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@f3473d9]: Reference files deployed by puppet through query_service paths instead of wdqs
21:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 60e7e51: Set wmgEchoEnablePush to false explicitly on arbcom_* wikis (T291128) (duration: 01m 06s)
19:50 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/AbuseFilter/: sync backport for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/721312 (duration: 01m 06s)
19:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: Rollback all wikis to 1.37.0-wmf.23
19:07 urbanecm: Re-start server-side upload for 1 video file, likely temporary swift failure (T289781)
19:06 urbanecm: Start server-side upload for 1 video file (T287686)
19:04 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.23 (duration: 00m 55s)
19:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.23
18:52 urbanecm: Start server-side upload for 1 video file (T289949)
18:50 urbanecm: Start server-side upload for 1 video file (T289781)
18:44 urbanecm: Start server-side upload for 3 large PDF files (T290722)
18:43 legoktm: migrated sitereq-l@ from Google Groups to Mailman (T290908)
18:27 urbanecm: Start server-side upload for 1 video file (T290290)
18:23 urbanecm: Start server-side upload for 1 video file (T290685)
18:21 urbanecm: Start server-side upload for 1 video file (T290707)
18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7620084: Add portrattarkiv.se to wgCopyUploadsDomains whitelist of Wikimedia Commons (T290581) (duration: 01m 05s)
17:39 mutante: thumbor - running puppet on all thumbor hosts, removed cron job systemd-thumbor-tmpfiles-clean, added thumbor_systemd_tmpfiles_clean timer job
16:56 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f7f6f3] (duration: 06m 15s)
16:50 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f7f6f3]
16:47 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3] (thin): Regular analytics weekly train THIN [analytics/refinery@0f7f6f3] (duration: 00m 07s)
16:47 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3] (thin): Regular analytics weekly train THIN [analytics/refinery@0f7f6f3]
16:45 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3]: Regular analytics weekly train [analytics/refinery@0f7f6f3] (duration: 19m 43s)
16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5002.eqsin.wmnet
16:26 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3]: Regular analytics weekly train [analytics/refinery@0f7f6f3]
16:19 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum5002.eqsin.wmnet
16:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5001.eqsin.wmnet
16:02 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum5001.eqsin.wmnet
15:56 urbanecm: Remove 2FA for User:Rho at wikitech, identity verified via a videocall
14:50 moritzm: installing lz4 security updates on stretch
13:50 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:33 ottomata: pointing {stats,analytics}.wikimedia.org at analytics-web.discovery.wmnet cname - T285355
13:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4002.ulsfo.wmnet
13:18 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum4002.ulsfo.wmnet
13:15 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4001.ulsfo.wmnet
13:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum4001.ulsfo.wmnet
12:54 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:41 marostegui: Install 10.4.21-2 on db1125
11:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:21 Lucas_WMDE: EU backport+config window done
11:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable change-tags for new edits' proofread status at mulWS (T289140) (duration: 01m 06s)
11:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Don’t check constraints on two property qualifiers (T235292) (duration: 01m 11s)
11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
09:55 effie: depool wtp1026
09:54 effie: depooling mw1312 and mw1319
09:46 topranks: Disabling Intel X710 NIC on-board LLDP processing on relforge1003 (T290984)
07:04 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:57 elukey: shutdown ms-be2045 (again) after seeing T290881
06:02 elukey: powercycle ms-be2045 - no ssh, no remote tty available
05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Restore db1109 original load', diff saved to https://phabricator.wikimedia.org/P17274 and previous config saved to /var/cache/conftool/dbconfig/20210915-052802-marostegui.json
04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1109 load', diff saved to https://phabricator.wikimedia.org/P17273 and previous config saved to /var/cache/conftool/dbconfig/20210915-043053-marostegui.json

2021-09-14

23:01 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Re-enable VipsScaler (2 of 2) (duration: 01m 04s)
22:59 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Re-enable VipsScaler (1 of 2) (duration: 01m 05s)
22:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:43 legoktm: legoktm@cumin2001:~$ sudo systemctl reset-failed # clear httpbb_hourly_tests failure, moved to cumin1001
22:34 legoktm@deploy1002: Finished scap: Rebuild i18n for redeployment of VipsScaler (T290759) (duration: 23m 49s)
22:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:11 legoktm@deploy1002: Started scap: Rebuild i18n for redeployment of VipsScaler (T290759)
22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:20 dancy: testing upcoming Scap release on beta
20:20 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Early adopt wgIncludejQueryMigrate=false on nlwiki (T280944) (duration: 01m 48s)
20:06 cdanis: T290425 ✔️ cdanis@alert1001.wikimedia.org ~ 🕓🍵 sudo /usr/bin/statograph -c /etc/statograph/config.yml erase_metric_data lyfcttm2lhw4
20:06 cdanis: T290425 ✔️ cdanis@alert1001.wikimedia.org ~ 🕓🍵 sudo /usr/bin/statograph -c /etc/statograph/config.yml erase_metric_data h5mvbny28713
19:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.23
18:48 moritzm: removed filter for tcp/25 on mx2001, reimage is complete T286911
18:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2982638: Offer the DiscussionTools reply tool as opt-out setting at ptwikinews (T285162) (duration: 01m 06s)
18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7f1de32: Offer the DiscussionTools reply tool as opt-out setting at Wikimania wiki (T284339) (duration: 01m 05s)
18:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e36f4d3: DiscussionTools: Make newtopictool available to everyone on arwiki and cswiki (T285724) (duration: 01m 04s)
18:09 urbanecm@deploy1002: Synchronized debug.json: Idef64e72 (duration: 01m 29s)
18:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: reimage
17:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: reimage
17:45 moritzm: reimaging mx2001 to bullseye T286911
16:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
15:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
15:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
15:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
15:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 37 hosts
15:19 kormat@cumin1001: START - Cookbook sre.hosts.remove-downtime for 37 hosts
15:11 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-update-tendril (exit_code=0)
15:11 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-update-tendril
15:10 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
15:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:07 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
15:06 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
15:05 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1109 load', diff saved to https://phabricator.wikimedia.org/P17271 and previous config saved to /var/cache/conftool/dbconfig/20210914-150458-marostegui.json
15:03 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
15:00 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
14:58 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1109 load', diff saved to https://phabricator.wikimedia.org/P17270 and previous config saved to /var/cache/conftool/dbconfig/20210914-145522-marostegui.json
14:54 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
14:54 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
14:53 jelto@cumin2002: END (ERROR) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=97)
14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1109 load', diff saved to https://phabricator.wikimedia.org/P17269 and previous config saved to /var/cache/conftool/dbconfig/20210914-145324-marostegui.json
14:52 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
14:49 jelto@cumin2002: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=99)
14:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:49 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
14:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
14:46 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
14:46 jelto@cumin2002: MediaWiki read-only period ends at: 2021-09-14 14:46:30.570035
14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
14:44 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
14:44 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
14:44 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
14:44 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
14:43 jelto@cumin2002: MediaWiki read-only period starts at: 2021-09-14 14:43:48.272827
14:43 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
14:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: DC switchover
14:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: DC switchover
14:39 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
14:39 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
14:34 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
14:32 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
14:30 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
14:24 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
14:22 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
14:22 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
14:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:10 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Avoid warning about undefined $wgFileBlacklist (T290640) (duration: 01m 32s)
13:44 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15 (duration: 00m 10s)
13:43 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15
13:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@79bc0c6]: geoshapes: update table names (duration: 00m 14s)
13:42 mbsantos@deploy1002: Started deploy [kartotherian/deploy@79bc0c6]: geoshapes: update table names
13:27 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15 (duration: 00m 10s)
13:27 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15
13:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@1ebdca4]: (no justification provided) (duration: 00m 15s)
13:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@1ebdca4]: (no justification provided)
12:32 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:32 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:29 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:29 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:19 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:19 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:17 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:17 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
10:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
10:31 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
10:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.20 (duration: 01m 48s)
09:47 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.19 (duration: 04m 13s)
09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
09:38 hashar@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.23 (duration: 70m 39s)
09:29 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
09:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
09:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
09:09 Emperor: swift rebalance to remove h/w faulty host ms-be2045 T290881
09:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:47 moritzm: installing testvm2002
08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
08:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
08:27 hashar@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.23
08:25 godog: poweroff ms-be2045 and set it as failed in netbox - T290881
08:24 hashar: train: applied security patches for 1.37.0-wmf.23 # T281164
08:05 godog: wipe non-os partitions from ms-be2045 - T290881
07:50 vgutierrez: update acme-chief to version 0.31 on acmechief hosts - T290249
04:47 eileen: civicrm revision changed from 1f071f6c6c to e6bf81d99c, config revision is 23eda8ba3a
02:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:07 James_F: wmf/1.37.0-wmf.23 was branched at ea72c9b for T281164
02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-09-13

23:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:45 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T290759: Undeploy VipsScaler: III – Don't set wmgUseVips, now ignored (duration: 00m 58s)
23:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:41 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: T290759: Undeploy VipsScaler: II – Don't load regardless of config (duration: 00m 58s)
19:52 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T290759 Undeploy VipsScaler: I – Disable on all wikis (duration: 00m 57s)
19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:59 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript resetAuthenticationThrottle.php --wiki={cswiki,cswikiversity} --signup --ip=185.47.223.49 # T290809
18:58 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 9db1d1a: Add throttle rule for Czech wiki course (T290809) (duration: 00m 58s)
18:29 ryankemper: [Cirrus] `eqiad` fully recovered (100% of shards), `codfw` at 99.816%. `codfw` is getting held up by recovery of `enwiki` shards which tend to be quite large
18:25 razzi: reenable replication on dbstore1007 for T290841
18:16 cwhite: apply high log volume from ES mitigations to deprecated inputs
18:13 razzi: razzi@dbstore1007:~$ sudo systemctl restart mariadb@s3.service for T290841
18:05 razzi: sudo systemctl restart mariadb@s2.service
17:48 ryankemper: [Cirrus] `eqiad` is at 99.13% shards recovered and `codfw` is at 98.83%
17:20 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
17:17 ryankemper: [Cirrus] `enwiki` searches appear to be working now. `production-search-eqiad` is at 93.5% recovered shards, `production-search-codfw` is at 95.3% recovered
16:57 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
16:18 legoktm@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-main
16:16 volans@cumin1001: conftool action : set/pooled=yes; selector: name=mw1414.*
16:08 volans@cumin1001: conftool action : set/pooled=no; selector: name=mw1414.*
16:06 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host mw1414.eqiad.wmnet
15:54 moritzm: filtered mx2001 on the routers for reimage T286911
15:43 vgutierrez: update acme-chief to version 0.31 on acmechief-test hosts - T290249
15:40 vgutierrez: upload acme-chief 0.31 to apt.wm.o (buster) - T290249
15:32 jelto: Traffic: depool codfw from user traffic
15:26 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
15:25 jelto@cumin2002: START - Cookbook sre.switchdc.services.02-restore-ttl
15:25 volans@cumin1001: START - Cookbook sre.experimental.reimage for host mw1414.eqiad.wmnet
15:20 Emperor: rebooting ms-be2045 to see if that brings the disk back properly T290881
15:13 jelto@cumin2002: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=restbase-async
15:13 legoktm: (contd.) box-constraints|similar-users|termbox|thanos-query|thanos-swift|wdqs|wdqs-internal|wikifeeds|zotero)
15:13 rzl: (contd.) box-constraints|similar-users|termbox|thanos-query|thanos-swift|wdqs|wdqs-internal|wikifeeds|zotero)
15:12 jelto@cumin2002: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium|api-gateway|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventgate-main|eventstreams|eventstreams-internal|kartotherian|linkrecommendation|mathoid|mobileapps|ores|proton|push-notifications|recommendation-api|restbase|restbase-async|schema|search|sessionstore|shellbox|shell
15:02 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
15:02 topranks: Restarting unused line-card FPC 1 in cr2-codfw in attempt to clear alarm.
14:56 jelto@cumin2002: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
14:44 herron: drained mx2001 mail queue to mx1001 T286911
14:38 dcausse: restarting wdqs-updater.service on all wdqs servers
14:21 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
14:20 jelto@cumin2002: START - Cookbook sre.switchdc.services.02-restore-ttl
14:13 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
14:13 legoktm: (contd.) ternal, eventgate-main, wikifeeds, eventstreams-internal, eventgate-analytics-external: codfw => eqiad
14:12 jelto@cumin2002: Switching services echostore, termbox, cxserver, eventstreams, search, ores, mathoid, schema, push-notifications, thanos-swift, wdqs, sessionstore, restbase, wdqs-internal, apertium, eventgate-analytics, citoid, api-gateway, restbase-async, proton, linkrecommendation, thanos-query, shellbox, kartotherian, mobileapps, recommendation-api, zotero, similar-users, shellbox-constraints, eventgate-logging-ex
14:12 jelto@cumin2002: START - Cookbook sre.switchdc.services.01-switch-dc
14:11 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
14:05 jelto@cumin2002: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
14:03 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3002.esams.wmnet
13:51 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum3002.esams.wmnet
13:50 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3001.esams.wmnet
13:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum3001.esams.wmnet
13:36 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2002.codfw.wmnet
13:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum2002.codfw.wmnet
13:20 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2001.codfw.wmnet
13:08 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum2001.codfw.wmnet
12:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:03 volans@cumin1001: START - Cookbook sre.dns.netbox
11:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:26 kostajh: European mid-day backport window deploys done
11:24 kharlan@deploy1002: Synchronized wmf-config: Config: WikimediaEvents: Remove UnderstandingFirstDay config (duration: 00m 59s)
10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
10:43 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
10:15 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=93) for host mw1414.eqiad.wmnet
09:33 volans: restarting tcpircbot-logmsgbot on alert1001, not relaying messages
09:18 elukey: upgrade rsyslog* on ml-serve* nodes to 8.1901.0-1+wmf2
09:16 godog: swift eqiad-prod: add weight to ms-be10[64-67] - T290546
09:11 moritzm: reimaging sretest1002
09:11 elukey: upload rsyslog* 8.1901.0-1+wmf2 to buster-wikimedia component/rsyslog-k8s - T277739
08:16 godog: bump +100G prometheus/ops codfw

2021-09-12

18:33 vgutierrez: restart varnish-fe on cp3061, cp3063 and cp3065
18:29 vgutierrez: restart varnish on cp3055
18:26 vgutierrez: restart varnish on cp3057
04:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
04:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-09-11

19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 27814b8: testwiki: Fully remove securepoll-related groups (T290808) (duration: 00m 57s)
18:35 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki {electionadmin,electcomm} # T290808
18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 908bbf3: Revert "test: Add electcomm and electionadmin groups" (T290808) (duration: 00m 58s)

2021-09-10

21:28 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
21:27 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
21:21 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
20:46 jhuneidi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
20:44 jhuneidi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
20:42 jhuneidi@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
18:34 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
18:08 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
17:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE
17:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE
16:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: REIMAGE
16:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: REIMAGE
16:14 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
16:03 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
15:39 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
15:27 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
14:48 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:43 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:54 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:31 XioNoX: push pfw policies - T290611
09:07 mutante: planet - deleted all state files for all languages, running fresh update via systemctl start for all languages after proxy changes (T285251)
08:37 jynus: upgrade and restart db2139
08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
08:12 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:12 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:58 jayme: updating rsyslog to 8.1901.0-1~bpo9+wmf2 on kubernetes-workers - T289766
07:57 moritzm: installing ntfs-3g security updates
07:46 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:45 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:25 jayme: updating rsyslog to 8.1901.0-1~bpo9+wmf2 on kubernetes-staging - T289766
07:19 jayme: imported rsyslog 8.1901.0-1~bpo9+wmf2 to stretch-wikimedia - T289766
06:56 effie: disable puppet on deploy1002 and mw2254
06:29 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
06:27 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
06:26 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
06:26 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
06:02 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2280.codfw.wmnet
05:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
05:56 elukey: powercycle mw2280 - no tty available in mgmt, no ssh, host frozen
05:55 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet
05:54 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
05:45 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
05:42 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
05:12 marostegui: Repool clouddb1017:3311
05:12 marostegui: Repool clouddb1013:3311
04:49 marostegui: Depool clouddb1013:3311
04:49 marostegui: Depool clouddb1017:3311
02:52 eileen: civicrm revision changed from 83f514f693 to 1f071f6c6c, config revision is 23eda8ba3a
00:35 tgr: Deployed patch for T290692

2021-09-09

23:07 brennen: no takers on patches, ending backport & config training window.
21:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
21:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
19:40 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:37 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:04 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bc4f204: Growth: Push 44 wikis out of dark mode (T289680) (duration: 00m 57s)
18:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6af38d9: Deploy Growth features in dark modes to ~200 wikis (T290582; 3/3) (duration: 00m 57s)
18:22 urbanecm@deploy1002: Synchronized wmf-config/config/: 6af38d9: Deploy Growth features in dark modes to ~200 wikis (T290582; 2/3) (duration: 01m 01s)
18:21 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 6af38d9: Deploy Growth features in dark modes to ~200 wikis (T290582; 1/3) (duration: 00m 58s)
18:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
18:20 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
18:20 urbanecm@deploy1002: sync-file aborted: 6af38d9: Deploy Growth features in dark modes to ~200 wikis (T290582) (duration: 00m 05s)
18:18 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
18:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
18:16 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:12 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/initWikiConfig.php --phab=T290582 | tee ~/initwikiconfig.out # T290582
18:11 urbanecm: Run extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments for wikis in P17258 (T290582)
18:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:05 urbanecm@deploy1002: Synchronized wmf-config/config: no-op: 76c51f2: Standardize indentation in several .yaml files (duration: 00m 58s)
17:29 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
17:28 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
17:28 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
17:26 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
17:25 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
17:22 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
17:21 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
17:21 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
17:21 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
17:20 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
17:14 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
17:14 jelto@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2021-09-09 17:14:12.502162
17:14 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
17:14 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
17:14 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
17:12 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
17:12 jelto@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-09-09 17:12:27.974410
17:12 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
17:08 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
17:07 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
17:07 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
17:04 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
17:04 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
16:58 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
16:58 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
16:58 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
16:57 jelto: start cookbook sre.switchdc.mediawiki eqiad codfw --live-test; this will generate some additional SAL logs here
16:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:10 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
16:00 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
15:34 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
15:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:28 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: pipeline: add comment redirecting to correct file (duration: 00m 59s)
15:24 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
14:47 mutante: planet - deleting all state and lock files for the "en" feeds (T285251 T289984)
14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2002.wikimedia.org
14:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2002.wikimedia.org
14:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
14:25 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
14:19 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
14:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
14:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
13:48 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mx2002.wikimedia.org
13:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:11 mutante: planet1002 - re-enabling disabled puppet
13:06 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
13:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
13:05 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
13:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
13:03 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
13:03 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
13:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
10:48 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
10:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
10:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
10:47 topranks: Removing peering to old IPs of AS139931 (BSCCL) at Equinix Singapore (cr3-eqsin).
10:45 topranks: Removing peering to AS24218 at Equinix Singapore (cr3-eqsin) - network no longer uses this ASN.
10:22 volans: upgrading spicerack on cumin1001
10:20 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
10:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
09:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2002.wikimedia.org
09:47 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1027.eqiad.wmnet
09:46 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
09:37 godog: swift eqiad add ms-be10[64-67] with initial weight - T290546
09:19 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=eqiad
09:19 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
09:15 volans: rebooting sretest1001 to test ipmi reboot via spicerack
09:15 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on sretest1001.eqiad.wmnet with reason: testing reboot via ipmi
09:15 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:20:00 on sretest1001.eqiad.wmnet with reason: testing reboot via ipmi
09:13 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
09:09 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
08:59 godog: move swift traffic fully to codfw to rebalance eqiad - T287539
08:59 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
08:58 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=codfw
08:56 volans: upgrading spicerack on cumin2002 to test the new release
08:50 volans: uploaded spicerack_0.0.59 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
08:23 jelto: run ansible change 719041 on gitlab1001
08:13 jelto: run ansible change 719041 on gitlab2001
07:07 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum1002.eqiad.wmnet
06:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum1002.eqiad.wmnet
04:37 ryankemper: [WDQS] Dispatched e-mail to the banned user agent (dailymotion)
03:57 ryankemper: [WDQS] Dispatched e-mail to WDQS public mailing list informing them the outage is over; all that's left is the e-mail to the banned UA
03:47 ryankemper: [WDQS] Restarting `wdqs-blazegraph` on `wdqs[2001-2008].codfw.wmnet`; if banning the dailymotion UA was sufficient then servers should come back up healthy and not drop back into deadlock
03:43 ryankemper: [WDQS] Running puppet agent on `wdqs[2001-2008].codfw.wmnet` to roll out https://gerrit.wikimedia.org/r/719753
03:29 ryankemper: [WDQS] There's no clear indication of them being a culprit, but by far the most common user agent is a dailymotion VideocatalogTopic UA (see https://logstash.wikimedia.org/goto/51f238e9010d0220e5d33c6c210be93e)
03:12 bstorm: attempting to start replication on clouddb1017 s1 T290630
03:11 bstorm: stopping and restarting mariadb on clouddb1017 s1
03:04 ryankemper: [WDQS] Dispatched email to Wikidata public mailing list about reduced service availability
02:36 ryankemper: [WDQS] https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&from=1631152574841&to=1631154942992 shows the availability pattern, anywhere we see missing data (null) represents time that blazegraph was locked up and therefore unable to report metrics
02:34 ryankemper: [WDQS] For context I glanced at `ryankemper@cumin1001:~$ sudo -E cumin 'P{wdqs2*}' 'sudo systemctl status wdqs-blazegraph'` before doing the aforementioned restarts and they'd all last restarted between 25-28 minutes ago
02:33 ryankemper: [WDQS] Restarting `wdqs-blazegraph` across all of `wdqs2*`
00:50 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Don't set default to Score (try #2) (duration: 00m 58s)
00:48 legoktm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/Score/includes/Score.php: Use the 'score' Shellbox if configured (T290193) (duration: 00m 57s)
00:46 legoktm@deploy1002: Synchronized php-1.37.0-wmf.21/includes/shell/CommandFactory.php: shell: Fix $wgShellboxUrls by passing service name when creating BoxedCommand (T290193) (duration: 00m 58s)
00:45 legoktm@deploy1002: sync-file aborted: shell: Fix $wgShellboxUrls by passing service name when creating BoxedCommand (T290193) (duration: 00m 07s)
00:15 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove putenv() for GDFONTPATH (duration: 00m 58s)

2021-09-08

22:34 ryankemper: [WDQS] T280247 Ran puppet-agent on `miscweb*` following merge of https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/717649
22:24 ryankemper: [WDQS] T280247 Ran puppet-agent on `miscweb*` following merge of https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/714623
21:55 ryankemper: [WDQS] T280247 Purged varnish to make sure change took effect: `echo 'https://query-preview.wikidata.org/' | mwscript purgeList.php` and `echo 'https://query.wikidata.org/' | mwscript purgeList.php` on `mwmaint1002`
21:53 ryankemper: [WDQS] T280247 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/719502 and ran puppet-agent on `miscweb*`
20:49 eileen: civicrm revision changed from 593d01f4fc to 83f514f693, config revision is 23eda8ba3a
20:41 legoktm: Successfully published image docker-registry.discovery.wmnet/php7.2-fpm-multiversion-base:1.0.2
19:25 Krinkle: krinkle@mw1369 Running some benchmarks in Eqiad on load.php
18:27 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: 6bcbe61: Italian Wikipedia is now a group 1 wiki (T286664; 2/2) (duration: 00m 58s)
18:26 urbanecm@deploy1002: Synchronized dblists/: 6bcbe61: Italian Wikipedia is now a group 1 wiki (T286664; 1/2) (duration: 00m 58s)
18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bbefce6: Growth: Remove config that moved on-wiki (T290295) (duration: 00m 58s)
18:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 950a377: Stop setting $wgAbuseFilterParserClass (T239990) (duration: 00m 58s)
17:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2004.codfw.wmnet
16:53 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2004.codfw.wmnet
16:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2003.codfw.wmnet
16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2003.codfw.wmnet
16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2001.codfw.wmnet
16:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/maintenance/updateMenteeData.php: 796e23c: updateMenteeData.php: Make it possible to force update (duration: 00m 58s)
16:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Turn off jQuery migrate on wikisource wikis (T280944) (duration: 00m 59s)
16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2001.codfw.wmnet
16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
16:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
16:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
16:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
16:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
16:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
15:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
15:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
14:57 marostegui: Retroactive: started to warm up eqiad databases
14:57 moritzm: installing 4.19.194 kernels on stretch systems with 4.19.x (no reboots yet)
14:54 brennen: gitlab: upgrading gitlab2001, followed by gitlab1001, to 14.2.3 (T289802)
14:53 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1067.eqiad.wmnet with reason: REIMAGE
14:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1067.eqiad.wmnet with reason: REIMAGE
14:33 moritzm: installing zeromq3 security updates
13:50 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@eb211ac]: kartotherian: restore v4 maxzoom to z15 (duration: 06m 42s)
13:44 mbsantos@deploy1002: Started deploy [kartotherian/deploy@eb211ac]: kartotherian: restore v4 maxzoom to z15
13:38 brennen: gitlab: upgrading gitlab2001, followed by gitlab1001, to 14.1.5 (T289802)
13:13 brennen: gitlab1001: downtiming alerts for 2.5 hours; upgrading to 14.0.10 (T289802)
12:45 brennen: gitlab: pausing all runners in preparation for upgrade to 14.0.10 (T289802)
11:57 moritzm: installing curl security updates on stretch
11:09 jbond: upload statograph_0.1.2
11:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
10:06 jelto: upgrade gitlab2001 to gitlab-ce=14.0.10-ce.0
10:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikimedia.org/T289802
10:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikimedia.org/T289802
09:38 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to wikimedia.org - T210137
09:29 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to codfw - T210137
09:09 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqiad - T210137
07:45 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqsin/esams/ulsfo - T210137
06:46 ryankemper: [WDQS] Manually running puppet-agent on `miscweb2002.codfw.wmnet,miscweb1002.eqiad.wmnet`
06:45 ryankemper: [WDQS] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/719185 to rollback query.wikidata.org changes
02:59 eileen: civicrm revision changed from 06ef98593f to 593d01f4fc, config revision is 5f004d94d7
00:00 legoktm: legoktm@lists1001:~$ sudo rm -rf /etc/mailman # cleanup as part of 4869d91b0be / T282303

2021-09-07

23:25 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:20 robh@cumin1001: START - Cookbook sre.dns.netbox
23:13 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable UrlShortener everywhere (T267925) (duration: 00m 58s)
23:07 dpifke@deploy1002: Synchronized wmf-config/profiler.php: Config: profiler: use separate pipeline inside k8s pods (T288165) (duration: 00m 58s)
22:29 cstone: SmashPig revision changed from afd362b163 to 3607b16f83
20:41 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set $wgWBRepoSettings['tmpNormalizeDataValues'] on all wikis (T251480) (duration: 00m 59s)
20:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
17:18 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
17:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
17:01 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
16:39 moritzm: installing jetty9 security updates on buster
16:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
16:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
16:30 dancy@deploy1002: Synchronized README: testing (duration: 00m 59s)
15:18 akosiaris: run_benchmarky.py against mwdebug.svc.codfw.wmnet for performance tests
15:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:04 jbond: upload python-prometheus-client_0.6.0 to stretch-wikimedia
14:50 mutante: snapshot1015 - manually removed prometheus-puppet-agent-stats from crontab which was sending spam and is now a timer
14:33 mutante: CI - migrating zuul-merger cronjob to systemd timer (contint*)
14:23 XioNoX: re-pool esams-eqiad - T288503
14:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE
14:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE
14:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE
14:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE
14:17 marostegui: No more db maintenance on eqiad T288594
14:08 mutante: alert1001 - temp disabled puppet, stopped icinga-wm
14:07 mutante: temp killed icinga-wm because of flooding
14:01 Emperor: removing pc2010 from orchestrator T289117
13:59 Emperor: removing pc2010 from tendril and zarcillo T289117
13:57 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:57 XioNoX: drain esams-eqiad for circuit maintenance - T288503
13:54 pt1979@cumin2002: START - Cookbook sre.dns.netbox
13:51 jayme: uncordoned kubestage2001
13:50 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:49 mutante: mw2264 - scap pulled and repooled after T290242
13:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2264.codfw.wmnet
13:43 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:40 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2010.codfw.wmnet
13:25 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2010.codfw.wmnet
13:21 Emperor: removing pc2009 from orchestrator T289116
13:21 Emperor: removing pc2009 from tendril and zarcillo T289116
13:02 marostegui@cumin1001: dbctl commit (dc=all): 'fix s8 weights T288594', diff saved to https://phabricator.wikimedia.org/P17248 and previous config saved to /var/cache/conftool/dbconfig/20210907-130244-marostegui.json
12:59 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2009.codfw.wmnet
12:51 mvernon@deploy1002: Synchronized wmf-config/ProductionServices.php: Remove old decommissioned pc hosts T284825 (duration: 01m 02s)
12:45 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2009.codfw.wmnet
12:27 marostegui@cumin1001: dbctl commit (dc=all): 'fix s1 weights T288594', diff saved to https://phabricator.wikimedia.org/P17247 and previous config saved to /var/cache/conftool/dbconfig/20210907-122747-marostegui.json
12:27 marostegui@cumin1001: dbctl commit (dc=all): 'fix s1 weights T288594', diff saved to https://phabricator.wikimedia.org/P17246 and previous config saved to /var/cache/conftool/dbconfig/20210907-122708-marostegui.json
11:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
11:46 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
11:36 awight: EU backport complete
11:33 awight@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/CodeMirror/extension.json: Backport: Change line numbers default to null (T290226) (duration: 00m 59s)
11:28 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set template namespace for code mirror line numbering (T290226) (duration: 00m 59s)
10:51 Emperor: removing pc2008 from orchestrator T289115
10:49 Emperor: removing pc2008 from tendril and zarcillo T289115
10:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2008.codfw.wmnet
10:35 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2008.codfw.wmnet
10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on 6 hosts with reason: commissioning aqs_new hosts
10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on 6 hosts with reason: commissioning aqs_new hosts
10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: commissioning aqs_new hosts
10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: commissioning aqs_new hosts
10:27 Emperor: removing pc1010 from orchestrator T289122
10:22 Emperor: removing pc1010 from tendril and zarcillo T289122
10:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1010.eqiad.wmnet
10:02 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1010.eqiad.wmnet
09:46 Emperor: removing pc1009 from orchestrator T289120
09:26 Emperor: removing pc1009 from tendril and zarcillo T289120
09:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1009.eqiad.wmnet
09:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1009.eqiad.wmnet
08:57 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:53 elukey@cumin1001: START - Cookbook sre.dns.netbox
08:51 Emperor: removing pc1008 from orchestrator T289119
08:44 Emperor: removing pc1008 from tendril and zarcillo T289119
08:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1008.eqiad.wmnet
08:31 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1008.eqiad.wmnet
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'More weight for db2090 into API T288803', diff saved to https://phabricator.wikimedia.org/P17241 and previous config saved to /var/cache/conftool/dbconfig/20210907-082952-marostegui.json
08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 100%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17240 and previous config saved to /var/cache/conftool/dbconfig/20210907-080230-root.json
07:52 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: reimage to buster (now with fixed pool config) T288244', diff saved to https://phabricator.wikimedia.org/P17239 and previous config saved to /var/cache/conftool/dbconfig/20210907-075235-kormat.json
07:49 marostegui@cumin1001: dbctl commit (dc=all): 'More weight for db2090 into API T288803', diff saved to https://phabricator.wikimedia.org/P17238 and previous config saved to /var/cache/conftool/dbconfig/20210907-074901-marostegui.json
07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 75%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17237 and previous config saved to /var/cache/conftool/dbconfig/20210907-074726-root.json
07:37 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: reimage to buster (now with fixed pool config) T288244', diff saved to https://phabricator.wikimedia.org/P17236 and previous config saved to /var/cache/conftool/dbconfig/20210907-073731-kormat.json
07:37 godog: +100G for prometheus/k8s codfw
07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Start to pool db2090 into API T288803', diff saved to https://phabricator.wikimedia.org/P17235 and previous config saved to /var/cache/conftool/dbconfig/20210907-073436-marostegui.json
07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 50%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17234 and previous config saved to /var/cache/conftool/dbconfig/20210907-073222-root.json
07:22 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 50%: reimage to buster (now with fixed pool config) T288244', diff saved to https://phabricator.wikimedia.org/P17233 and previous config saved to /var/cache/conftool/dbconfig/20210907-072227-kormat.json
07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 25%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17232 and previous config saved to /var/cache/conftool/dbconfig/20210907-071719-root.json
07:13 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
07:13 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
07:07 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: reimage to buster (now with fixed pool config) T288244', diff saved to https://phabricator.wikimedia.org/P17231 and previous config saved to /var/cache/conftool/dbconfig/20210907-070724-kormat.json
07:07 kormat@cumin1001: dbctl commit (dc=all): 'Fixing db2118's pooling config T288244', diff saved to https://phabricator.wikimedia.org/P17230 and previous config saved to /var/cache/conftool/dbconfig/20210907-070702-kormat.json
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 10%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17229 and previous config saved to /var/cache/conftool/dbconfig/20210907-070215-root.json
06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 5%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17228 and previous config saved to /var/cache/conftool/dbconfig/20210907-064711-root.json
05:15 marostegui: Optimize eowiki.flaggedtemplates in eqiad T290057
05:15 marostegui: Optimize vecwiki.flaggedtemplates in eqiad T290057
05:14 marostegui: Optimize kawiki.flaggedtemplates in eqiad T290057

2021-09-06

23:52 tstarling@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/SecurePoll/includes/Talliers/STVTallier.php: T290000 (duration: 00m 58s)
16:14 Amir1: Deployed patch for T290394
15:01 Emperor: removing pc1007 from orchestrator T289118
15:00 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:53 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: reimage to buster T288244', diff saved to https://phabricator.wikimedia.org/P17226 and previous config saved to /var/cache/conftool/dbconfig/20210906-145341-kormat.json
14:50 Emperor: removing pc1007 from tendril and zarcillo T289118
14:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1007.eqiad.wmnet
14:45 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1026.eqiad.wmnet
14:44 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1026.eqiad.wmnet
14:36 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
14:35 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1007.eqiad.wmnet
14:22 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
14:19 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set permission of creating short url to everyone everywhere (T267921 T267925), Part II (duration: 00m 57s)
14:17 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Set permission of creating short url to everyone everywhere (T267921 T267925), Part I (duration: 00m 59s)
14:12 moritzm: installing postgres 9.6 security updates
14:05 gehel: re-pooling wdqs1007, caught up on lag
13:56 jbond: update facter networking fact gerrit:715949
13:51 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: fix comment for rdb* servers (duration: 00m 58s)
13:42 moritzm: updated thirdparty/gitlab component to 14.0.10 T284811
13:04 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:42 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:42 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:42 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:41 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
12:40 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
12:29 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:06 godog: silence statograph until thurs on alert1001 - T290425
11:58 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=plwiki 'editor' 'editeditorprotected' # T230103
11:56 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki={hewiki,lvwiki,srwiki,srwikibooks} 'autopatrol' 'editautopatrolprotected' # T230103
11:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=etwiki 'autopatrol' 'editautopatrolprotected' # T230103
11:50 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=dewiktionary 'autoreviewprotected' 'editautoreviewprotected' # T230103
11:48 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=arwiki 'autoreview' 'editautoreviewprotected' # T230103
11:07 urbanecm: EU B&C window done
11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c8d7cf8: foundationwiki: Create editor group (T205352) (duration: 00m 57s)
11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f90862b: Growth: Define wgGEMentorDashboardDiscoveryEnabled (T289054) (duration: 00m 58s)
11:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/maintenance/renameRestrictions.php: 18e43ec: renameRestrictions.php: Update protected_titles as well (T290398) (duration: 00m 59s)
10:39 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1027.eqiad.wmnet
10:38 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
10:22 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
10:17 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
09:22 gehel: depooling wdqs1007, catching up on lag
09:06 gehel: restart blazegraph and updater on wdqs1007
08:46 jbond: update networking fact - gerrit:715943
07:57 godog: fail sdw on ms-be1062, reported errors
07:51 moritzm: installing libssh security updates
07:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:45 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:44 moritzm: installing squashfs-tools security updates
06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
06:28 marostegui: Optimize table mkwiki.flaggedtemplates in eqiad T290057
06:26 marostegui: Optimize table bewiki.flaggedtemplates in eqiad T290057
06:23 marostegui: Optimize table dewiki.flaggedtemplates in eqiad T290057
05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE
05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE
05:07 marostegui: Stop replication on db2090 (old s4 master) T289650 T288803
05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 (current master) from API T289650', diff saved to https://phabricator.wikimedia.org/P17223 and previous config saved to /var/cache/conftool/dbconfig/20210906-050502-marostegui.json
05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2090 T289650', diff saved to https://phabricator.wikimedia.org/P17222 and previous config saved to /var/cache/conftool/dbconfig/20210906-050419-marostegui.json
05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary and set section read-write T289650', diff saved to https://phabricator.wikimedia.org/P17221 and previous config saved to /var/cache/conftool/dbconfig/20210906-050140-root.json
05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T289650', diff saved to https://phabricator.wikimedia.org/P17220 and previous config saved to /var/cache/conftool/dbconfig/20210906-050048-root.json
05:00 marostegui: Starting s4 codfw failover from db2090 to db2110 - T289650
04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 T289650', diff saved to https://phabricator.wikimedia.org/P17219 and previous config saved to /var/cache/conftool/dbconfig/20210906-040740-root.json
04:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 33 hosts with reason: Primary switchover s4 T289650
04:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 33 hosts with reason: Primary switchover s4 T289650

2021-09-05

18:54 urbanecm: wikiadmin@10.192.0.119(ptwiki)> update protected_titles set pt_create_perm='editautoreviewprotected' where pt_create_perm='autoreviewer'; # T290396

2021-09-04

13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17217 and previous config saved to /var/cache/conftool/dbconfig/20210904-133532-root.json
13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17216 and previous config saved to /var/cache/conftool/dbconfig/20210904-132029-root.json
13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 50%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17215 and previous config saved to /var/cache/conftool/dbconfig/20210904-130525-root.json
12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 25%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17214 and previous config saved to /var/cache/conftool/dbconfig/20210904-125021-root.json
12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 10%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17213 and previous config saved to /var/cache/conftool/dbconfig/20210904-123518-root.json
12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 5%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17212 and previous config saved to /var/cache/conftool/dbconfig/20210904-122014-root.json
09:04 elukey: restart wmf_auto_restart_rsyslog.service on puppetdb1002
09:00 elukey: `systemctl reset-failed ifup@ens6.service` on puppetdb2002 - T273026
03:02 rzl@cumin2001: dbctl commit (dc=all): 'Depool db2137:3314', diff saved to https://phabricator.wikimedia.org/P17210 and previous config saved to /var/cache/conftool/dbconfig/20210904-030231-rzl.json

2021-09-03

21:49 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
20:30 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
19:33 krinkle@deploy1002: Finished deploy [integration/docroot@6492b3d]: I48480e89e5f6 (duration: 00m 10s)
19:33 krinkle@deploy1002: Started deploy [integration/docroot@6492b3d]: I48480e89e5f6
19:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
19:04 ryankemper: T290330 `ryankemper@cumin1001:~$ sudo -E cumin 'P{wdqs2*}' 'sudo rm -fv /etc/cron.hourly/restart-blazegraph'` (Cleaned up manually created crons now that we have [somewhat hacky] systemd timers doing the same job)
17:42 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
17:40 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
17:35 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
17:17 ryankemper: T290330 Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/717508 across `wdqs` fleet; codfw wdqs hosts will restart on average once per hour now to address ongoing availability issues for wdqs codfw
16:32 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
16:10 gehel: blazegraph (public codfw cluster) will now restart every hour - T290330
15:53 jbond: enable puppet fleet wide post puppetdb database maintenance - T263578
15:21 jbond: create lvm snapshot puppetdb2002_data_snapshot on ganeti2023 - T263578
15:17 jbond: create lvm snapshot puppetdb1002_data_snapshot on ganeti1012 - T263578
15:00 jbond: disable puppet fleet wide to perform puppetdb database maintenance - T263578
14:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
14:20 mutante: mw2264 - scap pull
14:18 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:11 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
13:10 dcausse: installing openjdk-8-dbg on wdqs2007
13:04 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
13:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1023.eqiad.wmnet
12:48 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1023.eqiad.wmnet
12:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1035-1036].eqiad.wmnet
12:32 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1035-1036].eqiad.wmnet
12:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1028-1032].eqiad.wmnet
12:03 joal@deploy1002: Finished deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d] (duration: 00m 06s)
12:03 joal@deploy1002: Started deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d]
12:03 joal@deploy1002: Finished deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d] (duration: 19m 16s)
11:56 dcausse@deploy1002: Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 19m 21s)
11:44 joal@deploy1002: Started deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d]
11:42 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from enwiki - T289050
11:37 dcausse@deploy1002: Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA
11:36 dcausse@deploy1002: Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 01m 07s)
11:35 dcausse@deploy1002: Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA
10:58 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1028-1032].eqiad.wmnet
10:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc[1025-1026].eqiad.wmnet
10:47 joal@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures (duration: 00m 32s)
10:46 joal@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures
10:45 joal@deploy1002: deploy aborted: Deploy latest code on AQS new servers - test after failures (duration: 00m 05s)
10:45 joal@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-test): Deploy latest code on AQS new servers - test after failures
10:29 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 03s)
10:29 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
10:22 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 55s)
10:21 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
10:17 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 36s)
10:16 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
10:08 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 45s)
10:08 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
10:05 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 36s)
10:04 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
10:02 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 01m 25s)
10:01 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
10:00 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 01m 53s)
09:58 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
09:57 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 09s)
09:57 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
09:32 joal@deploy1002: Finished deploy [analytics/refinery@4ff8979] (thin): Analytics hotfix deploy THIN [analytics/refinery@4ff8979] (duration: 00m 07s)
09:32 joal@deploy1002: Started deploy [analytics/refinery@4ff8979] (thin): Analytics hotfix deploy THIN [analytics/refinery@4ff8979]
09:26 joal@deploy1002: Finished deploy [analytics/refinery@4ff8979]: Analytics hotfix deploy [analytics/refinery@4ff8979] (duration: 17m 36s)
09:25 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1025-1026].eqiad.wmnet
09:15 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
09:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1022.eqiad.wmnet
09:13 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
09:09 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
09:09 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:09 joal@deploy1002: Started deploy [analytics/refinery@4ff8979]: Analytics hotfix deploy [analytics/refinery@4ff8979]
09:08 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:06 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
09:03 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:03 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
08:53 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
08:52 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
08:45 ema: cp-eqsin: clean apt cache to free up some space T290305
08:45 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1022.eqiad.wmnet
08:23 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
07:43 legoktm: uploaded pygments 2.10.0+dfsg-1~wmf1 to apt.wm.o in component/pygments
07:42 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from several s3 wikis - T289050
07:10 godog: more weight to ms-be20[62-65] - T288458
07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:57 marostegui@cumin1001: START - Cookbook sre.dns.netbox
06:45 elukey: run `apt-get clean` on cp5012 to free some space (94% of the root partition used)
06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17203 and previous config saved to /var/cache/conftool/dbconfig/20210903-061204-root.json
06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17202 and previous config saved to /var/cache/conftool/dbconfig/20210903-061138-root.json
05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17201 and previous config saved to /var/cache/conftool/dbconfig/20210903-055700-root.json
05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17200 and previous config saved to /var/cache/conftool/dbconfig/20210903-055635-root.json
05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17199 and previous config saved to /var/cache/conftool/dbconfig/20210903-054157-root.json
05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17198 and previous config saved to /var/cache/conftool/dbconfig/20210903-054131-root.json
05:30 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts pc2007.codfw.wmnet
05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17196 and previous config saved to /var/cache/conftool/dbconfig/20210903-052653-root.json
05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17195 and previous config saved to /var/cache/conftool/dbconfig/20210903-052628-root.json
05:20 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2007.codfw.wmnet
05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17194 and previous config saved to /var/cache/conftool/dbconfig/20210903-051149-root.json
05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17193 and previous config saved to /var/cache/conftool/dbconfig/20210903-051124-root.json
05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2138 for upgrade', diff saved to https://phabricator.wikimedia.org/P17192 and previous config saved to /var/cache/conftool/dbconfig/20210903-050423-marostegui.json
00:31 tgr@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: Backport: fixLinkRecommendationData: Try harder to avoid >10K result sets (T284531) (duration: 00m 58s)
00:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-09-02

23:12 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Adding wordmark for ptwikinews mobile and desktop skins (T281591) Part II (duration: 00m 57s)
23:11 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikinews-wordmark-pt.svg: Config: Adding wordmark for ptwikinews mobile and desktop skins (T281591) Part I (duration: 01m 14s)
21:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
21:37 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
21:17 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
19:57 ejegg: updated fundraising CiviCRM from 7ac13753c7 to 06ef98593f
19:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:48 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1021.eqiad.wmnet
19:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
19:40 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1021.eqiad.wmnet
19:28 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.21 refs T281162
18:31 ryankemper: [WCQS] `wcqs100[1-3],wcqs200[1-3]` downtimed until `2021-09-09 20:29:55` (UTC)
18:28 ryankemper: [WCQS] Merged & deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/713946, going to suppress icinga alerts on `wcqs*` hosts because these are still in the process of being spun up properly and aren't serving traffic or anything
18:24 ryankemper@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
18:24 ryankemper@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
18:20 ryankemper@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
18:20 ryankemper@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
17:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:57 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:18 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:09 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1020.eqiad.wmnet
15:53 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1020.eqiad.wmnet
15:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1019.eqiad.wmnet
15:31 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
15:28 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
15:26 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1019.eqiad.wmnet
15:16 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts mc1033.eqiad.wmnet
15:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1034.eqiad.wmnet
15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17178 and previous config saved to /var/cache/conftool/dbconfig/20210902-150412-root.json
14:50 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1034.eqiad.wmnet
14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17177 and previous config saved to /var/cache/conftool/dbconfig/20210902-144908-root.json
14:49 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1033.eqiad.wmnet
14:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
14:39 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
14:38 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
14:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
14:35 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17176 and previous config saved to /var/cache/conftool/dbconfig/20210902-143405-root.json
14:33 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
14:32 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
14:22 moritzm: installing exiv2 security updates
14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17175 and previous config saved to /var/cache/conftool/dbconfig/20210902-141901-root.json
14:13 moritzm: installing ffmpeg security updates
14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17174 and previous config saved to /var/cache/conftool/dbconfig/20210902-140357-root.json
14:00 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
13:57 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
13:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2136 for upgrade', diff saved to https://phabricator.wikimedia.org/P17173 and previous config saved to /var/cache/conftool/dbconfig/20210902-134838-marostegui.json
13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17172 and previous config saved to /var/cache/conftool/dbconfig/20210902-134448-root.json
13:42 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
13:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
13:41 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
13:39 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
13:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
13:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
13:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
13:36 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
13:35 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17171 and previous config saved to /var/cache/conftool/dbconfig/20210902-132945-root.json
13:29 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
13:24 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
13:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
13:14 jbond: reimage sretest1002 (not sretest1001)
13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17169 and previous config saved to /var/cache/conftool/dbconfig/20210902-131441-root.json
13:14 jbond: reimage sretest1001
12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17168 and previous config saved to /var/cache/conftool/dbconfig/20210902-125937-root.json
12:55 jbond: disable puppet fleet wide to roll out 715728
12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17167 and previous config saved to /var/cache/conftool/dbconfig/20210902-124434-root.json
12:42 marostegui: Upgrade db2119
12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2119 for upgrade', diff saved to https://phabricator.wikimedia.org/P17166 and previous config saved to /var/cache/conftool/dbconfig/20210902-124102-marostegui.json
12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17165 and previous config saved to /var/cache/conftool/dbconfig/20210902-122826-root.json
12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17164 and previous config saved to /var/cache/conftool/dbconfig/20210902-121323-root.json
11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17163 and previous config saved to /var/cache/conftool/dbconfig/20210902-115819-root.json
11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17162 and previous config saved to /var/cache/conftool/dbconfig/20210902-114315-root.json
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17161 and previous config saved to /var/cache/conftool/dbconfig/20210902-112812-root.json
11:26 urbanecm@deploy1002: Synchronized README: testing scap (duration: 01m 06s)
11:22 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2264.codfw.wmnet
11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2106 for upgrade', diff saved to https://phabricator.wikimedia.org/P17160 and previous config saved to /var/cache/conftool/dbconfig/20210902-111843-marostegui.json
11:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3ce5d80: dewiki: Enable Growth features for 30% of newcomers (T288420) (duration: 01m 58s)
11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:04 urbanecm: metawiki: Server-side page move from VRT -> Volunteer Response Team (T290083)
11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17158 and previous config saved to /var/cache/conftool/dbconfig/20210902-110022-root.json
10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17155 and previous config saved to /var/cache/conftool/dbconfig/20210902-104518-root.json
10:38 mbsantos: REINDEX database gis in maps1009 while it's in depooled state
10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17152 and previous config saved to /var/cache/conftool/dbconfig/20210902-103014-root.json
10:24 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
10:23 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
10:19 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17150 and previous config saved to /var/cache/conftool/dbconfig/20210902-101511-root.json
10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17147 and previous config saved to /var/cache/conftool/dbconfig/20210902-100007-root.json
09:57 marostegui: Upgrade db2073
09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2073 for upgrade', diff saved to https://phabricator.wikimedia.org/P17145 and previous config saved to /var/cache/conftool/dbconfig/20210902-095601-marostegui.json
09:56 hashar@deploy1002: Finished deploy [integration/docroot@973ac8a]: Support listing files on index pages - T289196 (duration: 00m 07s)
09:55 hashar@deploy1002: Started deploy [integration/docroot@973ac8a]: Support listing files on index pages - T289196
09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17142 and previous config saved to /var/cache/conftool/dbconfig/20210902-092026-root.json
09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17141 and previous config saved to /var/cache/conftool/dbconfig/20210902-090523-root.json
08:55 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from eowiki,idwiki,plwiki,trwiki - T289050
08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17140 and previous config saved to /var/cache/conftool/dbconfig/20210902-085019-root.json
08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17138 and previous config saved to /var/cache/conftool/dbconfig/20210902-083515-root.json
08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17136 and previous config saved to /var/cache/conftool/dbconfig/20210902-082012-root.json
08:14 marostegui: Upgrade db2140
08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 for upgrade', diff saved to https://phabricator.wikimedia.org/P17135 and previous config saved to /var/cache/conftool/dbconfig/20210902-081436-marostegui.json
07:57 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
07:51 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
07:44 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on huwiki - T289050
07:44 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on arwiki - T289050
07:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:00 marostegui: Stop mariadb on pc2007 before decommissioning T289112
06:59 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Remove pc2007 T289112 (duration: 01m 06s)
06:13 eileen: civicrm revision changed from ad37f21a7d to 7ac13753c7, config revision is 5f004d94d7
04:50 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on ruwiki - T289050
02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:05 krinkle@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/WikimediaMaintenance/blameStartupRegistry.php: I63bf19 (duration: 01m 07s)

2021-09-01

23:50 Amir1: mwscript createAndPromote.php --wiki=test2wiki --sysop --force Ladsgroup
23:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 0bd6542: fixLinkRecommendationData: stay under 10K search limit (T284531) (duration: 01m 06s)
23:27 eileen: civicrm revision changed from 30cd9c1d90 to ad37f21a7d, config revision is 5f004d94d7
23:25 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 3c7d4ec: fixLinkRecommendationData: Allow --db-table in dry-run mode (T283868) (duration: 01m 06s)
23:20 urbanecm@deploy1002: Synchronized wmf-config/extension-list: 91ff927: Enable NearbyPages on beta cluster (T246493; 3/3) (duration: 01m 05s)
23:19 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
23:18 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 91ff927: Enable NearbyPages on beta cluster (T246493; 2/3) (duration: 01m 06s)
23:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 91ff927: Enable NearbyPages on beta cluster (T246493; 1/3) (duration: 01m 06s)
23:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:15 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bb7d92c: Enable WVUI search on Wikimedia Commons (T287215) (duration: 01m 07s)
23:04 dpifke@deploy1002: Finished deploy [performance/navtiming@63c9d31]: Deploy fix for CpuBenchmark-related Prometheus timeouts T281243 (duration: 00m 06s)
23:04 dpifke@deploy1002: Started deploy [performance/navtiming@63c9d31]: Deploy fix for CpuBenchmark-related Prometheus timeouts T281243
22:44 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
22:43 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
22:43 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
22:43 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
22:42 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
22:42 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
22:40 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
22:39 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
22:35 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
22:34 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
22:33 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
22:33 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
22:32 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
22:32 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
22:30 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
22:29 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:57 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.21 refs T281161 (duration: 01m 06s)
19:57 twentyafterfour: twentyafterfour@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.21 refs T281162
19:56 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.21 refs T281161
18:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: fe1ae2e: Growth features: Deploy to 100% of newcomers on small wikis (T289786) (duration: 01m 06s)
18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 27e85b1: nlwiki: Enable link recommendations for all Growth users (T285254) (duration: 01m 06s)
18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 94b1cca: Growth features: Enable for newcomers on two wikis (T285254, T287867) (duration: 01m 09s)
17:31 ejegg: updated payments-wiki from c4d56178d0 to f9cbf95a12
16:23 mforns@deploy1002: Finished deploy [analytics/refinery@ff15071] (thin): Fix for cassandra3 loading THIN [analytics/refinery@ff15071] (duration: 00m 06s)
16:23 mforns@deploy1002: Started deploy [analytics/refinery@ff15071] (thin): Fix for cassandra3 loading THIN [analytics/refinery@ff15071]
16:22 mforns@deploy1002: Finished deploy [analytics/refinery@ff15071]: Fix for cassandra3 loading [analytics/refinery@ff15071] (duration: 26m 58s)
16:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1066.eqiad.wmnet with reason: REIMAGE
16:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1065.eqiad.wmnet with reason: REIMAGE
16:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1064.eqiad.wmnet with reason: REIMAGE
16:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1066.eqiad.wmnet with reason: REIMAGE
16:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1065.eqiad.wmnet with reason: REIMAGE
16:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1064.eqiad.wmnet with reason: REIMAGE
15:55 mforns@deploy1002: Started deploy [analytics/refinery@ff15071]: Fix for cassandra3 loading [analytics/refinery@ff15071]
15:35 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:08 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
14:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:04 godog: move simone-this-dot from wmf to nda ldap group - T289783
13:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
13:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:48 krinkle@deploy1002: Synchronized php-1.37.0-wmf.20/includes/resourceloader: Id7c258 (duration: 01m 06s)
13:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.21/includes/resourceloader: Id7c258 (duration: 01m 49s)
13:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:16 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
13:05 mutante: planet1002 - temp removing feed from ad.huikeshoven - seems to cause corrupt state file (T289984)
13:01 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
12:48 godog: s/webperf/navtiming/
12:47 godog: bounce webperf on webperf2001 - T290138
12:41 mutante: planet1002 - rm /etc/rawdog/en/feeds/39a7970f.state (corrupt) T289984
12:38 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
11:19 Krinkle: effie restarted php-fpm on parse2007.codfw.wmnet, ref T290120.
10:21 jbond: start filtering more puppet facts G:715461 - T263578
09:23 marostegui: Drop flaggedrevs_stats and flaggedrevs_stats2 from dewiki T289050
07:45 ema: deploy Varnish SLO dashboard with grr apply slo_dashboards.jsonnet T289036
07:05 XioNoX: pfw NAT and ACLs changes - T290077
06:29 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sodium.wikimedia.org: Renew puppet certificate - elukey@cumin1001
06:28 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for sodium.wikimedia.org: Renew puppet certificate - elukey@cumin1001
05:25 effie: depool mw2251 mw2255 parse2001 for tests - T280497
04:41 marostegui: Optimize idwiki.flaggedtemplates T290057
04:23 marostegui: Optimize arwiki.flaggedtemplates T290057
04:16 eileen: civicrm revision changed from 7da3eba4f9 to 30cd9c1d90, config revision is 5f004d94d7
00:53 eileen: civicrm revision changed from e567b4c289 to 7da3eba4f9, config revision is 5f004d94d7

2021-08-31

23:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:38 eileen: civicrm revision changed from 718aa9cad3 to e567b4c289, config revision is 7a24870bc7
23:33 dpifke@deploy1002: Synchronized wmf-config/profiler.php: Revert excimer-k8s pipelines T288165 (duration: 01m 14s)
23:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:25 dpifke@deploy1002: scap failed: average error rate on 3/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
23:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:15 mforns: failed deployment of refinery (v0.1.17) to an-test-coord1001.eqiad.wmnet (scap error)
23:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:14 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b] (hadoop-test): Regular analytics weekly train TEST v0.1.17 [analytics/refinery@a0f039b] (duration: 13m 42s)
23:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1437d99: Enable link recommendation frontend in dewiki and nlwiki (T288420, T285254) (duration: 01m 06s)
23:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8997ae5: Fix wgDiscussionTools_sourcemodetoolbar settings (duration: 01m 22s)
23:01 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b] (hadoop-test): Regular analytics weekly train TEST v0.1.17 [analytics/refinery@a0f039b]
23:00 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b] (thin): Regular analytics weekly train THIN v0.1.17 [analytics/refinery@a0f039b] (duration: 00m 07s)
23:00 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b] (thin): Regular analytics weekly train THIN v0.1.17 [analytics/refinery@a0f039b]
23:00 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b]: Regular analytics weekly train v0.1.17 [analytics/refinery@a0f039b] (duration: 17m 39s)
22:42 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b]: Regular analytics weekly train v0.1.17 [analytics/refinery@a0f039b]
21:58 ejegg: switched Adyen to new Checkout integration
21:41 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
21:38 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
21:34 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
20:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:00 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.21 refs T281161
19:20 brennen: gitlab1001: brief downtime for testing reconfiguration of cas3.session_duration
19:05 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.21 refs T281161 (duration: 35m 53s)
19:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:40 ejegg: switched Adyen back to HPP integration
18:38 ejegg: updated payments-wiki from 564daed816 to c4d56178d0, switched Adyen to Checkout integration
18:30 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.21 refs T281161
18:24 twentyafterfour: ran `scap prep 1.37.0-wmf.21` and `scap apply-patches --train 1.37.0-wmf.21` refs T281162
18:05 XioNoX: re-pool eqsin-codfw link
16:18 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
16:14 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
16:08 hnowlan@deploy1002: Finished deploy [restbase/deploy@09156c2]: fix core Title redirect loop (duration: 16m 02s)
15:52 hnowlan@deploy1002: Started deploy [restbase/deploy@09156c2]: fix core Title redirect loop
14:30 jbond: enable puppet fleet wide after performing puppetdb maintenance T263578
14:29 hashar: Restarting CI Jenkins for plugins upgrade
14:19 ottomata: merged change to service_auto_restart.pp that changes the way service names are matched to be more explicit. tested in deployment prep and nothing bad happened. Logging in case something bad does happen in prod. https://gerrit.wikimedia.org/r/c/operations/puppet/+/697605
14:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:09 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:07 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:05 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
14:05 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
14:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
14:03 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
14:02 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
14:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintance - T289779
14:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintance - T289779
14:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on puppetdb1002.eqiad.wmnet with reason: puppetdb maintance - T289779
14:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on puppetdb1002.eqiad.wmnet with reason: puppetdb maintance - T289779
14:01 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
14:00 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
13:47 jbond: disable puppet fleet wide to perform puppetdb maintenance T263578
13:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
13:37 urbanecm: Start `mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php --wiki=nlwiki --verbose` in a tmux session at mwmaint2002
13:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
13:06 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
13:04 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
12:59 urbanecm: [urbanecm@mwmaint2002 ~]$ sudo -u www-data kill 133282 # stop updateMenteeData.php at frwiki
12:52 jelto: run kubectl scale deployments.apps -n ci mediawiki-bruce --replicas=0 to stop ImagePulling and reduce io on kubestage1001
12:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
12:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
11:38 jbond: sudo gnt-instance modify --disk add:size=100G puppetdb2002.codfw.wmnet T263578
11:38 jbond: sudo gnt-instance modify --disk add:size=100G puppetdb1002.eqiad.wmnet T263578
11:37 jbond: sudo gnt-instance modify --disk add:size=100G puppetdb2002.codfw.wmnet
11:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/updateMenteeData.php: 53a1856: updateMenteeData: Send timing to statsd (T278971) (duration: 00m 57s)
11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:07 urbanecm: EU B&C window done
11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eb482e3: Offer the DiscussionTools reply tool as opt-out setting at 21 phase 2 Wikipedias (T288483) (duration: 00m 57s)
10:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
10:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
10:23 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
10:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
10:14 marostegui: Optimize huwiki.flaggedtemplates T290057
10:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
08:39 marostegui: Optimize plwiki.flaggedtemplates T290057
08:18 marostegui: Optimize cewiki.flaggedtemplates T290057
08:05 marostegui: Optimize plwiktionary.flaggedtemplates T290057
07:44 marostegui: Optimize ruwiki.flaggedtemplates T290057
07:01 XioNoX: drain eqsin-codfw link
06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17113 and previous config saved to /var/cache/conftool/dbconfig/20210831-065600-root.json
06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17112 and previous config saved to /var/cache/conftool/dbconfig/20210831-064056-root.json
06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17111 and previous config saved to /var/cache/conftool/dbconfig/20210831-062553-root.json
06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17110 and previous config saved to /var/cache/conftool/dbconfig/20210831-061049-root.json
06:06 marostegui: Rename flaggedrevs_stats2 and flaggedrevs_stats on dewiki codfw T289050
05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17109 and previous config saved to /var/cache/conftool/dbconfig/20210831-055546-root.json
03:39 eileen: civicrm revision changed from e89504652a to 718aa9cad3, config revision is cb0a008cad
02:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:04 eileen: tools revision changed from 14e4125f73 to 1d67c52c12

2021-08-30

23:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:11 urbanecm: Evening B&C done
23:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/includes/Specials/SpecialMentorDashboard.php: 9e2264a: Instrument Special:MentorDashboard (T289369) (duration: 00m 55s)
23:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/includes/Specials/SpecialHomepage.php: 9e2264a: Instrument Special:MentorDashboard (T289369) (duration: 00m 57s)
21:56 eileen: civicrm revision changed from 13bf3a02df to e89504652a, config revision is cb0a008cad
19:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9a92e2a: Fix mediawiki.mentor_dashboard.visits definition (duration: 00m 56s)
19:08 tgr: morning deploys done for real
19:06 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix schema definition for mediawiki.mentor_dashboard.visit (T289369) (duration: 00m 56s)
19:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:49 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: Add mediawiki.mentor_dashboard.visit schema (T289369) (duration: 00m 26s)
18:48 tgr@deploy1002: Scap failed!: 5/6 canaries failed their endpoint checks(https://en.wikipedia.org)
18:43 tgr: morning deploys done
18:43 tgr@deploy1002: scap failed: average error rate on 3/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:22 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Enable link recommendation for dewiki and nlwiki (T288420 T285254) (duration: 00m 56s)
18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:14 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Switch image recommendations flag off (T288797) (duration: 00m 57s)
17:44 ryankemper: [WDQS Deploy] Test query passing on `query.wikidata.org` and icinga looks good. This deploy is done.
17:12 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
17:12 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
17:12 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
17:10 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@a17833c]: 0.3.84 (duration: 08m 16s)
17:04 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.84` on canary `wdqs1003`; proceeding to rest of fleet
17:02 ryankemper@deploy1002: Started deploy [wdqs/wdqs@a17833c]: 0.3.84
17:02 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.84`. Pre-deploy tests passing on canary `wdqs1003`
17:00 ryankemper: T289483 Pooled `wdqs1013`
16:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
16:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
16:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Resyncing from master
16:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Resyncing from master
16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
16:16 sukhe: running authdns-update for Gerrit 715499
14:44 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
14:21 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
14:21 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
14:21 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
14:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
14:18 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
14:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b170153: Growth mentor dashboard: Enable beta features only on beta wikis (T280307) (duration: 00m 55s)
13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f1a178e: knwiki: Disable wmgNewUserMessageOnAutoCreate (T289333) (duration: 00m 57s)
13:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:48 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 6fbcc93: Add missing edit*protected rights to $wgAvailableRights (duration: 00m 56s)
12:12 Amir1: ladsgroup@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --wiki=jvwikisource --backend=local-multiwrite (T289860)
11:52 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:51 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:48 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:47 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:31 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:30 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
10:55 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
10:53 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
10:21 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
09:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:34 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set $wgIncludejQueryMigrate to false in group0 (T280944) (duration: 00m 57s)
09:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
09:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
09:01 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
09:00 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
08:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
08:57 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
08:57 godog: +100G to prometheus/global in codfw
08:04 vgutierrez: pool cp2027 - T289908
06:53 elukey: drop an-airflow1001's old airflow logs to fix root partition almost filled up
06:38 godog: more weight to ms-be20[62-65] - T288458
05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: REIMAGE
05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: REIMAGE
05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 for reimage T288803', diff saved to https://phabricator.wikimedia.org/P17105 and previous config saved to /var/cache/conftool/dbconfig/20210830-052336-marostegui.json

2021-08-29

00:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-28

23:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:12 elukey: powercycle cp2027 - OEM event registered in racadm getsel, no tty, no ssh
09:11 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet

2021-08-27

16:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
16:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
14:50 akosiaris: stop flink on staging cluster to verify some IOPS starvation issues
14:46 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
14:45 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
14:44 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
14:44 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
14:44 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
14:44 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
14:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
14:38 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
14:37 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
14:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
14:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
13:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
12:49 mutante: rsynced /srv/org/wikimedia/racktables from miscweb1002 to miscweb2002 (T269746)
12:04 topranks: removing peering to Wave Division Holdings / AS11404 at Equinix Chicago cr2-eqord, AS no longer on exchange.
10:56 akosiaris: sudo cumin 'mw*' 'ip ro ls dev docker0 && sysctl net.ipv4.ip_forward=0' to clear up the docker remnants of the dragonfly evaluation. T286054
10:31 godog: bounce logstash on logstash1007
10:22 elukey: fallback codfw ores to rdb2007 after maintenance
10:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
10:12 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
09:49 elukey: restart ores uwsgi/celery workers to failover rdb2007 to rdb2008 (and ease the reboot of rdb2007)
09:33 topranks: Running homer against mr1-ulsfo to force OOB interface to 100Mb/full-duplex - T288343
09:25 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Update to expose int type from Netbox - cmooney@cumin1001
09:25 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Update to expose int type from Netbox - cmooney@cumin1001
09:23 cmooney@deploy1002: Finished deploy [homer/deploy@8183056]: Homer update exposing interface type from Netbox - T288343 (duration: 01m 28s)
09:21 cmooney@deploy1002: Started deploy [homer/deploy@8183056]: Homer update exposing interface type from Netbox - T288343
08:05 tstarling@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/SecurePoll/cli/wm-scripts/sendMail.php: (no justification provided) (duration: 00m 56s)
07:49 jayme: stopped kube-apiserver on kubestagemaster2001 for testing
07:49 jayme: stopped kube-apiserver on kubestage2001 for testing
07:00 godog: bounce logstash on logstash1008
06:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:41 tstarling@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/SecurePoll/cli/wm-scripts/sendMail.php: (no justification provided) (duration: 00m 56s)
06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:44 legoktm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/PageTriage/: Revert backbone.js and underscore.js updates (T289825) (duration: 01m 06s)

2021-08-26

22:06 legoktm: restarted mailman3-web on lists1001 (T289798)
19:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.20
18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
18:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 66717bc: Install Extension Quiz on ja.wikibooks (T289383) (duration: 01m 05s)
18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
18:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cde8891: Install Extension Quiz on fa.wikibooks (T289381) (duration: 01m 07s)
18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d4340e9: Finalize Event Platform migration of EchoEmail and EchoInteraction (T287210) (duration: 01m 07s)
17:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:30 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.20 (duration: 01m 05s)
17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.20
17:26 dancy@deploy1002: Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: Backport: PageStore: Pass query flags to getPageById() too (T289717 T195069) (duration: 01m 05s)
16:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
15:56 sukhe: ran homer for Gerrit 715007: Set up BGP peering to durum1001 in eqiad
15:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
15:40 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
14:24 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=plwiki --prune --batch-size=10 --sleep=2 (T289249)
13:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
13:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
13:04 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
12:59 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
12:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
12:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
12:21 sukhe: running puppet initial run on durum1001.eqiad.wmnet - T289536
11:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:40 Lucas_WMDE: EU backport+config window done
11:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: Allow rendering of <math>0</math> (T288846) (duration: 01m 04s)
11:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: Allow rendering of <math>0</math> (T288846) (duration: 01m 05s)
11:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum1001.eqiad.wmnet
11:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum1001.eqiad.wmnet
11:20 nikerabbit@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Rename wgTranslateBlacklist to wgTranslateDisabledTargetLanguages (duration: 01m 05s)
11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:09 vgutierrez: rolling restart of varnishkafka-statsv - T289618
10:07 vgutierrez: disable puppet on cp-text to merge I52cf2a - T286038
10:06 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
10:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
09:36 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
09:30 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
09:21 elukey: elukey@kafka-main1001:~$ kafka acls --add --allow-principal User:CN=varnishkafka --producer --topic statsv - T286038
09:21 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
09:20 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
09:17 elukey: restart varnishkafka-statsv on cp4032 to pick up TLS settings
09:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
09:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
09:13 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
09:12 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
09:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
08:52 vgutierrez: restart varnishkafka-statsv on cp4032
06:59 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
06:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
06:48 godog: more weight to ms-be20[62-65] - T288458
06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1160 T288273', diff saved to https://phabricator.wikimedia.org/P17085 and previous config saved to /var/cache/conftool/dbconfig/20210826-064655-marostegui.json
06:43 marostegui: Reimage s4 eqiad master (db1138), expect lag on eqiad T288803
06:37 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:33 elukey@cumin1001: START - Cookbook sre.dns.netbox

2021-08-25

23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:20 urbanecm: Evening B&C window completed
23:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GlobalWatchlist/modules/EntryLog.js: 230aec3: GlobalWatchlistEntryLog: fix storing log id (T288385) (duration: 01m 07s)
22:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:10 legoktm@deploy1002: Synchronized debug.json: List primary DC servers first (T289246) (duration: 01m 04s)
22:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Flow/includes/Content/BoardContent.php: 694b946: BoardContent: Fix deprecation warning (T289625) (duration: 01m 04s)
22:04 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/VisualEditor/includes/ApiVisualEditor.php: 73478bc: Make sure params is an array (T289730) (duration: 01m 04s)
22:00 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
21:59 brennen: 1.37.0-wmf.20 train status (T281161) blockers should be patched shortly; as we've reached the 15:00 Pacific deploy cutoff for the day, train will resume first thing in US morning
21:58 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
21:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: cc04b33: EventDispatcher: Try really, really hard to read from master (T289717) (duration: 01m 04s)
21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: 34fb2b9: PageStore: Pass query flags to getPageByName() (T289717; T195069) (duration: 01m 06s)
21:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:14 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: 190d8b7: Use Parser::getUserIdentity() instead of ::getUser() in SimpleCaptcha (T289731) (duration: 01m 05s)
21:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:03 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/ProofreadPage/: 913043a: Fixes exception thrown by FilePagination::getPageNumber (T289728) (duration: 01m 06s)
20:02 brennen: 1.37.0-wmf.20 (T281161) status: blocked at group0; 2/3 blockers have probable patches, all seem to be getting attention, so holding off on blocker mail for now.
19:54 urbanecm: enwikisource: Start server-side upload for one video file (T289698)
19:45 urbanecm: Start server-side upload for ~2 GB tiff file (T289711)
19:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:28 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.19 (duration: 01m 05s)
19:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.19
19:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:14 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.20 (duration: 01m 04s)
19:13 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.20
19:10 eileen: tools revision changed from 15bfaa7117 to 14e4125f73
18:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:42 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:25 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Flow/modules/editor/editors/visualeditor/ui/inspectors/mw.flow.ve.ui.MentionInspector.js: dd464b4: Fix reference to renamed abortAllApiRequests method (T289648) (duration: 01m 04s)
18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:23 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/skins/WikimediaApiPortal/src/Component/NotificationAlertComponent.php: a5bfcc8: Remove call to text() on string (T289692) (duration: 01m 04s)
18:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e7c8c04: Add Wikimedia ES to $wgCopyUploadsDomains whitelist (T289446) (duration: 01m 04s)
18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e6df080: Disable legacy media dom on a few more wikis (T51097) (duration: 01m 05s)
18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:15 robh@cumin1001: START - Cookbook sre.dns.netbox
18:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
18:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5182ac8: Disable upcoming DiscussionTools automatic topic subscriptions for now (duration: 01m 04s)
18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
18:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2b14eb5: Enable topic subscriptions as a beta feature on Wikipedias except enwiki (T287801) (duration: 01m 06s)
18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:48 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:46 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Wikibase/repo/includes/Content/EntityHandler.php: Backport: Set EntityHandler::generateHTMLOnEdit to false (T285987) (duration: 01m 06s)
17:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:38 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Wikibase: Backport: Return normalized snaks from SetClaim, SetReference (T289501) (duration: 01m 11s)
17:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:14 ryankemper: T289483 Depooled `wdqs1013`
17:14 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Wikibase/repo/includes/Content/EntityHandler.php: Backport: Set EntityHandler::generateHTMLOnEdit to false (T285987) (duration: 01m 18s)
17:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
15:22 urbanecm: Run `User::newSystemUser( 'MediaWiki default', ['steal' => true] )` in mywiki shell.php session (same issue as T289690)
15:16 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zh_yuewiki growthexperiments # T289680
15:04 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
15:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/GrowthExperiments/includes/Config/WikiPageConfigWriter.php: 0b9ca1e: WikiPageConfigWriter: Fix `autopatrol` right name (T288886) (duration: 01m 04s)
15:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0ccac4b: Deploy Growth features to 44 new Wikipedias in dark mode (T289680; 3/3) (duration: 01m 06s)
14:59 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
14:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
14:56 urbanecm@deploy1002: Synchronized wmf-config/config/: 0ccac4b: Deploy Growth features to 44 new Wikipedias in dark mode (T289680; 2/3) (duration: 01m 05s)
14:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 0ccac4b: Deploy Growth features to 44 new Wikipedias in dark mode (T289680; 1/3) (duration: 01m 06s)
14:54 urbanecm@deploy1002: sync-file aborted: 0ccac4b: Deploy Growth features to 44 new Wikipedias in dark mode (T289680) (duration: 00m 01s)
14:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
14:52 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
14:46 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
14:42 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=brwiki # T289690, T289680
14:40 urbanecm: Run `User::newSystemUser( 'MediaWiki default', ['steal' => true] )` in brwiki shell.php session (T289690)
14:35 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
14:32 urbanecm: mwmaint2002: scap pull # clearing temporary config changes
14:30 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
14:29 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
14:26 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
14:25 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
14:23 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/initWikiConfig.php # T289680 # r714765 applied at mwmaint2002
14:22 urbanecm: Apply https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/714765/ at mwmaint2002 temporarily (T289680)
14:21 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
14:20 urbanecm: Create GrowthExperiments DB tables for wikis listed in P17081 (T289680)
14:20 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
14:18 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
14:17 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
14:15 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
14:12 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
14:10 ejegg: updated fundraising CiviCRM from d60442e119 to 13bf3a02df
14:08 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
13:59 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:05:00 on cumin2001.codfw.wmnet with reason: apostrophe's test failure
13:59 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2001.codfw.wmnet with reason: apostrophe's test failure
13:57 ejegg: updated fundraising CiviCRM from 42bb64c608 to d60442e119
13:53 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: apostrophe's test
13:53 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: apostrophe's test
13:51 volans: upgraded spicerack to 0.0.58 on cumin2002
13:37 joal@deploy1002: Finished deploy [analytics/refinery@7bed213] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7bed213] (duration: 05m 55s)
13:32 joal@deploy1002: Started deploy [analytics/refinery@7bed213] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7bed213]
13:31 joal@deploy1002: Finished deploy [analytics/refinery@7bed213] (thin): Regular analytics weekly train THIN [analytics/refinery@7bed213] (duration: 00m 07s)
13:31 joal@deploy1002: Started deploy [analytics/refinery@7bed213] (thin): Regular analytics weekly train THIN [analytics/refinery@7bed213]
13:31 joal@deploy1002: Finished deploy [analytics/refinery@7bed213]: Regular analytics weekly train [analytics/refinery@7bed213] (duration: 20m 25s)
13:10 joal@deploy1002: Started deploy [analytics/refinery@7bed213]: Regular analytics weekly train [analytics/refinery@7bed213]
13:03 jayme: restarted all pods in kube-system namespace in codfw k8s cluster - T289131
12:25 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:21 kormat@cumin1001: START - Cookbook sre.dns.netbox
11:39 jayme: slowly restarting all pods in kube-system namespace in eqiad k8s cluster - T289131
11:38 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-coord1002.eqiad.wmnet
11:32 kharlan@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: Backport: ApiVisualEditorEdit: data-{plugin} is not multi (T289652) (duration: 01m 06s)
11:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:18 volans: uploaded spicerack_0.0.58 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
11:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
10:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
10:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
10:49 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/Storage/DerivedPageDataUpdater.php: Backport: Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987), Part II (duration: 01m 04s)
10:47 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/content/ContentHandler.php: Backport: Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987), Part I (duration: 01m 08s)
10:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
10:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:21 jbond: rolling out openssl updates
10:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:03 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.20/includes: Backport: Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987) (duration: 02m 17s)
10:01 mutante: - removed jmads from wmf group
09:59 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-coord1002.eqiad.wmnet
09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
09:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
09:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
09:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
09:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
09:35 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
08:59 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE
08:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE
08:17 godog: swift codfw add ms-be20[62-65] with initial weight - T288458
07:01 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
06:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for reimage T288803', diff saved to https://phabricator.wikimedia.org/P17078 and previous config saved to /var/cache/conftool/dbconfig/20210825-064319-marostegui.json
06:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2118.codfw.wmnet with reason: Reimaging T288244
06:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2118.codfw.wmnet with reason: Reimaging T288244
06:07 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2118 until it's reimaged to buster T289129', diff saved to https://phabricator.wikimedia.org/P17077 and previous config saved to /var/cache/conftool/dbconfig/20210825-060742-kormat.json
06:02 kormat@cumin1001: dbctl commit (dc=all): 'Promote db2121 to s7 primary and set section read-write T289129', diff saved to https://phabricator.wikimedia.org/P17076 and previous config saved to /var/cache/conftool/dbconfig/20210825-060222-kormat.json
06:01 kormat@cumin1001: dbctl commit (dc=all): 'Set s7 codfw as read-only for maintenance - T289129', diff saved to https://phabricator.wikimedia.org/P17075 and previous config saved to /var/cache/conftool/dbconfig/20210825-060112-kormat.json
06:00 kormat: Starting s7 codfw failover from db2118 to db2121 - T289129
05:33 eileen: civicrm revision changed from a4ce949828 to 42bb64c608, config revision is 1afcea7f5b
05:28 kormat: Moving s7 codfw replicas under db2121 - T289129
05:27 kormat@cumin1001: dbctl commit (dc=all): 'Set db2121 with weight 0 T289129', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20210825-052741-kormat.json
05:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:04:00 on 27 hosts with reason: Primary switchover s7 T289129
05:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:04:00 on 27 hosts with reason: Primary switchover s7 T289129
02:06 eileen: civicrm revision changed from 8ed303f2d1 to a4ce949828, config revision is ac2d75d4a8
00:53 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
00:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
00:47 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .

2021-08-24

22:05 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
22:04 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
21:10 tgr: running extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php on various wikis per T282873#7303828
20:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a6fd96b: Growth features: Promote 9 wikis out of dark mode (T287871; T287874; T287872; T287880; T287868; T287873; T287879; T287875; T287876) (duration: 01m 25s)
20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:35 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.17 (duration: 01m 48s)
20:33 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.18 (duration: 03m 26s)
20:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.20
20:18 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.20 (duration: 36m 32s)
20:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:41 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.20
17:23 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
17:19 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
17:17 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
15:26 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics (duration: 02m 17s)
15:23 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics
15:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:55 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:54 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:50 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
14:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
13:12 XioNoX: push pfw policies - T289353
12:45 vgutierrez: enable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
12:37 vgutierrez: disable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
12:33 godog: test patched python3-eventlet on thanos-fe1003 - T283714
12:30 marostegui: Install 10.4.21 on clouddb1015
11:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
11:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
09:08 jbond: upload new statograph version
09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
08:54 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=dewiki --prune --batch-size=5 --sleep=5 (T289249)
08:51 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=arwiki --prune --batch-size=5 --sleep=5 (T289249)
08:01 godog: temp fix thanos-swift.discovery.wmnet in /etc/hosts to get swift-dispersion-stats to work - T283714
07:51 dcausse: repool wdqs1012 T289551
07:29 dcausse: restarting blazegraph on wdqs1012
07:17 marostegui: Optimize huwiki.flaggedtemplates on db1127
07:15 marostegui: Optimize huwiki.flaggedtemplates on db1098:3317
06:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
06:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
03:51 rzl: rzl@wdqs1012:~$ sudo depool
03:46 legoktm: wdqs1012 restarted prometheus-blazegraph-exporter-wdqs-blazegraph.service and prometheus-blazegraph-exporter-wdqs-categories.service after apparent exceptions/crashes
02:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:17 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
00:16 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@da9efa9]: 0.3.83 (duration: 07m 05s)
00:10 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.83` on canary `wdqs1003`; proceeding to rest of fleet
00:09 ryankemper@deploy1002: Started deploy [wdqs/wdqs@da9efa9]: 0.3.83
00:08 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.83`. Pre-deploy tests passing on canary `wdqs1003`

2021-08-23

23:41 ryankemper: T285355 `helmfile -e staging -i apply` on `/srv/deployment-charts/helmfile.d/services/linkrecommendation/` from `ryankemper@deploy1002`
23:40 ryankemper@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
18:56 tgr: morning deploys done
18:56 tgr@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/GrowthExperiments: Backport: Add Link: store when tasks were generated (T284551) (duration: 00m 57s)
18:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:27 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: wmfSetupEtcd only supports array input (duration: 00m 57s)
18:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:23 dancy@deploy1002: Synchronized wmf-config: Config: Use array format to specify etcd server (duration: 00m 57s)
18:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:12 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: Allow protocol for etcd server to be specified (duration: 00m 57s)
18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:17 ebernhardson@deploy1002: Finished deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow) (duration: 00m 56s)
17:16 ebernhardson@deploy1002: Started deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow)
16:37 ebernhardson@deploy1002: Finished deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration (duration: 00m 35s)
16:37 ebernhardson@deploy1002: Started deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration
16:24 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
16:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
14:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 26fe6d7: ckbwiki: Enable Growth features in dark mode (T287867; 3/3) (duration: 00m 56s)
14:58 urbanecm@deploy1002: Synchronized wmf-config/config/ckbwiki.yaml: 26fe6d7: ckbwiki: Enable Growth features in dark mode (T287867; 2/3) (duration: 00m 57s)
14:57 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 26fe6d7: ckbwiki: Enable Growth features in dark mode (T287867; 1/3) (duration: 00m 57s)
14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=ckbwiki --phab=T287867 # T287867
14:53 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=ckbwiki growthexperiments # T287867
14:29 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
14:26 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
14:00 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
13:57 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
13:56 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
13:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
13:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
12:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:55 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: change rdb* servers in eqiad and codfw (T280582) (duration: 00m 57s)
11:35 Lucas_WMDE: EU backport+config window done
11:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480) (2/2) (duration: 00m 57s)
11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480) (1/2) (duration: 00m 58s)
11:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:04 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Enable NewUserMessage on hiwiktionary" (T287091) (duration: 00m 57s)
10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
10:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
09:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: Add extra sleep option between each batch in pruneRevData.php (T289249) (duration: 00m 58s)
09:55 mbsantos: start re-import OSM planet data into maps1009 eqiad master (T288400, T288897)
09:53 urbanecm: Deploy security patch for T289408
09:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=codfw
09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
09:01 godog: pooling swift in eqiad - T288458
07:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set request languages rdf output for wikidata to true (T285795) (duration: 00m 57s)
07:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:28 Amir1: running FlaggedRevs/maintenance/pruneRevData.php on all flaggedrevs wikis
07:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: Avoid calling delete() with empty arrays in PruneFRIncludeData (T289249) (duration: 00m 59s)
07:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE
07:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE

2021-08-21

15:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-20

23:17 legoktm: deployed patch for T289385
17:03 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1141.eqiad.wmnet
17:01 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1141.eqiad.wmnet
16:58 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1140.eqiad.wmnet
16:56 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1140.eqiad.wmnet
16:56 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1139.eqiad.wmnet
16:54 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1139.eqiad.wmnet
16:45 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1134.eqiad.wmnet
16:43 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1134.eqiad.wmnet
16:38 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1133.eqiad.wmnet
16:36 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1133.eqiad.wmnet
15:37 jayme: deleting various pods from staging to have them recreated with priorities - T289131
15:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1129.eqiad.wmnet
15:23 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1129.eqiad.wmnet
15:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
14:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
14:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
13:54 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
13:48 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
12:00 jayme: enabled priority admission plugin on k8s staging, rolling restart all pods in kube-system namespace - T289131
11:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
10:35 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
09:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1001.eqiad.wmnet
09:32 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
09:23 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1001.eqiad.wmnet
08:48 godog: roll depool/pool thanos-fe to apply swift change - T288815
08:43 godog: temp depool thanos-fe2003 to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/713815
08:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on druid1001.eqiad.wmnet with reason: decommissioning druid1001
08:43 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on druid1001.eqiad.wmnet with reason: decommissioning druid1001
07:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
07:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
07:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
07:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
07:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
07:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
06:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
06:07 TimStarling: sending election email to 44k people
03:15 legoktm@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Score/scripts/removeTagline.php: removeTagline: Set explicit pcre.backtrack_limit (T289298) (duration: 00m 58s)
03:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
03:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:13 tstarling@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/SecurePoll/cli/wm-scripts/makeMailingList.php: code that uses said hack (duration: 00m 57s)
00:12 tstarling@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/SecurePoll/includes/User/LocalAuth.php: hack for mailout (duration: 00m 58s)
00:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-19

23:15 brennen: ended backport & config window early, as no patches were scheduled and no new attendees for this week
22:42 ejegg: updated payments-wiki from 0a27dbe9b6 to 564daed816
21:20 Amir1: ladsgroup@mwmaint2002:~$ mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=huwiki --prune (T289249)
19:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.19
19:07 razzi@deploy1002: Finished deploy [analytics/aqs/deploy@57c253e]: Deploy aqs 9c062f2 (duration: 03m 30s)
19:03 razzi@deploy1002: Started deploy [analytics/aqs/deploy@57c253e]: Deploy aqs 9c062f2
18:27 razzi: Beginning aqs deploy process
18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon2001.codfw.wmnet
17:49 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2001.codfw.wmnet
17:48 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1001.eqiad.wmnet
17:41 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1001.eqiad.wmnet
17:11 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1004.eqiad.wmnet
17:01 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1004.eqiad.wmnet
17:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1003.eqiad.wmnet
16:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:49 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Re-enable Score with Shellbox on most public wikis (T257066) (duration: 01m 08s)
16:46 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1003.eqiad.wmnet
16:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1002.eqiad.wmnet
16:31 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1002.eqiad.wmnet
16:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts maps1002.eqiad.wmnet
16:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1002.eqiad.wmnet
16:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1001.eqiad.wmnet
16:14 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1001.eqiad.wmnet
16:14 hnowlan: starting decommission of old eqiad maps hardware
16:10 cwhite: remove rotated logstash-plain-* and logstash-json-* logs on logstash collectors
16:00 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
15:53 dpifke@deploy1002: Finished deploy [performance/navtiming@f8bf39f]: Deploy CpuBenchmark processor again T281243 (duration: 00m 06s)
15:52 dpifke@deploy1002: Started deploy [performance/navtiming@f8bf39f]: Deploy CpuBenchmark processor again T281243
15:50 Amir1: test2wiki)> delete from flaggedtemplates where ft_rev_id not in (select fp_stable from flaggedpages); (T289249)
15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
15:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
15:38 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
15:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
15:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
15:29 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
15:25 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
15:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps[1001-1004].eqiad.wmnet with reason: Awaiting decommissioning
15:06 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps[1001-1004].eqiad.wmnet with reason: Awaiting decommissioning
15:04 godog: clean logstash json logs off logstash hosts
14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
14:49 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
14:36 effie: enable puppet on mediawiki and memcached servers for 713842
14:26 effie: disable puppet on mediawiki and memcached servers for 713842
13:58 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
13:49 urbanecm: Start server-side upload for 1 video file (T288384)
13:48 urbanecm: Start server-side upload for 1 video file (T288554)
13:47 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:47 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
13:45 urbanecm: Start server-side upload for 1 video file (T288628)
13:44 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
13:44 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:42 urbanecm: Start server-side upload for 1 video file (T289203)
13:40 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
13:34 kormat: reconfiguring replication tree on pc3 T284825
13:30 kormat: reconfiguring replication tree on pc2 T284825
13:24 kormat: reconfiguring replication tree on pc1 T284825
13:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:09 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote new h/w to primary of eqiad pc sections T284825 (duration: 01m 08s)
12:35 zpapierski@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:11 Lucas_WMDE: EU backport+config window done
12:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/view/lib/wikibase-termbox/: Backport: Update termbox (T236893, T286775) (duration: 01m 08s)
11:56 zpapierski@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
11:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Revert "Don't set termbox v2 tags yet" (T236893, T286775) (duration: 01m 06s)
11:40 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Wikibase/view/lib/wikibase-termbox/: Backport: Update termbox (T236893, T286775) (duration: 01m 08s)
11:39 lucaswerkmeister-wmde@deploy1002: sync-file aborted: Backport: Update termbox (T236893T286775) (duration: 00m 01s)
11:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:45 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
10:42 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
10:36 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
10:12 twentyafterfour: restart php-fpm on phab1001
10:02 godog: roll-reload nginx on ms-fe to apply config change
08:48 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
08:48 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
08:41 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
04:20 effie: pool mw2383 - T286463
01:13 ejegg: updated fundraising CiviCRM from 73f6ec9190 to 8ed303f2d1
00:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:40 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox

2021-08-18

22:16 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@26480d5]: fully enable imagerec data shipping (duration: 02m 09s)
22:14 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@26480d5]: fully enable imagerec data shipping
21:15 jgleeson: civicrm changed from 66568246a2 to 73f6ec9190
19:40 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@8d71e72]: configuration for imagerec data shipping (duration: 02m 12s)
19:38 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@8d71e72]: configuration for imagerec data shipping
19:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:09 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.19 (duration: 01m 05s)
19:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.19
18:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:16 legoktm: Successfully published image docker-registry.discovery.wmnet/nodejs12-devel:0.0.1, docker-registry.discovery.wmnet/nodejs12-slim:0.0.1 (T284346)
18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 559dd70: Enable page previews on German Wikivoyage (T264305) (duration: 01m 08s)
18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 35113b6: Enable DiscussionTools topicsubscription as beta feature on phase 1 wikis (T287800) (duration: 01m 25s)
16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
15:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:46 ejegg: updated matching gift employers list on payments-wiki
15:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
14:50 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
14:26 effie: enable puppet on alert*
14:11 effie: disable puppet on alert* to avoid alert flood due to 713494
14:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:57 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: change rdb* servers in eqiad and codfw (T280582) (duration: 01m 51s)
13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:41 godog: bounce logstash on logstash100[89]
13:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
13:24 effie: mw2383 is depooled - T286463
13:01 kormat: Deploying wmfmariadbpy 0.7.2 T289139
13:01 kormat: uploaded wmfmariadbpy 0.7.2 to apt.wm.o
11:38 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
11:36 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
11:35 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
11:12 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
11:03 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
10:47 effie: pooling mw2383 - T286463
10:41 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
10:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1004.eqiad.wmnet with reason: Awaiting decommissioning
10:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1004.eqiad.wmnet with reason: Awaiting decommissioning
10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
10:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
09:36 joal@deploy1002: Finished deploy [analytics/refinery@88c6618] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@88c6618] (duration: 05m 48s)
09:30 joal@deploy1002: Started deploy [analytics/refinery@88c6618] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@88c6618]
09:30 joal@deploy1002: Finished deploy [analytics/refinery@88c6618] (thin): Regular analytics weekly train THIN [analytics/refinery@88c6618] (duration: 00m 07s)
09:30 joal@deploy1002: Started deploy [analytics/refinery@88c6618] (thin): Regular analytics weekly train THIN [analytics/refinery@88c6618]
09:29 joal@deploy1002: Finished deploy [analytics/refinery@88c6618]: Regular analytics weekly train [analytics/refinery@88c6618] (duration: 32m 29s)
08:57 joal@deploy1002: Started deploy [analytics/refinery@88c6618]: Regular analytics weekly train [analytics/refinery@88c6618]
04:38 marostegui: Drop user2 from s6 - T289051
02:03 rzl@cumin2001: conftool action : get/pooled; selector: service=docker-registry
00:39 dpifke@deploy1002: Finished deploy [performance/navtiming@88f12a0]: Revert CpuBenchmark again (T281243) (duration: 00m 05s)
00:39 dpifke@deploy1002: Started deploy [performance/navtiming@88f12a0]: Revert CpuBenchmark again (T281243)
00:38 dpifke@deploy1002: Finished deploy [performance/navtiming@88f12a0]: Re-deploy fixed CpuBenchmark (T281243) (duration: 00m 06s)
00:38 dpifke@deploy1002: Started deploy [performance/navtiming@88f12a0]: Re-deploy fixed CpuBenchmark (T281243)

2021-08-17

23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:32 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php: T288233: Work around cache failure for wikitech (duration: 01m 28s)
23:05 tzatziki: resetting email for vanished user
21:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:44 urbanecm: Deploy security patch for T289063
20:30 brennen: running scap pull on mw2383
20:29 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.16 (duration: 02m 01s)
20:20 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.15 (duration: 06m 51s)
20:14 brennen: pruning 1.37.0-wmf.15 and .16 (T281160)
20:06 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.18/includes/block/BlockUser.php: d377d4f: BlockUser: Restore blocking autoblocked IP addresses (T287798) (duration: 01m 08s)
19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.19
19:02 brennen: 1.37.0-wmf.19 train status: no current blockers, proceeding to group0 (T281160)
17:44 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/includes/: Backport: Revert "objectcache: make use of new `modtoken` field in SqlBagOStuff" (T288998) (duration: 01m 13s)
17:41 urbanecm: [urbanecm@mw2383 ~]$ scap pull # to clear an icinga alert
17:39 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/: Backport: Revert "objectcache: make use of new `modtoken` field in SqlBagOStuff" (T288998) (duration: 01m 14s)
17:15 bblack: authdns2001,dns[245]001: upgrade gdnsd package to 3.8.0-1~wmf1 (all authdns upgraded after this)
17:07 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
17:04 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
17:02 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
16:56 brennen@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.19 (duration: 38m 24s)
16:50 bblack: dns1001: upgrade gdnsd package to 3.8.0-1~wmf1
16:25 bblack: dns3001: upgrade gdnsd package to 3.8.0-1~wmf1
16:17 brennen@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.19
16:13 brennen: 1.37.0-wmf.19 train: running scap prep, branched at 79c9b9e
16:08 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
16:06 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
16:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:55 urbanecm: Deploy a security patch for T289064
15:37 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
15:32 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
15:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
14:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:37 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2013 to primary of pc3 T284825 (duration: 00m 58s)
14:25 jynus: running a full testwiki media backup on a single thread, single worker T262668
14:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
14:20 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2012 to primary of pc2 T284825 (duration: 00m 59s)
13:53 jynus: rolling restart of minio on backup server
13:51 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
13:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
11:29 phuedx@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Jobs/TallyElectionJob.php: Backport: tallyElectionJob: Catch and log exceptions (T288361) (duration: 00m 58s)
11:16 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: buster reimage T288244', diff saved to https://phabricator.wikimedia.org/P17038 and previous config saved to /var/cache/conftool/dbconfig/20210817-111629-mvernon.json
11:15 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:01 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: buster reimage T288244', diff saved to https://phabricator.wikimedia.org/P17037 and previous config saved to /var/cache/conftool/dbconfig/20210817-110125-mvernon.json
10:46 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: buster reimage T288244', diff saved to https://phabricator.wikimedia.org/P17035 and previous config saved to /var/cache/conftool/dbconfig/20210817-104622-mvernon.json
10:31 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: buster reimage T288244', diff saved to https://phabricator.wikimedia.org/P17034 and previous config saved to /var/cache/conftool/dbconfig/20210817-103118-mvernon.json
10:07 effie: enable puppet on mediawiki hosts
09:52 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2121.codfw.wmnet with reason: REIMAGE
09:50 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2121.codfw.wmnet with reason: REIMAGE
09:20 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 depooling: reimage to buster T288244', diff saved to https://phabricator.wikimedia.org/P17033 and previous config saved to /var/cache/conftool/dbconfig/20210817-092045-mvernon.json
09:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1456.eqiad.wmnet
09:16 Emperor: reimaging db2121 to buster T288244
09:08 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1456.eqiad.wmnet
08:37 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1276-1279].eqiad.wmnet
08:29 effie: disable puppet on mediawiki hosts to merge 712920
08:24 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1276-1279].eqiad.wmnet
08:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1456.eqiad.wmnet with reason: new setup
08:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1456.eqiad.wmnet with reason: new setup
08:21 mutante: mw2383 - scap pull (still depooled because of T286463, but it has been alerting in Icinga for a while)
08:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1456.eqiad.wmnet with reason: REIMAGE
08:18 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
08:18 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw127[6-9].eqiad.wmnet
08:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1456.eqiad.wmnet with reason: REIMAGE
08:17 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw127[6-9].eqiad.wmnet
08:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1276-1279].eqiad.wmnet with reason: decom old appservers in eqiad T280203
08:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1276-1279].eqiad.wmnet with reason: decom old appservers in eqiad T280203
08:06 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
08:00 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw144[7-9].eqiad.wmnet
07:59 mutante: mw1384 - start failed ferm service
07:59 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw1450.eqiad.wmnet
07:52 mutante: mw1451 through mw1455 - fresh hardware pooled for the first time as appservers
07:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw145[1-5].eqiad.wmnet
07:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw145[1-5].eqiad.wmnet
07:48 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw145[1-5].eqiad.wmnet
07:44 marostegui: Drop aft_feedback tables on x1 T250715
07:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw1450.eqiad.wmnet
07:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[7-9].eqiad.wmnet
06:57 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Entities/Election.php: T288924 (duration: 00m 57s)
06:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:55 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/cli/dump.php: T288924 (duration: 00m 58s)
06:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
05:59 TimStarling: foreachwikiindblist securepollglobal mysql.php --write -- -e 'insert into securepoll_properties (pr_entity,pr_key,pr_value) select el_entity,'\''mobile-jump-url'\'','\''https://vote.m.wikimedia.org/wiki/Special:SecurePoll'\'' from securepoll_elections where el_title='\''DWalden STV Election Test 456'\'' limit 1;'
05:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
05:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
05:37 tstarling@deploy1002: Finished scap: collected SecurePoll maintenance scripts and bug fix (duration: 04m 12s)
05:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
05:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
05:33 tstarling@deploy1002: Started scap: collected SecurePoll maintenance scripts and bug fix
05:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
05:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
03:11 eileen: civicrm revision changed from 175a3101f7 to 66568246a2, config revision is 7bdc78073d
02:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:44 eileen: civicrm revision changed from ba0c7705bb to 175a3101f7, config revision is 7bdc78073d
00:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eccdd3e: Growth mentor dashboard: Enable on testwiki (T278920) (duration: 00m 59s)

2021-08-16

23:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:20 urbanecm: Evening B&C window done
23:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a14868b: Enable NewUserMessage on hiwiktionary (T287091) (duration: 01m 00s)
23:15 eileen: civicrm revision changed from 1e32084622 to ba0c7705bb, config revision is 7bdc78073d
22:13 bblack: dns[1235]002: upgrade gdnsd package to 3.8.0-1~wmf1
21:31 bblack: authdns1001: upgrade gdnsd package to 3.8.0-1~wmf1
21:28 bblack: dns4002: upgrade gdnsd package to 3.8.0-1~wmf1
20:38 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
20:38 bstorm@cumin1001: Added views for new wiki: labswiki T287442
20:37 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
20:36 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
20:36 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
20:35 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
20:35 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
18:48 dancy: Restarted Jenkins due to stuck jobs.
18:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1453.eqiad.wmnet with reason: REIMAGE
17:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1453.eqiad.wmnet with reason: REIMAGE
17:34 cmjohnson1: installing new line card in slot1 cr2-eqiad T277339
17:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Try to use EditStash before re-rendering (T288639) (duration: 00m 59s)
17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:25 XioNoX: cr1-eqiad> request chassis fpc offline slot 5 - T277339
17:17 cmjohnson1: installing new line card in slot1 cr1-eqiad T277339
17:11 ejegg: updated fundraising CiviCRM from f3895dc907 to 1e32084622
17:08 XioNoX: asw2-a-eqiad> request virtual-chassis vc-port set pic-slot 1 member 8 port 1 - T288834
17:05 XioNoX: asw2-a-eqiad> request virtual-chassis vc-port delete pic-slot 1 member 8 port 1 - T288834
16:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:37 cwhite: restart logstash on logstash1008
16:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:01 mutante: LDAP - added user tandic to nda group (T288527)
15:37 ryankemper: [WDQS] Re-pooled `codfw`: `ryankemper@puppetmaster1001:~$ sudo -i confctl --quiet --object-type discovery select 'dnsdisc=wdqs,name=codfw' set/pooled=true`
14:42 mutante: miscweb - deploying new microsite for Wikidata Query Builder subpage (T266703)
14:41 mutante: mw1455 - works fine after a reimage, unknown why it didn't last time, but ok :)
14:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
14:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
13:53 mutante: mw1455 - mysteriously showing a bunch of issues in icinga, broken packages, envoy, memcached etc, after recent fresh install, trying another reimage (T273915)
13:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseFineGrainedLuaTracking (T288612) (duration: 00m 58s)
13:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
13:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:40 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBClientSettings['fineGrainedLuaTracking'] (T288612) (duration: 00m 58s)
13:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseClientUseTermsTableSearchFields (T288612) (beta, 2/2) (duration: 00m 59s)
13:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientUseTermsTableSearchFields (T288612) (prod, 1/2) (duration: 00m 59s)
13:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting 'useTermsTableSearchFields' Wikibase option (T288612) (duration: 00m 59s)
13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:22 Lucas_WMDE: EU backport+config window done (slightly belatedly)
12:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:18 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Pages/VotePage.php: allow linking by title (duration: 00m 58s)
12:17 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2099.codfw.wmnet with reason: REIMAGE
12:15 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: Support null content in parser tag hook (T288846) (hopefully also fixes T288790) (duration: 00m 59s)
12:15 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2099.codfw.wmnet with reason: REIMAGE
12:14 kormat: clean up old /root/.my.cnf files T150446
11:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:49 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add extendedconfirmed on zhwiki (T287322) + Config: Fix extendedconfirmed for bots on zhwiki (T287322) (duration: 01m 01s)
11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:26 Lucas_WMDE: namespaceDupes.php for T287024 finished
11:22 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes.php hrwiki --fix --add-prefix=T287024/ | tee T287024.out # T287024
11:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add namespace aliases for hr.wiki (T287024) (duration: 00m 59s)
11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:32 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Add tags for wikidata edits (T236893) (duration: 00m 58s)
09:16 gehel: depooling wdqs codfw to allow catching up on lag
08:49 jynus: replacing s2 with s4 on db2097 T287230
08:28 gehel: repool wdqs eqiad (`confctl --quiet --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=true`) - codfw currently overloaded
07:47 marostegui: Rename aft_feedback tables on db2115, db2131 - T250715
06:41 TimStarling: on votewiki, set voter-privacy option to 1 on all prior elections T288924
05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17031 and previous config saved to /var/cache/conftool/dbconfig/20210816-055445-root.json
05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17030 and previous config saved to /var/cache/conftool/dbconfig/20210816-055427-root.json
05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17029 and previous config saved to /var/cache/conftool/dbconfig/20210816-053941-root.json
05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17028 and previous config saved to /var/cache/conftool/dbconfig/20210816-053924-root.json
05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17027 and previous config saved to /var/cache/conftool/dbconfig/20210816-052437-root.json
05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17026 and previous config saved to /var/cache/conftool/dbconfig/20210816-052420-root.json
05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17025 and previous config saved to /var/cache/conftool/dbconfig/20210816-050934-root.json
05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17024 and previous config saved to /var/cache/conftool/dbconfig/20210816-050916-root.json
04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17023 and previous config saved to /var/cache/conftool/dbconfig/20210816-045430-root.json
04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17022 and previous config saved to /var/cache/conftool/dbconfig/20210816-045413-root.json
04:49 marostegui: Upgrade db2088 (s1 and s2) to 10.4.21
04:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088 (s1 and s2) to upgrade', diff saved to https://phabricator.wikimedia.org/P17021 and previous config saved to /var/cache/conftool/dbconfig/20210816-044906-marostegui.json

2021-08-15

20:02 addshore: restarting blazegraph on wdqs2004
16:13 andrew@deploy1002: Finished deploy [horizon/deploy@c23a155]: adding cinder volume resize warning (duration: 03m 52s)
16:10 andrew@deploy1002: Started deploy [horizon/deploy@c23a155]: adding cinder volume resize warning

2021-08-14

03:54 legoktm[m]: restarting mailman3 on lists1001, bounce runner crashed (T288880)

2021-08-13

18:43 bblack: reprepro: uploaded gdnsd-3.8.0-1~wmf1 to buster-wikimedia - T252132
17:32 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
17:32 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
17:06 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
17:05 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
15:39 mutante: mw1451, mw1452, mw1454 - rebooting after reimage, memcached needs one
15:30 mutante: mw1453 - racadm serveraction powercycle (down and was working until right before the switch issue)
15:18 godog: restart pybal on lvs2009, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled
15:14 godog: restart pybal on lvs2010, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled
15:02 mutante: etherpad1002 - started failed ferm
15:00 mutante: an-worker1117, an-worker1118 - started failed ferm (why are these slowly trickling in?)
14:57 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw1450.eqiad.wmnet
14:57 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw144[7-9].eqiad.wmnet
14:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup
14:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup
14:50 mutante: an-worker1079 - started failed ferm
14:47 jelto@cumin1001: conftool action : set/weight=25; selector: name=mw1450.eqiad.wmnet
14:46 jelto@cumin1001: conftool action : set/weight=25; selector: name=mw144[7-9].eqiad.wmnet
14:45 mutante: an-worker1095 - started ferm, service failed
14:44 mutante: an-worker1082 - started ferm (was failed due to DNS hiccup)
14:44 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1450.eqiad.wmnet
14:43 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[7-9].eqiad.wmnet
14:41 mutante: mw1419 - started ferm
13:35 sukhe: ran homer for Gerrit 712400: Set up BGP peering to doh4002 in ulsfo
13:23 mutante: mw1453 - manual powercycle after it never rebooted when the reimage cookbook tried to trigger one
13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
13:21 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
13:21 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
13:21 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
12:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE
12:53 godog: set runtime envoy.reloadable_features.strict_1xx_and_204_response_headers=false on thanos-fe* - T288815
12:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup
12:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup
12:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE
12:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE
12:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE
12:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE
12:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup
12:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup
12:29 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE
12:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE
12:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE
12:26 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
12:24 urbanecm: mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=commonswiki --jobqueue # T288683
12:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
12:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1449.eqiad.wmnet with reason: REIMAGE
12:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1448.eqiad.wmnet with reason: REIMAGE
12:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1444.eqiad.wmnet
12:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1449.eqiad.wmnet with reason: REIMAGE
12:21 mutante: mw1444 - scap pull, pooled as new API server for the first time
12:20 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1444.eqiad.wmnet
12:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1448.eqiad.wmnet with reason: REIMAGE
11:59 urbanecm: mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=mediawikiwiki --jobqueue # T288683
11:36 topranks: cloudsw1-d5-eqiad - configuring new 2x40G trunk to cloudsw2-d5-eqiad with homer (T277340)
11:11 jelto: mw1455 - powering on via mgmt - OS install, initial setup (T279309, T273915)
10:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
10:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
10:07 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2003.codfw.wmnet
09:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
09:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
09:42 mutante: mw1448, mw1449, mw1450 - powering on via mgmt - OS install, initial setup (T279309, T273915)
09:38 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
09:35 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
09:35 mutante: mw1444 - signed puppet cert, initial run (after hardware fix) T279309
09:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet
09:17 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2001.codfw.wmnet
09:15 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet
08:42 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
08:40 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
08:40 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
05:24 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1132.eqiad.wmnet with reason: REIMAGE
05:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1132.eqiad.wmnet with reason: REIMAGE
01:02 tgr: running extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php for Growth wikis

2021-08-12

23:50 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set archive namespaces on foundationwiki to 'noindex,follow' (T288763) (duration: 00m 59s)
23:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:38 cjming@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/GrowthExperiments: Backport: Add Link: fix invalidation on non-addlink edit (T283606) (duration: 01m 00s)
23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:09 tgr: T283867 running userOptions.php on Growth wikis as per T283867#7280296
22:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:57 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Don't generate HTML when asking for ParserOutput (T288639) (duration: 00m 58s)
21:52 urbanecm: Run `mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=$WIKI --jobqueue` for a bunch of Translate-enabled wikis (T288683)
21:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:30 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.18 refs T281159
21:13 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: sync Ic27418 to unblock the train refs T288775 and T281159 (duration: 01m 07s)
20:56 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=testwikidatawiki --jobqueue # T288683, errored out
20:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=testwiki --jobqueue # T288683
20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:24 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=wikimaniawiki --jobqueue # T288683
20:13 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=wikimaniawiki --jobqueue # T288683
19:43 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Translate/src/PageTranslation/TranslationPage.php: sync I2f46ab which should fix T288683 & T288700 thus unblocking the train: T281159 (duration: 01m 07s)
19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4002.wikimedia.org
16:37 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4002.wikimedia.org
16:33 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1005: (duration: 00m 15s)
16:32 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1005:
16:32 effie: enabling puppet on mediawiki servers && rolling restart mcrouter
16:31 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1006: (duration: 00m 15s)
16:31 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1006:
16:31 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1007: (duration: 00m 15s)
16:30 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1007:
16:29 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1008: (duration: 00m 15s)
16:29 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1008:
16:29 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1009: (duration: 00m 17s)
16:28 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1009:
16:27 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1010: (duration: 00m 15s)
16:27 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1010:
16:26 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2005: (duration: 00m 24s)
16:26 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2005:
16:24 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2006: (duration: 00m 23s)
16:24 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2006:
16:23 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2007: (duration: 00m 27s)
16:23 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2007:
16:22 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2008: (duration: 00m 24s)
16:21 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2008:
16:16 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2009: (duration: 00m 24s)
16:15 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2009:
16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:14 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2010: (duration: 00m 23s)
16:14 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2010:
16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:13 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5 (duration: 02m 30s)
16:10 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5
15:50 papaul: powerdown ms-be2060 for relocation
15:49 mutante: netbox - deleted 2620:0:863:1:198:35:26:6/64 (along with 198.35.26.6) due to the previous error when running makevm cookbook (T288630)
15:47 mutante: netbox - deleted 198.35.26.6 (doh4002)
15:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:37 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh4002.wikimedia.org
15:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4002.wikimedia.org
15:33 moritzm: importing openjdk-8 8u302-b08-1+deb11u1 to apt.wikimedia.org/component/jdk8 T287960
15:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1002.eqiad.wmnet
15:07 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
15:04 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
15:00 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1002.eqiad.wmnet
14:48 papaul: reset to factory ps-test-d8-codfw
14:35 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
14:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
14:33 papaul: reset to factory ps2-test-d8-codfw
14:25 hnowlan: reenabling puppet on P:cassandra
13:57 hnowlan: disabling puppet on P:cassandra to test removal of cassandra-metrics-agent
13:50 effie: disable puppet on mediawiki hosts to merge 705852
13:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
13:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1003.eqiad.wmnet
13:20 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1003.eqiad.wmnet
13:03 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
12:43 godog: upgrade NIC firmware on thanos-be2* / thanos-fe2* - T286722
12:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
12:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
12:18 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
12:09 godog: upgrade NIC firmware on thanos-be1* - T286722
12:08 godog: upgrade NIC firmware on thanos-fe100[34] - T286722
12:04 godog: upgrade NIC firmware on thanos-fe100[12] - T286722
11:57 moritzm: installing openexr security updates
11:47 moritzm: installing bluez security updates on buster
10:22 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Holger Knust out of all services on: 1743 hosts
10:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Holger Knust out of all services on: 1743 hosts
10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2107 into API', diff saved to https://phabricator.wikimedia.org/P17016 and previous config saved to /var/cache/conftool/dbconfig/20210812-101840-marostegui.json
10:18 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
10:13 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
10:08 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
09:49 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
09:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:31 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/: Backport: Revert "Inject NamespaceInfo into EntitySourceDefinitionsConfigParser" (T288724) (2/2) (duration: 01m 12s)
09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Reconfiguring replication tree T284825
09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 8 hosts with reason: Reconfiguring replication tree T284825
09:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:29 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/data-access/: Backport: Revert "Inject NamespaceInfo into EntitySourceDefinitionsConfigParser" (T288724) (1/2) (duration: 01m 08s)
09:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P17015 and previous config saved to /var/cache/conftool/dbconfig/20210812-092909-root.json
09:28 kormat: reconfiguring replication tree for pc1 T284825
09:27 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2011 to primary of pc1 T284825 (duration: 01m 10s)
09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 80%: After reimage', diff saved to https://phabricator.wikimedia.org/P17014 and previous config saved to /var/cache/conftool/dbconfig/20210812-091406-root.json
08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 60%: After reimage', diff saved to https://phabricator.wikimedia.org/P17013 and previous config saved to /var/cache/conftool/dbconfig/20210812-085902-root.json
08:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:55 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudservices[1003-1004].wikimedia.org with reason: T288725
08:55 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudservices[1003-1004].wikimedia.org with reason: T288725
08:53 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Adding new pc hosts (duration: 01m 09s)
08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
08:48 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P17012 and previous config saved to /var/cache/conftool/dbconfig/20210812-084359-root.json
08:43 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
08:38 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
08:29 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 40%: After reimage', diff saved to https://phabricator.wikimedia.org/P17011 and previous config saved to /var/cache/conftool/dbconfig/20210812-082855-root.json
08:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
08:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 30%: After reimage', diff saved to https://phabricator.wikimedia.org/P17010 and previous config saved to /var/cache/conftool/dbconfig/20210812-081351-root.json
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 20%: After reimage', diff saved to https://phabricator.wikimedia.org/P17009 and previous config saved to /var/cache/conftool/dbconfig/20210812-075848-root.json
07:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
07:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
07:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
07:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 15%: After reimage', diff saved to https://phabricator.wikimedia.org/P17008 and previous config saved to /var/cache/conftool/dbconfig/20210812-074344-root.json
07:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
07:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
07:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P17007 and previous config saved to /var/cache/conftool/dbconfig/20210812-072841-root.json
07:26 godog: temp upgrade thanos to 0.22.0 on thanos-fe2001 to help debug a potential upstream issue
07:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
07:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P17006 and previous config saved to /var/cache/conftool/dbconfig/20210812-071337-root.json
07:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P17005 and previous config saved to /var/cache/conftool/dbconfig/20210812-065833-root.json
06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: fix for T288711 failure of election creation (duration: 01m 09s)
06:47 moritzm: updating bullseye installations to the latest state of testing
06:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
06:36 moritzm: installing c-ares security updates on Bullseye
06:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
06:00 marostegui: Failover m3 from db1132 to db1107 - T288197
05:15 ryankemper: [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2005.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal after nuking wdqs2004's" --blazegraph_instance blazegraph`
05:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
05:14 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
04:45 eileen: tools revision changed from c26a8c0cb6 to 15bfaa7117
04:44 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
04:44 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
04:44 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
04:43 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@9d03aaa]: 0.3.81 (duration: 02m 07s)
04:41 ryankemper@deploy1002: Started deploy [wdqs/wdqs@9d03aaa]: 0.3.81
04:41 ryankemper: [WDQS Deploy] Re-rolling deploy so that `wdqs2004` gets deployed to
04:41 ryankemper: [WDQS] `wdqs2004`'s disk is full due to overinflated `wikidata.jnl`, nuking and depooling: `sudo rm -fv /srv/wdqs/wikidata.jnl && sudo depool`
04:40 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@9d03aaa]: 0.3.81 (duration: 17m 03s)
04:26 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.81` on canary `wdqs1003`; proceeding to rest of fleet
04:23 ryankemper@deploy1002: Started deploy [wdqs/wdqs@9d03aaa]: 0.3.81
04:21 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.81`. Pre-deploy tests passing on canary `wdqs1003`
03:40 eileen: process-control config revision is 7bdc78073d
03:01 eileen: civicrm revision changed from d8ebf45819 to f3895dc907, config revision is 7bdc78073d

2021-08-11

23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:24 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cirrus: switch more_like traffic to codfw 2/2 (duration: 01m 08s)
23:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:06 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cirrus: switch more_like traffic to codfw 1/2 (duration: 01m 08s)
23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:32 legoktm@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Score/includes/Score.php: Record shell outs in statsd (duration: 01m 07s)
22:30 legoktm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/Score/includes/Score.php: Record shell outs in statsd (duration: 01m 08s)
21:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Avoid using deprecated WikiPage::prepareContentForEdit (T288639) (duration: 01m 08s)
21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:29 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Avoid using deprecated WikiPage::prepareContentForEdit (T288639) (duration: 01m 07s)
21:18 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:58 legoktm@cumin1001: START - Cookbook sre.dns.netbox
20:30 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki --move-talk --add-prefix=T288643 --fix # T288643
20:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:23 mholloway-shell@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/Popups: Log VirtualPageView events to Event Platform (T288655) (duration: 01m 06s)
20:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:20 mholloway-shell@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Popups: Log VirtualPageView events to Event Platform (T288655) (duration: 01m 09s)
20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:29 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.18 refs T281159 (duration: 01m 08s)
19:28 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.18 refs T281159
19:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.18 refs T281159
19:01 jgleeson: payments-wiki updated from a70aaa7944 to 0a27dbe9b6
18:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
18:24 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:23 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:23 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
18:22 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
18:22 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
18:21 bstorm: removed thirdparty/kubeadm-k8s-1-17 in reprepro
18:21 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
18:20 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
18:19 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:04 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@563f876]: process_sparql_query: increase parallelism to help backfill (duration: 02m 21s)
18:02 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@563f876]: process_sparql_query: increase parallelism to help backfill
17:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:35 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/specials/pagers/ContribsPager.php: T288563 Don't explode Special:Contributions on extension-formatted rows (3/3) (duration: 01m 06s)
17:34 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/Revision/RevisionFactory.php: T288563 Don't explode Special:Contributions on extension-formatted rows (2/3) (duration: 01m 08s)
17:32 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/Revision/RevisionStore.php: T288563 Don't explode Special:Contributions on extension-formatted rows (1/3) (duration: 01m 09s)
16:22 dancy: Results of testing php_fpm_always_restart: php_fpm_always_restart=false: 1m19.942s php_fpm_always_restart=true: 3m12.836s
16:19 dancy@deploy1002: Synchronized README: Testing scap php-fpm rolling restart (after) (duration: 03m 12s)
16:16 thcipriani: moment of truth for php-fpm-always-restart in scap
16:10 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
16:05 dancy@deploy1002: Synchronized README: Testing scap php-fpm rolling restart (before) (duration: 01m 19s)
15:37 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
15:12 moritzm: import openjdk-8 8u302-b08-1+wmf1 to bullseye-wikimedia (bootstrap build, not to be used yet) T287960
15:02 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast4002.wikimedia.org
14:57 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
14:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts bast4002.wikimedia.org
14:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts bast4002.wikimedia.org
14:44 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts bast4002.wikimedia.org
14:44 sukhe: s/depool/decommission bast4002.wikimedia.org - T288579
14:43 sukhe: depool bast4002.wikimedia.org - T288579
14:23 moritzm: installing mx2002 T286911
14:21 hnowlan: disabled cassandra-metrics-collector on maps*
13:33 moritzm: installing Java 8/Java 11 security updates on various analytics hosts
13:29 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
12:45 moritzm: imported openjdk-8 8u302-b08-1~deb10u1 to component/jdk8 for buster-wikimedia (forward port of the latest Java 8 security release)
12:32 godog: roll-restart prometheus T284213
12:16 moritzm: installing c-ares security updates on stretch
12:16 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
12:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:33 Lucas_WMDE: EU backport+config window done
11:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientEntityNamespaces (T257260) (duration: 01m 08s)
11:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBClientSettings['entityNamespaces'] (T257260) (duration: 01m 07s)
11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseRepoEntityNamespaces (T257260) (duration: 01m 08s)
11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBRepoSettings['entityNamespaces'] (T257260) (duration: 01m 08s)
11:17 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:17 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
11:17 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/: Backport: Add ad-hoc logging to tally process (T288366) (duration: 01m 09s)
11:11 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:06 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable Collection sidebar link on English Wikisource (T288021) (duration: 01m 14s)
10:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:42 moritzm: rolling restart of Buster-based maps services to pick up c-ares security updates
10:37 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:20 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:02 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
09:50 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/includes/specials/SpecialWhatLinksHere.php: Backport: Fix SelectQueryBuilder use in SpecialWhatLinksHere (T288565) (duration: 01m 08s)
09:50 godog: upgrade thanos on cloudmetrics* - T288604
09:26 godog: upgrade thanos on prometheus* - T288604
09:21 elukey: run "sudo find /var/log/airflow -type f -mtime +15 -delete" on an-airflow1001 to free space (root partition almost full)
09:19 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
09:15 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
09:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
09:05 godog: upgrade thanos on thanos-fe* - T288604
08:23 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Minor cleanup of parsercache entries (duration: 01m 17s)
08:19 moritzm: restart Aphlict to pick up c-ares security updates
08:17 moritzm: restart Turnilo on an-tool1007 to pick up c-ares security updates
08:02 moritzm: rolling restart of AQS to pick up the c-ares security update
07:09 moritzm: restart etherpad-lite on etherpad1002 to pick up c-ares security updates
06:59 _joe_: deleting the staging deployment of mwdebug
05:55 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2107.codfw.wmnet with reason: REIMAGE
05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2107.codfw.wmnet with reason: REIMAGE
05:22 marostegui: Stop replication on db2107 T287454
05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2107 T287454', diff saved to https://phabricator.wikimedia.org/P16999 and previous config saved to /var/cache/conftool/dbconfig/20210811-051856-marostegui.json
05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2104 to s2 master and set section read-write T287454', diff saved to https://phabricator.wikimedia.org/P16998 and previous config saved to /var/cache/conftool/dbconfig/20210811-051041-root.json
05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - T287454', diff saved to https://phabricator.wikimedia.org/P16997 and previous config saved to /var/cache/conftool/dbconfig/20210811-050040-marostegui.json
05:00 marostegui: Starting s2 codfw failover from db2107 to db2104 - T287454
04:16 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2104 with weight 0 T287454', diff saved to https://phabricator.wikimedia.org/P16996 and previous config saved to /var/cache/conftool/dbconfig/20210811-041625-root.json
04:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454
04:15 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454
03:45 razzi@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
03:45 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
01:49 dpifke@deploy1002: Finished deploy [performance/navtiming@12d8381]: Revert https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423 (duration: 00m 05s)
01:49 dpifke@deploy1002: Started deploy [performance/navtiming@12d8381]: Revert https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423
01:47 dpifke@deploy1002: Finished deploy [performance/navtiming@12d8381]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423 (duration: 00m 06s)
01:47 dpifke@deploy1002: Started deploy [performance/navtiming@12d8381]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423
01:38 legoktm@deploy1002: Synchronized docroot/noc/conf/index.php: noc: Expose primary datacenter on conf/ (duration: 01m 06s)
01:22 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
01:22 bstorm@cumin1001: Added views for new wiki: jvwikisource T286245
01:00 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
00:38 bstorm@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
00:36 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki

2021-08-10

23:33 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable user links feature for pilot wikis, modern vector (T288274) (duration: 01m 08s)
23:18 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:06 krinkle@deploy1002: Synchronized wmf-config/: I13e88c303a, T284418 (duration: 01m 07s)
23:02 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:58 eileen: process-control config revision is 7bdc78073d
22:50 krinkle@deploy1002: Synchronized wmf-config/: I8052636, I2038702b7e0 (duration: 01m 21s)
21:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: REIMAGE
21:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1053.eqiad.wmnet with reason: REIMAGE
21:46 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1054.eqiad.wmnet with reason: REIMAGE
21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1052.eqiad.wmnet with reason: REIMAGE
21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1053.eqiad.wmnet with reason: REIMAGE
21:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1051.eqiad.wmnet with reason: REIMAGE
21:42 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1052.eqiad.wmnet with reason: REIMAGE
21:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: REIMAGE
21:40 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1051.eqiad.wmnet with reason: REIMAGE
21:40 ryankemper: [WDQS] `ryankemper@wdqs2005:~$ sudo pool`
21:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1049.eqiad.wmnet with reason: REIMAGE
21:40 ryankemper: T288501 `ryankemper@wdqs2003:~$ sudo pool`
21:38 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: REIMAGE
21:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1048.eqiad.wmnet with reason: REIMAGE
21:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1049.eqiad.wmnet with reason: REIMAGE
21:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: REIMAGE
21:35 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1047.eqiad.wmnet with reason: REIMAGE
21:33 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1046.eqiad.wmnet with reason: REIMAGE
21:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1047.eqiad.wmnet with reason: REIMAGE
21:30 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1046.eqiad.wmnet with reason: REIMAGE
21:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.37.0-wmf.18"
21:02 krinkle@deploy1002: Synchronized wmf-config/: I3b54d163b6 (duration: 01m 09s)
20:54 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: If7a8d6b6 (duration: 01m 22s)
20:43 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: REIMAGE
20:42 krinkle@deploy1002: Synchronized wmf-config/: Ic5ff34b (duration: 01m 08s)
20:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: REIMAGE
20:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1045.eqiad.wmnet with reason: REIMAGE
20:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: REIMAGE
20:34 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1045.eqiad.wmnet with reason: REIMAGE
20:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1043.eqiad.wmnet with reason: REIMAGE
20:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: REIMAGE
20:31 krinkle@deploy1002: Synchronized docroot/noc/: Ic013a93998f (duration: 01m 37s)
20:31 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1042.eqiad.wmnet with reason: REIMAGE
20:30 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1043.eqiad.wmnet with reason: REIMAGE
20:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1041.eqiad.wmnet with reason: REIMAGE
20:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: REIMAGE
20:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1041.eqiad.wmnet with reason: REIMAGE
19:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE
19:27 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE
19:27 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE
19:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE
19:16 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
19:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
19:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE
19:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
19:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE
19:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE
19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE
19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
19:04 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.18 refs T281159
19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE
19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE
18:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:47 ryankemper: [WDQS] `ryankemper@wdqs2005:~$ sudo depool` (~1.26 hours of lag)
18:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
18:46 ryankemper: T288501 (Misread Grafana graph, `wdqs2003` only has 1.33 hours to catch up on)
18:45 ryankemper: T288501 `data-transfer` of `wikidata.jnl` completed successfully. Host needs to catch up on ~22 hours of WDQS lag before being re-pooled
18:42 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
17:23 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.18 (duration: 36m 35s)
17:19 ryankemper: T288501 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2005.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal to resolve disk issue" --blazegraph_instance blazegraph` on `cumin2001` tmux session `wdqs_data_xfer`
17:19 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
17:18 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
17:13 ryankemper: T288501 [WDQS] `ryankemper@wdqs2003:~$ sudo rm -fv /srv/wdqs/wikidata.jnl`
17:09 razzi@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
17:09 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
17:06 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
17:02 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
17:02 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
17:01 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
16:49 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
16:49 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
16:47 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.18
16:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@d3c5363]: T287225: Bump rdf-spark-tools to 0.3.81 (duration: 02m 10s)
16:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@d3c5363]: T287225: Bump rdf-spark-tools to 0.3.81
16:33 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
16:33 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
16:25 brennen: gitlab: run ansible to apply fix shell for backup cronjob (T288324)
16:01 moritzm: installing c-ares security updates on buster
14:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Reduce ten seconds from dispatch max time (T288175) (duration: 00m 58s)
13:32 moritzm: updating bullseye installations to the latest state of testing
13:19 moritzm: installing perl security updates on Bullseye (older distros not affected)
13:00 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:54 ppchelko@deploy1002: Finished deploy [restbase/deploy@5791a7a]: Add count parameter to recommendations API T287227 (duration: 37m 18s)
12:42 lucaswerkmeister-wmde@deploy1002: Synchronized tests/multiversion/StaticSettingsTest.php: Config: Remove wmgWBRepoConceptBaseUri (T257260) (3/3, test) (duration: 00m 57s)
12:41 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove wmgWBRepoConceptBaseUri (T257260) (2/3, beta) (duration: 00m 57s)
12:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove wmgWBRepoConceptBaseUri (T257260) (1/3, prod) (duration: 00m 57s)
12:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBRepoSettings['conceptBaseUri'] (T257260) (duration: 00m 58s)
12:23 kormat: non-destructive (🤞) testing of db-switchover against s2/eqiad T288500
12:17 ppchelko@deploy1002: Started deploy [restbase/deploy@5791a7a]: Add count parameter to recommendations API T287227
11:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
11:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
10:56 marostegui: Install 10.4.21 on db1169 (s1)
10:54 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:53 mutante: etherpad deleting 2 pads as requested in T288328
10:52 marostegui: Install 10.4.21 on db1096 (s5 and s6)
10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:28 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:24 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseClientRepoDatabase (T257260) (2/2, beta) (duration: 00m 57s)
09:54 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientRepoDatabase (T257260) (1/2, prod) (duration: 00m 57s)
09:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBClientSettings['repoDatabase'] (T257260) (duration: 00m 58s)
09:47 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:23 ariel@deploy1002: Finished deploy [dumps/dumps@72ff209]: refuse to use info from corrupt run settings file (duration: 00m 03s)
09:22 ariel@deploy1002: Started deploy [dumps/dumps@72ff209]: refuse to use info from corrupt run settings file
09:17 kormat: running non-destructive test against s7/codfw (db2107/db2014) T288500
09:05 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:04 moritzm: removing stale Java 8 packages from logstash1024/1025/2023/2024/2025 (ELK7 Logstash cluster has been on Java 11 for a while now)
09:00 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:58 ariel@deploy1002: Finished deploy [dumps/dumps@170e394]: more resilience when reading bad run cache settings files (duration: 00m 03s)
08:58 ariel@deploy1002: Started deploy [dumps/dumps@170e394]: more resilience when reading bad run cache settings files
08:49 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:20 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
08:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
08:18 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
08:16 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:16 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
08:15 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
08:15 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
08:15 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
08:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
08:06 godog: upload thanos 0.21.1-1 and upgrade prometheus1004 / thanos-fe2001 to it - T288326
08:03 moritzm: installing openjdk-8 security updates on stretch
07:33 moritzm: installing lynx security updates
05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16987 and previous config saved to /var/cache/conftool/dbconfig/20210810-055642-root.json
05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16986 and previous config saved to /var/cache/conftool/dbconfig/20210810-054139-root.json
05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16985 and previous config saved to /var/cache/conftool/dbconfig/20210810-052635-root.json
05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16984 and previous config saved to /var/cache/conftool/dbconfig/20210810-051131-root.json
05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 as read-write again - master has not been swapped T287454', diff saved to https://phabricator.wikimedia.org/P16983 and previous config saved to /var/cache/conftool/dbconfig/20210810-050604-root.json
05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - T287454', diff saved to https://phabricator.wikimedia.org/P16982 and previous config saved to /var/cache/conftool/dbconfig/20210810-050051-root.json
05:00 marostegui: Starting s2 codfw failover from db2107 to db2104 - T287454
04:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454
04:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454
04:16 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2104 with weight 0 T287454', diff saved to https://phabricator.wikimedia.org/P16981 and previous config saved to /var/cache/conftool/dbconfig/20210810-041627-root.json
02:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-09

16:12 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
16:10 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
16:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
16:07 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
16:07 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
16:07 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
16:04 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
16:03 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
16:03 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:03 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
16:02 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
16:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
16:00 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
16:00 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
16:00 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:57 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
15:34 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2065.codfw.wmnet
15:33 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2064.codfw.wmnet
15:33 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2062.codfw.wmnet
14:17 sukhe: ran homer for Gerrit 710358: Set up BGP peering to doh5002 in eqsin
14:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
14:09 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps100[1234].eqiad.wmnet
14:06 jayme: re-enabled (and ran) puppet on all kubernetes nodes - T288345
14:05 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
14:05 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
14:05 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2063.codfw.wmnet
14:05 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
14:04 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
14:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:02 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: T280886 UCoC comment update (duration: 00m 58s)
13:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16979 and previous config saved to /var/cache/conftool/dbconfig/20210809-135805-root.json
13:52 kormat: disabling puppet on all db hosts for roll-out of T285390
13:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 80%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16978 and previous config saved to /var/cache/conftool/dbconfig/20210809-134301-root.json
13:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 60%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16977 and previous config saved to /var/cache/conftool/dbconfig/20210809-132758-root.json
13:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 40%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16976 and previous config saved to /var/cache/conftool/dbconfig/20210809-131254-root.json
12:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 20%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16975 and previous config saved to /var/cache/conftool/dbconfig/20210809-125750-root.json
12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 10%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16974 and previous config saved to /var/cache/conftool/dbconfig/20210809-124247-root.json
12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2128 T288398', diff saved to https://phabricator.wikimedia.org/P16973 and previous config saved to /var/cache/conftool/dbconfig/20210809-123852-marostegui.json
11:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
11:53 jayme: running puppet on kubernetes staging nodes (-b1 -s10) - T288345
11:50 jayme: disabling puppet on all kubernetes nodes - T288345
11:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:44 Lucas_WMDE: EU backport+config window done
11:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:43 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove wmgWikibaseClientRepoNamespaces (T257260) (duration: 00m 57s)
11:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBClientSettings['repoNamespaces'] (T257260) (duration: 00m 57s)
11:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove wmgWikibaseClientRepositories (T257260) (2/2, beta) (duration: 00m 56s)
11:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove wmgWikibaseClientRepositories (T257260) (1/2, prod) (duration: 00m 57s)
11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBClientSettings['repositories'] (T257260) (duration: 00m 57s)
11:29 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
11:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
11:25 urbanecm: >>> \MediaWiki\MediaWikiServices::getInstance()->get('GrowthExperimentsWikiPageConfigLoader')->invalidate(Title::newFromText('MediaWiki:GrowthExperimentsConfig.json')) # dewiki shell.php; debugging Growth's wiki config
11:24 urbanecm@deploy1002: Synchronized wmf-config/config/dewiki.yaml: d656435: dewiki: Enable Growth features in dark mode (T288420; 3/3) (duration: 00m 57s)
11:23 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: d656435: dewiki: Enable Growth features in dark mode (T288420; 2/3) (duration: 00m 57s)
11:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d656435: dewiki: Enable Growth features in dark mode (T288420; 1/3) (duration: 00m 57s)
11:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:16 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=dewiki --phab=T288420 # T288420
11:15 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=dewiki growthexperiments # T288420
11:15 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: 9b9bb5b: Disable local uploads for non-administrators on nlwiki (T288386) (duration: 00m 57s)
11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 037aceb: Enable GeoData on zhwikinews (T287807) (duration: 00m 57s)
11:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 15 hosts with reason: Reimage db1136 (s7 primary) to buster T288244
11:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 15 hosts with reason: Reimage db1136 (s7 primary) to buster T288244
11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 54c532f: Add *.happysrv.de to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T288039) (duration: 00m 58s)
10:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:36 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable shellbox constraint for commons wikis (T176312) (duration: 00m 57s)
10:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:31 awight@deploy1002: sync-file aborted: Config: [beta] Enable new VE template dialog sidebar (T286765) (duration: 00m 23s)
10:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable post edit constraint jobs in all edits (T204031) (duration: 00m 58s)
10:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:49 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Increase post edit constraint jobs to 85% of edits (T204031) (duration: 00m 58s)
09:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1005.eqiad.wmnet with reason: REIMAGE
09:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1005.eqiad.wmnet with reason: REIMAGE
09:31 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[1234].codfw.wmnet
08:46 godog: upgrade prometheus on prometheus2004 - T222113
08:41 godog: upgrade prometheus on prometheus1004 - T222113
08:36 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov2002.codfw.wmnet with reason: REIMAGE
08:34 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov2002.codfw.wmnet with reason: REIMAGE
08:24 marostegui: Upgrade db1117 (all sections) to 10.4.19
08:03 ariel@deploy1002: Finished deploy [dumps/dumps@142e91c]: fix for T288192 runnerutils bug (duration: 00m 03s)
08:03 ariel@deploy1002: Started deploy [dumps/dumps@142e91c]: fix for T288192 runnerutils bug
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1160 T288273', diff saved to https://phabricator.wikimedia.org/P16971 and previous config saved to /var/cache/conftool/dbconfig/20210809-075212-marostegui.json
07:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
07:30 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable shellbox for constraint for all of wikidata (T176312) (duration: 00m 58s)
07:15 marostegui: Stop db1117:3323 to clone db1107 - T288197
07:05 kart__: Updated cxserver to 2021-08-06-062053-production (T288272)
07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1107.eqiad.wmnet with reason: REIMAGE
07:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1107.eqiad.wmnet with reason: REIMAGE
06:53 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
06:45 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
05:56 XioNoX: enable cloudsw1-c8 interfaces toward cloudsw2-c8 - T277340
05:23 marostegui: Lag in s4 (commonswiki) will appear on clouddb* hosts (wiki replicas) T288273
05:23 marostegui: Optimize commonswiki.image on eqiad, lag will appear - T288273

2021-08-06

19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:12 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
19:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
18:53 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
18:52 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
18:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:39 brennen: gitlab: run ansible to apply remove backup warning for config backups (T288324)
16:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
16:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts peek2001.codfw.wmnet
16:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Awaiting reimaging, depooled.
16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Awaiting reimaging, depooled.
16:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts peek2001.codfw.wmnet
16:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 4:00:00 on peek2001.codfw.wmnet with reason: decom
16:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 4:00:00 on peek2001.codfw.wmnet with reason: decom
16:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:14 hnowlan: removing maps1005 from old maps cassandra cluster before reimaging
14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2005.codfw.wmnet with reason: Reimaging
14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2005.codfw.wmnet with reason: Reimaging
14:26 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on maps2005.codfw.wmnet with reason: REIMAGE
14:24 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2005.codfw.wmnet with reason: REIMAGE
13:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
13:07 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
12:56 godog: test thanos 0.22 on thanos-fe2001 - T288326
12:48 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:34 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
12:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
12:25 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:23 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:22 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:22 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:22 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:21 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:45 jayme: enabling dragonfly dfdaemon on kubernetes200*
11:16 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1006.eqiad.wmnet with reason: REIMAGE
11:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1006.eqiad.wmnet with reason: REIMAGE
10:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
10:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
09:58 kormat: reimaging db1181 (s7) to buster T288244
09:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2005.codfw.wmnet with reason: Rebuilding as buster replica of maps1009
09:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2005.codfw.wmnet with reason: Rebuilding as buster replica of maps1009
09:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
09:14 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
08:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
08:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:09 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:58 godog: test thanos 0.21 on thanos-fe2001 - T288326
07:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:36 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
07:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:15 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
07:02 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
06:43 marostegui: Reboot db1107 to upgrade its kernel
05:47 marostegui: Optimize commonswiki.image on db1160 T288273
05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 T288273', diff saved to https://phabricator.wikimedia.org/P16965 and previous config saved to /var/cache/conftool/dbconfig/20210806-054433-marostegui.json
05:44 eileen: civicrm revision changed from 931b3defbe to c132d2f943, config revision is 3696499932
04:03 TimStarling: on mwmaint1002 mwscript extensions/SecurePoll/cli/wm-scripts/makeGlobalVoterList.php --wiki=mediawikiwiki --edit-count-table=bv2021_edits --list-name=board-vote-2021 --short-min-edits=20 --long-min-edits=300
04:00 eileen: civicrm revision changed from e52f569991 to 931b3defbe, config revision is 3696499932
03:54 tstarling@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SecurePoll/cli/wm-scripts/makeGlobalVoterList.php: need to run this script T288025 (duration: 00m 57s)
03:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
03:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
01:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2065.codfw.wmnet with reason: REIMAGE
00:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: REIMAGE
00:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2064.codfw.wmnet with reason: REIMAGE
00:12 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: REIMAGE
00:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
00:03 egardner@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/MediaSearch: Backport: Revert "Open search result links in-place" (duration: 00m 58s)
00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-05

23:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2063.codfw.wmnet with reason: REIMAGE
23:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2063.codfw.wmnet with reason: REIMAGE
23:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:16 legoktm@deploy1002: Synchronized php-1.37.0-wmf.17/includes/: Revert "Use CsrfTokenSet as CSRF token source" (T287542) (duration: 01m 03s)
23:00 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
22:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
22:53 legoktm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/: Revert "Use CsrfTokenSet as CSRF token source" (T287542) (duration: 01m 02s)
22:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:12 jforrester@deploy1002: Synchronized php-1.37.0-wmf.17/includes/content/: T288191: Support deprecated Content::preSaveTransform override (2/2) (duration: 00m 55s)
22:11 jforrester@deploy1002: Synchronized php-1.37.0-wmf.17/includes/content/ContentHandler.php: T288191: Support deprecated Content::preSaveTransform override (1/2) (duration: 01m 00s)
22:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:41 jforrester@deploy1002: Synchronized php-1.37.0-wmf.17/skins/MonoBook/resources/screen-common.less: T288288 Restore visualClear style to MonoBook so that footer doesn't show in the interwiki list (duration: 01m 24s)
21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:03 ejegg: updated payments-wiki from 72fe99abb1 to a70aaa7944
20:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
20:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
20:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
20:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
20:23 dduvall: 1.37.0-wmf.17 promoted to all wikis. no new errors or concerning rates (T281158). fixes for open UBN T288191 will be handled via backport (see task discussion)
20:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
19:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:18 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.17
19:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Increase the ratio for shellbox for constraints to 42% in Wikidata (T176312) (duration: 01m 06s)
18:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Increase the ratio for shellbox for constraints to 21% in Wikidata (T176312) (duration: 01m 06s)
18:23 topranks: Adding peering to second router of Xiber LLC - AS393950 - on cr2-eqord (Equinix IX Chicago)
18:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: da36bc3: DiscussionTools: Make sourcemodetoolbar available everywhere (T287927) (duration: 01m 06s)
18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0a14eb4: wikimediaEvents: Enable IP address copy action instrument on all wikis (T279540) (duration: 01m 07s)
18:17 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/DiscussionTools/extension.json: 91f7c02: Change sourcemodetoolbar default to enabled when available (T287927) (duration: 01m 06s)
18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:16 urbanecm@deploy1002: sync-file aborted: 91f7c02: Change sourcemodetoolbar default to enabled when available (T287927) (duration: 00m 04s)
18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/DiscussionTools/extension.json: 38a8658: Change sourcemodetoolbar default to enabled when available (T287927) (duration: 01m 06s)
18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:49 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Increase the shellbox ratio to 5% for wikidata (T176312) (duration: 01m 15s)
17:43 elukey: upgrade helm3 to 3.6.3-1 on release*, contint*, chartmuseum*, deploy2002 (1002 was already done before)
17:43 herron: rolling restart eqiad logstash cluster for java updates
17:41 ebernhardson: restart airflow-{scheduler|webserver} on an-airflow1001 to pickup deployed plugin changes
17:36 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
17:32 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@9872df9]: pyspark generalization gerrit:709837 and 666774 (duration: 09m 01s)
17:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
17:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
17:25 Amir1: end of pdf rebuild on commonswiki (T275268)
17:23 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@9872df9]: pyspark generalization gerrit:709837 and 666774
17:15 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2006.codfw.wmnet
16:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable shellbox for constraints for 1% of wikidata (T176312) (duration: 01m 27s)
16:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
16:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:21 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:21 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:16 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:16 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:15 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2006: imposm: add codfw targets (duration: 00m 22s)
16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:14 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2006: imposm: add codfw targets
16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:13 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2007: imposm: add codfw targets (duration: 00m 25s)
16:12 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2007: imposm: add codfw targets
16:11 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2008: imposm: add codfw targets (duration: 00m 23s)
16:10 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2008: imposm: add codfw targets
16:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:10 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2009: imposm: add codfw targets (duration: 00m 29s)
16:10 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2009: imposm: add codfw targets
16:09 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2010: imposm: add codfw targets (duration: 00m 22s)
16:09 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2010: imposm: add codfw targets
16:04 hnowlan: draining maps1006 from maps cassandra cluster
16:04 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2006: tegola: mirror 5% of requests everywhere (duration: 00m 24s)
16:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:03 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2006: tegola: mirror 5% of requests everywhere
16:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1006.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
16:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1006.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
16:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:02 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2010: tegola: mirror 5% of requests everywhere (duration: 00m 21s)
16:02 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
16:01 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2010: tegola: mirror 5% of requests everywhere
16:01 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2009: tegola: mirror 5% of requests everywhere (duration: 00m 55s)
16:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:00 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:00 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2009: tegola: mirror 5% of requests everywhere
15:59 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2008: tegola: mirror 5% of requests everywhere (duration: 00m 21s)
15:59 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2008: tegola: mirror 5% of requests everywhere
15:59 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
15:59 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: tegola: mirror 5% of requests everywhere (duration: 00m 22s)
15:58 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: tegola: mirror 5% of requests everywhere
15:57 mbsantos@deploy1002: deploy aborted: tegola: mirror 5% of requests everywhere (duration: 00m 03s)
15:57 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846] (imposm): tegola: mirror 5% of requests everywhere
15:54 herron: rolling restart codfw logstash elasticsearch cluster for java updates
15:52 elukey: upgrade helm3 to 3.6.3-1 on deploy1002
15:28 vgutierrez: pool lvs2009 - T286881
15:27 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Deploy imposm to maps2006 (duration: 00m 20s)
15:27 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Deploy imposm to maps2006
15:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2009.codfw.wmnet with reason: T286881
15:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2009.codfw.wmnet with reason: T286881
15:11 vgutierrez: depool lvs2009 - T286881
15:10 vgutierrez: pool lvs2008 - T286881
14:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: T286881
14:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: T286881
14:52 vgutierrez: depool lvs2008 - T286881
14:50 elukey: upload helm 3.6.3-1 to {buster,stretch}-wikimedia
14:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps1010.eqiad.wmnet with reason: Reimaging
14:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps1010.eqiad.wmnet with reason: Reimaging
14:24 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5002.wikimedia.org
14:18 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on maps1010.eqiad.wmnet with reason: REIMAGE
14:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1010.eqiad.wmnet with reason: REIMAGE
14:14 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
14:12 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2006.codfw.wmnet with reason: REIMAGE
14:10 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2006.codfw.wmnet with reason: REIMAGE
14:00 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:49 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:48 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:44 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
13:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
13:39 mutante: deleted reserved (not active) IP 103.102.166.5/28 from netbox (T284246)
13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: Add 'constraint-regex-checker' to isEnabled() check as well (T176312) (duration: 01m 06s)
13:25 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: Add 'constraint-regex-checker' to isEnabled() check as well (T176312) (duration: 01m 19s)
13:12 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:54 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:52 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:44 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:44 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:59 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1010.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
11:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1010.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
11:58 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
11:55 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
11:47 XioNoX: prepare cloudsw1-c8-eqiad for cloudsw2-c8 - T277340
11:41 hnowlan: removing maps2006 from old maps cassandra cluster
11:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2006.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
11:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2006.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
11:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
11:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
11:11 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2006.codfw.wmnet
11:07 topranks: Reconfiguring packet buffer partitioning on cloudsw-d5-eqiad T288037
11:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:01 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
10:25 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add shellbox-constraint services and use them (T176312), Part III (duration: 01m 06s)
10:24 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Add shellbox-constraint services and use them (T176312), Part II (duration: 01m 07s)
10:23 ladsgroup@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: Add shellbox-constraint services and use them (T176312), Part I (duration: 01m 07s)
10:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:11 vgutierrez: restart acme-chief on acmechief1001
10:06 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode2001.codfw.wmnet
10:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
10:03 topranks: Reconfiguring packet buffer partitioning on cloudsw-c8-eqiad T288036
10:01 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: Route Shellbox requests to 'constraint-regex-checker' service (T176312) (duration: 01m 06s)
09:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: Route Shellbox requests to 'constraint-regex-checker' service (T176312) (duration: 01m 27s)
09:56 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode2001.codfw.wmnet
09:49 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1008.eqiad.wmnet with reason: REIMAGE
09:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1008.eqiad.wmnet with reason: REIMAGE
09:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
09:19 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=registry2004.codfw.wmnet,dc=codfw,cluster=docker-registry
09:05 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
08:28 godog: bounce grafana to apply new settings - T119719
08:00 marostegui: Failover m2 from db1107 to db1183 - T287852
06:54 godog: prometheus/ops codfw +100G
06:15 godog: add back thanos-be1003 sdf1 in thanos-swift
04:03 ejegg: re-enabled fundraising scheduled jobs (process-control)
03:03 ejegg: disabled fundraising scheduled jobs (process-control)
02:50 TimStarling: on mwmaint1002 killing populateEditCount.php for loginwiki -- it's slow but it's not going to find any edits
02:46 eileen: civicrm revision changed from d6baf291f4 to e52f569991, config revision is 360c8a1f08
01:26 Krinkle: krinkle@mwmaint1002 Temporarily grant myself `translationadmin` on wikimania2016wiki in order to approve an edit given FlaggedRevs-like nature of Translate
00:24 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove DynamicPageList from all Wikimania wikis except 2016 (T287916) (duration: 01m 52s)

2021-08-04

22:18 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@34cd541]: gerrit:709835 and 709836 (duration: 06m 52s)
22:11 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@34cd541]: gerrit:709835 and 709836
20:56 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:21 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:22 dduvall: 1.37.0-wmf.17 promoted to group1. no new errors or troubling error rates spotted (T281158)
19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:12 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.17 (duration: 01m 15s)
19:11 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.17
18:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/GrowthExperiments/: 5c3ac58: Fix array key handling for GEHelpPanelLinks in on-wiki config (T288023) (duration: 01m 08s)
18:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
18:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2379.codfw.wmnet
18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/: 36a2b9f: Fix array key handling for GEHelpPanelLinks in on-wiki config (T288023) (duration: 01m 06s)
18:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:25 mutante: mw2379, mw2380 - scap pull
18:16 brennen: gitlab1001: upgrading to 13.12.9
18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:11 brennen: gitlab2001: upgrading to 13.12.9
18:10 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable canary events by default - T287789 (duration: 01m 06s)
18:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
18:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
18:01 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
18:01 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
18:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2379.codfw.wmnet with reason: reimage
18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2379.codfw.wmnet with reason: reimage
17:59 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
17:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
17:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2378.codfw.wmnet
17:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2377.codfw.wmnet
17:46 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2380.codfw.wmnet
17:46 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw237[7-9].codfw.wmnet
17:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2357.codfw.wmnet
17:41 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2357.codfw.wmnet
17:40 mutante: mw2357, mw2377, mw2378 - scap pull
17:40 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2357.codfw.wmnet
17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:29 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw238[1-2].codfw.wmnet
17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
17:27 ejegg: updated payments-wiki config to 360c8a1f08
17:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2355.codfw.wmnet
17:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2353.codfw.wmnet
17:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2351.codfw.wmnet
17:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2355.codfw.wmnet
17:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2353.codfw.wmnet
17:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2351.codfw.wmnet
17:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
17:12 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/GrowthExperiments/maintenance/updateMenteeData.php: 66c2c75: updateMenteeData: Output how long the script took (T287964) (duration: 01m 07s)
17:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
17:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
17:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
17:10 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
17:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
17:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
16:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
16:55 mutante: mw2351, mw2353, mw2355 - scap pull
16:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2355.codfw.wmnet with reason: reimage
16:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2355.codfw.wmnet with reason: reimage
16:23 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
16:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
16:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
16:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
16:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
16:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
16:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
16:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
16:21 joe: find . -type f -delete on /var/cache/nginx-docker-registry on registry2*, the disk is too small for an unbounded cache *and* accepting large uploads
16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
16:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
16:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
16:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
16:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
16:14 hnowlan: draining maps1008 from cassandra cluster
16:13 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
16:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
16:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2380.codfw.wmnet with reason: reimage
16:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2380.codfw.wmnet with reason: reimage
16:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[2377-2379].codfw.wmnet with reason: reimage
16:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[2377-2379].codfw.wmnet with reason: reimage
15:58 mutante: mw2351, mw2353, mw2355, mw2357 - converting from appserver to jobrunner; mw2377, mw2378, mw2379, mw2380 - converting from jobrunner to appserver (to balance server types across rows)
15:51 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
15:50 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw237[789].codfw.wmnet
15:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw235[1357].codfw.wmnet
15:47 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw235[1357].wmnet
14:30 godog: upgrade prometheus on cloudmetrics hosts - T222113
14:28 godog: upgrade prometheus on prometheus4001 - T222113
14:19 moritzm: imported gitlab-ce 13.12.9 to thirdparty/gitlab T287671
14:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:17 godog: depool prometheus2004 and pool prometheus2003 - T222113
14:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 18 hosts with reason: Firmware upgrade on db1104 (s8 primary) T286226
14:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 18 hosts with reason: Firmware upgrade on db1104 (s8 primary) T286226
14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:02 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
13:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:50 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5d7255c: jvwikisource: Add author namespace (T286241) (duration: 01m 06s)
13:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
13:19 urbanecm: jvwikisource was created (T286241)
13:19 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 11s)
13:18 volans: upgraded python3-wmflib to v0.0.9 fleet wide
13:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating jvwikisource (T286241) (duration: 01m 06s)
13:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating jvwikisource (T286241) (duration: 01m 06s)
13:10 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating jvwikisource (T286241) (duration: 01m 07s)
13:09 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating jvwikisource (T286241)
13:08 urbanecm@deploy1002: Synchronized dblists: Creating jvwikisource (T286241) (duration: 01m 07s)
13:07 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating jvwikisource (T286241) (duration: 01m 07s)
13:05 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating jvwikisource (T286241) (duration: 01m 08s)
12:23 godog: depool prometheus2004 for upgrade - T222113
12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16958 and previous config saved to /var/cache/conftool/dbconfig/20210804-120725-marostegui.json
12:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:53 reedy@deploy1002: Synchronized docroot/mediawiki.org/xml/index.html: T288040 (duration: 01m 08s)
11:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:43 moritzm: installing testvm2001 T286206
11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170:3317 and db1101:3317 T286888!', diff saved to https://phabricator.wikimedia.org/P16957 and previous config saved to /var/cache/conftool/dbconfig/20210804-113623-marostegui.json
11:24 phuedx@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SecurePoll: Backport: Use real transactions when creating an election (duration: 01m 08s)
11:21 phuedx@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll: Backport: Use real transactions when creating an election (duration: 01m 19s)
10:53 jayme: running puppet on eqiad appservers
10:48 jayme: switch most eqiad appservers to appserver_dragonly role for testing - T286054
10:29 jayme: importing dragonfly 1.0.6-1 (downgrade from 1.0.6-2) to buster-wikimedia and stretch-wikimedia - T286054
10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 T286888', diff saved to https://phabricator.wikimedia.org/P16955 and previous config saved to /var/cache/conftool/dbconfig/20210804-101719-marostegui.json
09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
09:10 volans: uploaded python3-wmflib_0.0.9 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
08:55 legoktm@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=shellbox-constraints
08:51 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
08:41 godog: pool prometheus1003 (and depool prometheus1004 for testing 1003 only) - T222113
08:27 legoktm: restarting pybal on lvs2009 to add shellbox-constraints service
08:24 legoktm: restarting pybal on lvs1015 to add shellbox-constraints service
08:22 legoktm: restarting pybal on lvs2010 to add shellbox-constraints service
08:18 legoktm: restarting pybal on lvs1016 to add shellbox-constraints service
08:00 godog: upgrade prometheus1003 - T222113
06:53 moritzm: installing testvm2002 T286206
06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1174 and db1127 T286763', diff saved to https://phabricator.wikimedia.org/P16954 and previous config saved to /var/cache/conftool/dbconfig/20210804-064548-marostegui.json
06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170:3312, db1105:3312, db1105:3311 T286888', diff saved to https://phabricator.wikimedia.org/P16953 and previous config saved to /var/cache/conftool/dbconfig/20210804-060347-marostegui.json
05:35 joe: docker image prune on releases1002, T288024
05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16952 and previous config saved to /var/cache/conftool/dbconfig/20210804-050751-marostegui.json
04:54 TimStarling: on mwmaint2002: running bv2021/populateEditCounts.php on all wikis with one thread per section s1-s8
04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 to clone db1170:3312 T286888', diff saved to https://phabricator.wikimedia.org/P16950 and previous config saved to /var/cache/conftool/dbconfig/20210804-044507-marostegui.json
04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174 to clone db1127 T286763', diff saved to https://phabricator.wikimedia.org/P16948 and previous config saved to /var/cache/conftool/dbconfig/20210804-043438-marostegui.json
04:10 TimStarling: on mwmaint2002: creating bv2021_edits table on all wikis
03:58 tstarling@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SecurePoll: for bv2021/populateEditCount.php (duration: 01m 06s)
03:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
03:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll: for bv2021/populateEditCount.php (duration: 01m 18s)
03:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
03:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
03:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-03

23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Re-enable commonswiki sister search (T277225) (duration: 01m 07s)
22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for T287988 (T281158)
22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 3/3) (duration: 01m 07s)
22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 01m 07s)
22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 01m 07s)
21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: T286463
20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: T286463
20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 00m 37s)
20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 00m 37s)
20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 01m 07s)
20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 01m 08s)
20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:31 ryankemper: T285355 `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer (T286853)
18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: T286642 fixes to bulk daemon prioritization (duration: 00m 48s)
18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: T286642 fixes to bulk daemon prioritization
17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
16:59 hashar: Gerrit has been upgraded
16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
16:45 urbanecm: Start server-side upload for 1 video file (T287957)
16:45 hashar: Stopping Gerrit for upgrade
16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) T286206
15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
12:47 moritzm: restarting Tomcat on idp1001
12:05 moritzm: installing libgcrypt20 security updates
11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
11:36 moritzm: updated bullseye d-i images to rc3 T275873
11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - T222113
11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - T222113
11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:13 moritzm: rename Ganeti group for test cluster to row_D T286206
11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
09:18 marostegui: Failover m1, m2 and m3-master T287574
09:12 moritzm: installing php 7.0 security updates on stretch
09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - T286054
08:57 moritzm: installing pillow security updates on stretch
08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
06:31 kart__: Updated cxserver to 2021-08-02-164000-production (T286473)
06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)

2021-08-02

23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
23:21 legoktm: Previous sync also deployed c38998f03f "Stop enabling DPL on new wikis" (T287380)
23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
21:31 tzatziki: removing 1 file for legal compliance
21:16 tzatziki: removing 7 files for legal compliance
19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features (T287868, T287874, T287873)
19:00 urbanecm: Morning B&C window completed
19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: bebf4a9: Enable Growth features on a couple of wikis in dark mode (T287868, T287874, T287873; 2/2) (duration: 00m 56s)
18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bebf4a9: Enable Growth features on a couple of wikis in dark mode (T287868, T287874, T287873; 1/2) (duration: 00m 57s)
18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - T287652 (duration: 00m 56s)
18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features (T287876, T287871, T287878, T287880, T287875, T287879, T287872)
18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 18cd360: Growth features: Enable features in dark mode on a few wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872; 2/2) (duration: 00m 56s)
18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 18cd360: Growth features: Enable features in dark mode on a few wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872; 1/2) (duration: 00m 56s)
18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872)
18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ee47f9d: Add rollbacker group for kswiki (T286789) (duration: 00m 56s)
18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eec997c: Enable SUL autologin for wikimania.wikimedia.org (T285197) (duration: 00m 55s)
18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: 05cf1d6: Add a link: Show article extract instead of description in the link inspector (T287636; 2/2) (duration: 00m 56s)
18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: 05cf1d6: Add a link: Show article extract instead of description in the link inspector (T287636; 1/2) (duration: 00m 57s)
18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cc8ca45: Add tewikisource as import source for tewikibooks (T286978) (duration: 00m 56s)
18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 11e96ba: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T287264) (duration: 00m 56s)
18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: 97b6897: Remove unused enwiki celebration logos (T272108) (duration: 00m 57s)
18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: 16f9794: Remove unused eswiki celebration logos (T280908) (duration: 00m 57s)
17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
15:44 jynus: remove s2 from db1139 T287230
14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
13:02 mutante: gerrit1001 - restarting service after 706049
13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
12:20 mutante: gerrit servers: disabling puppet
12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: T287528 (duration: 00m 57s)
12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: T287780 (duration: 00m 57s)
11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: T287782 (duration: 00m 56s)
11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
11:29 hashar: restarting gerrit primary server on gerrit1001
11:27 hashar: restarting Jenkins on contint2001
11:27 hashar: restarting Jenkins on contint1001
11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
11:13 urbanecm: EU B&C window completed
11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 43020b7: votewiki: Enable Single Transferable Vote (T283728) (duration: 00m 57s)
11:08 moritzm: installing openjdk-11 security updates
11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 26bcaaf: Restore logging for mediamoderation script to better understand high error rate occurring when running script (T287511) (duration: 00m 57s)
07:53 moritzm: catch up bullseye installs with latest state of testing
07:24 moritzm: installing libsndfile security updates on buster
07:12 moritzm: installing aspell security updates
05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec
