Server Admin Log
2025-03-16
- 14:08 claime: sudo postqueue -j | jq -r 'select(.sender == "vrts-bounce@wikimedia.org") | .queue_id' | sudo postsuper -d - # mx-out1001
- 13:59 Emperor: sudo postqueue -j | jq -r ' select(.recipients[0].address == "vrts-bounce@wikimedia.org") | select(.recipients[1].address == null) | .queue_id' | sudo postsuper -d - # mx-in2001
- 13:39 Emperor: restart postfix on mx-in2001
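The two queue-cleanup pipelines above depend on `postqueue -j` printing one JSON object per queued message, with fields such as `queue_id`, `sender` and a `recipients` array. The sketch below is a minimal Python re-creation of the two `jq` selections, run against a hypothetical sample. The queue IDs and every address other than vrts-bounce@wikimedia.org are made up. It shows which queue IDs each filter would pass to `postsuper -d -`. It is not the tooling that was actually run.

```python
import json

# Hypothetical sample of `postqueue -j` output: one JSON object per line.
SAMPLE = """\
{"queue_id": "AAA111", "sender": "vrts-bounce@wikimedia.org", "recipients": [{"address": "user@example.org"}]}
{"queue_id": "BBB222", "sender": "someone@example.org", "recipients": [{"address": "vrts-bounce@wikimedia.org"}]}
{"queue_id": "CCC333", "sender": "someone@example.org", "recipients": [{"address": "vrts-bounce@wikimedia.org"}, {"address": "other@example.org"}]}
"""

BOUNCE = "vrts-bounce@wikimedia.org"


def recipient_address(msg, index):
    """Mimic jq's `.recipients[index].address`: a missing entry yields None (jq null)."""
    recipients = msg.get("recipients") or []
    if index < len(recipients):
        return recipients[index].get("address")
    return None


def ids_sent_by_bounce(records):
    """Same selection as: select(.sender == BOUNCE) | .queue_id"""
    return [m["queue_id"] for m in records if m.get("sender") == BOUNCE]


def ids_only_to_bounce(records):
    """Same selection as: select(.recipients[0].address == BOUNCE)
    | select(.recipients[1].address == null) | .queue_id"""
    return [
        m["queue_id"]
        for m in records
        if recipient_address(m, 0) == BOUNCE and recipient_address(m, 1) is None
    ]


records = [json.loads(line) for line in SAMPLE.splitlines() if line.strip()]
print(ids_sent_by_bounce(records))  # ['AAA111']
print(ids_only_to_bounce(records))  # ['BBB222']
```

The second filter skips any message that has a second recipient (CCC333 in this sample). As a result, mail that was also addressed to other recipients stays in the queue.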
2025-03-15
- 18:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 00:51 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2046
- 00:51 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2046
2025-03-14
- 21:25 zabe: zabe@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php testwiki --delete /home/zabe/afl_text_table_deletedump/testwiki --sleep 0.3 # T381599
- 21:06 zabe: zabe@mwmaint2002:~$ mwscript extensions/AbuseFilter/maintenance/MigrateESRefToAflTable.php testwiki --dump /home/zabe/afl_text_table_dump/testwiki --deletedump /home/zabe/afl_text_table_deletedump/testwiki --sleep 0.3 # T381599
- 16:51 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 16:51 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 16:15 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Vivian Rook out of all services on: 2288 hosts
- 16:14 slyngshede@cumin1002: DONE (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Vivian Rook out of all services on: 2288 hosts
- 16:05 sukhe@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) pooling A:liberica-canary
- 16:05 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
- 16:04 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling A:liberica-canary
- 16:04 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling A:liberica-canary
- 16:04 sukhe@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) pooling A:liberica-canary
- 16:03 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
- 16:01 sukhe@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) pooling A:liberica-canary
- 16:00 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
- 16:00 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling A:liberica-canary
- 16:00 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling A:liberica-canary
- 15:50 slyngshede@cumin1002: DONE (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Vivian Rook out of all services on: 2288 hosts
- 15:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2075.codfw.wmnet with OS bullseye
- 15:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
- 15:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2075.codfw.wmnet with OS bullseye
- 15:20 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 15:20 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 15:20 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 15:20 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 15:19 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 15:19 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 15:19 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 15:18 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 14:55 herron: kafka-logging reduce mediawiki.httpd.accesslog topic retention from 172800000ms (2d) to 129600000ms (1.5d)
- 13:33 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 13:14 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudgw1003.eqiad.wmnet
- 13:13 volans: installed cumin v5.1.1 on cloudcumin* and cuminunpriv* hosts
- 12:03 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0) for datacenter switchover from eqiad to codfw
- 12:02 root@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 12:02 root@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 12:02 root@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 12:02 root@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 12:00 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance for datacenter switchover from eqiad to codfw
- 11:52 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) for datacenter switchover from eqiad to codfw
- 11:52 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from eqiad to codfw
- 11:40 hnowlan@cumin2002: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99) for datacenter switchover from eqiad to codfw
- 11:40 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from eqiad to codfw
- 11:36 volans: uploaded cumin_5.1.1 to apt.wikimedia.org bullseye-wikimedia
- 11:13 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1199.eqiad.wmnet with reason: Adding the hosts to the analytics hadoop cluster in batches. this is part of the next batch
- 11:13 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 9 hosts with reason: Adding the hosts to the analytics hadoop cluster in batches. this is part of the next batch
- 10:58 godog: set 80GB (per 6x partition ~500GB) retention for udp_localhost-err topic in kafka-logging eqiad
- 10:57 godog: set 150GB (per 6x partition = ~1TB) retention for udp_localhost-warning topic in kafka-logging eqiad
- 10:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 10:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 10:19 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 10:17 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 10:10 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 10:09 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 10:08 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1257.eqiad.wmnet with OS bookworm
- 09:38 godog: set 1TB retention for udp_localhost-warning topic in kafka-logging eqiad
- 09:36 godog: set 400G retention for udp_localhost-err topic in kafka-logging eqiad
- 09:31 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
- 09:31 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
- 09:30 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 09:30 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 09:19 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host db1257.eqiad.wmnet with OS bookworm
- 09:18 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 09:05 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 01:49 cstone: civicrm upgraded from 52226531 to aa582fe1
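The kafka-logging retention changes logged above mix time-based and size-based limits. Kafka's topic-level `retention.ms` is given in milliseconds. `retention.bytes` caps each partition rather than the whole topic, which is why the per-partition figures are multiplied by six partitions. The following minimal Python check of that arithmetic uses illustrative helper names that are not part of any Kafka tooling.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86_400_000 ms in one day


def ms_to_days(ms):
    """Convert a retention.ms value to days."""
    return ms / MS_PER_DAY


def topic_total_gb(per_partition_gb, partitions):
    """retention.bytes applies per partition, so the topic-wide ceiling is the product."""
    return per_partition_gb * partitions


print(ms_to_days(172_800_000))  # 2.0 (mediawiki.httpd.accesslog, before)
print(ms_to_days(129_600_000))  # 1.5 (mediawiki.httpd.accesslog, after)
print(topic_total_gb(80, 6))    # 480, logged as ~500GB for udp_localhost-err
print(topic_total_gb(150, 6))   # 900, logged as ~1TB for udp_localhost-warning
```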
2025-03-13
- 23:11 ladsgroup@deploy2002: Finished scap sync-world: Backport for Temporarily enable mobile sitenotice for fawiki (duration: 10m 07s)
- 23:05 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 23:04 ladsgroup@deploy2002: ladsgroup: Backport for Temporarily enable mobile sitenotice for fawiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 23:01 ladsgroup@deploy2002: Started scap sync-world: Backport for Temporarily enable mobile sitenotice for fawiki
- 22:58 reedy@deploy2002: Finished scap sync-world: Backport for FilterEvaluator::rmspecials: Disable PCRE JIT for this call too (T385452) (duration: 68m 05s)
- 22:52 reedy@deploy2002: reedy: Continuing with sync
- 21:53 reedy@deploy2002: reedy: Backport for FilterEvaluator::rmspecials: Disable PCRE JIT for this call too (T385452) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:50 reedy@deploy2002: Started scap sync-world: Backport for FilterEvaluator::rmspecials: Disable PCRE JIT for this call too (T385452)
- 21:35 mutante: lists1004 - systemctl start wmf_auto_restart_exim4, which had failed for some reason
- 21:34 jhuneidi@deploy2002: Finished scap sync-world: Backport for PreferenceHelper: Handle another case of getGlobalPreferencesValues returning false (T388073), FilterEvaluator::rmdoubles: Disable PCRE JIT for this call (T385452), Score: Handle parser passing $code of null and bail out (T388821), SidebarBeforeOutputHookHandler::getItemId: Bail early i… (gerrit:1127570)
- 21:28 jhuneidi@deploy2002: reedy, jhuneidi: Continuing with sync
- 21:28 jhuneidi@deploy2002: reedy, jhuneidi: Backport for PreferenceHelper: Handle another case of getGlobalPreferencesValues returning false (T388073), FilterEvaluator::rmdoubles: Disable PCRE JIT for this call (T385452), Score: Handle parser passing $code of null and bail out (T388821), SidebarBeforeOutputHookHandler::getItemId: Bail early if Title i… (gerrit:1127570)
- 21:25 jhuneidi@deploy2002: Started scap sync-world: Backport for PreferenceHelper: Handle another case of getGlobalPreferencesValues returning false (T388073), FilterEvaluator::rmdoubles: Disable PCRE JIT for this call (T385452), Score: Handle parser passing $code of null and bail out (T388821), SidebarBeforeOutputHookHandler::getItemId: Bail early if… (gerrit:1127570)
- 21:16 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
- 21:16 eevans@deploy2002: helmfile [staging] START helmfile.d/services/data-gateway: apply
- 20:59 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump the thumbnail steps ratio to 10% (T360589) (duration: 12m 56s)
- 20:57 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 20:56 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 20:56 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 20:54 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 20:54 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 20:54 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 20:53 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 20:50 ladsgroup@deploy2002: ladsgroup: Backport for Bump the thumbnail steps ratio to 10% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:49 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 20:49 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 20:47 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump the thumbnail steps ratio to 10% (T360589)
- 20:44 jhuneidi@deploy2002: Finished scap sync-world: Backport for Rebuild logo files (T387448), Logos: Fix order of guwwikinews in yaml file (T387448), logos: have CI fail on uncommited logos.php changes (T341412) (duration: 19m 18s)
- 20:38 jhuneidi@deploy2002: hashar, pppery, jhuneidi: Continuing with sync
- 20:28 jhuneidi@deploy2002: hashar, pppery, jhuneidi: Backport for Rebuild logo files (T387448), Logos: Fix order of guwwikinews in yaml file (T387448), logos: have CI fail on uncommited logos.php changes (T341412) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:25 jhuneidi@deploy2002: Started scap sync-world: Backport for Rebuild logo files (T387448), Logos: Fix order of guwwikinews in yaml file (T387448), logos: have CI fail on uncommited logos.php changes (T341412)
- 20:03 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
- 20:03 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
- 19:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS bullseye
- 19:54 swfrench@deploy2002: Finished scap sync-world: apply rsyslog config changes - T388799 (duration: 08m 09s)
- 19:47 jynus: forcing a reboot of db1248 from console T388837
- 19:47 swfrench@deploy2002: Started scap sync-world: apply rsyslog config changes - T388799
- 19:44 cwhite: depooled db1248, unchanged db1245
- 19:42 cwhite@cumin2002: dbctl commit (dc=all): 'depool db1245', diff saved to https://phabricator.wikimedia.org/P74224 and previous config saved to /var/cache/conftool/dbconfig/20250313-194204-cwhite.json
- 19:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
- 19:34 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 19:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 19:32 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
- 19:24 swfrench-wmf: mw-(api-int|jobrunner|parsoid): reverted all traffic back to 'main' release - T383845
- 19:04 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 19:04 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 19:04 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 19:04 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:58 ebernhardson@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:58 ebernhardson@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:39 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.20 refs T386215
- 18:17 jiji@deploy2002: Finished scap sync-world: scap run to deploy switch to PHP 8.1 images - T383845 (duration: 10m 28s)
- 18:11 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3066.esams.wmnet
- 18:10 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 18:10 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 18:09 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 18:09 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 18:09 jiji@deploy2002: Started scap sync-world: scap run to deploy switch to PHP 8.1 images - T383845
- 18:08 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 18:08 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 18:02 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 18:01 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 18:01 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 18:00 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 17:56 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 17:56 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 17:55 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 17:54 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 17:50 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 17:50 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp3066.esams.wmnet
- 17:49 brett: Upgrading cp3066 to Varnish 7 (T378737)
- 17:49 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 17:48 jiji@deploy2002: Stopping before sync operations
- 17:47 jiji@deploy2002: Started scap sync-world: No-sync scap run to switch image flavours to PHP 8.1 - T383845
- 17:47 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 17:46 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 17:44 swfrench@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: Taking scap lock while awaiting coordinated puppet change (duration: 34m 27s)
- 17:37 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1012* for ban host prior to reimage - bking@cumin2002 - T387904
- 17:37 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1012* for ban host prior to reimage - bking@cumin2002 - T387904
- 17:10 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 17:10 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 17:09 swfrench@deploy2002: Locking from deployment [ALL REPOSITORIES]: Taking scap lock while awaiting coordinated puppet change
- 17:05 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3074.esams.wmnet
- 16:44 hashar: deployment server: rebased /srv/mediawiki-staging for 3 noop changes (d4e1c561e..a66406939)
- 16:43 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp3074.esams.wmnet
- 16:42 brett: Upgrading cp3074 to Varnish 7 (T378737)
- 16:41 Emperor: restart swift-proxy on ms-fe2009
- 16:41 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
- 16:41 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
- 16:36 moritzm: installing gunicorn security updates
- 16:30 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp[3073,3081].esams.wmnet} and A:cp for 9.2.9-1wm1
- 16:25 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 16:24 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 16:24 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 16:23 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 16:22 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 16:21 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 16:18 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp[3073,3081].esams.wmnet} and A:cp for 9.2.9-1wm1
- 15:57 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 15:57 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 15:56 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:54 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 15:48 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 15:48 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 15:48 klausman@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 15:44 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 15:24 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 15:23 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 15:22 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 15:21 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 15:21 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:21 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:17 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 15:17 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 15:15 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 15:15 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 15:13 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 15:12 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 15:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS bullseye
- 15:03 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 15:03 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 15:01 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 14:57 Lucas_WMDE: UTC afternoon backport+config window done
- 14:57 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable SUL3 signup for everyone (T384218), Set $wgSul3RolloutUserPercentage on some testwikis (T384153), Reapply "Make WikibaseQualityConstraints use split-graph query service" (T374021) (duration: 10m 24s)
- 14:55 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
- 14:54 moritzm: restarting FPM on Phabricator to pick up gnutls security updates
- 14:54 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
- 14:50 lucaswerkmeister-wmde@deploy2002: tgr, lucaswerkmeister-wmde: Continuing with sync
- 14:50 moritzm: restarting slapd on serpens/seaborgium to pick up gnutls updates
- 14:50 lucaswerkmeister-wmde@deploy2002: tgr, lucaswerkmeister-wmde: Backport for Enable SUL3 signup for everyone (T384218), Set $wgSul3RolloutUserPercentage on some testwikis (T384153), Reapply "Make WikibaseQualityConstraints use split-graph query service" (T374021) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
- 14:47 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable SUL3 signup for everyone (T384218), Set $wgSul3RolloutUserPercentage on some testwikis (T384153), Reapply "Make WikibaseQualityConstraints use split-graph query service" (T374021)
- 14:45 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
- 14:44 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Follow-up Ia4b9f65b6: Fix argument order passed to EditCheckFactory#create (T388722) (duration: 11m 31s)
- 14:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kemayo: Continuing with sync
- 14:35 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kemayo: Backport for Follow-up Ia4b9f65b6: Fix argument order passed to EditCheckFactory#create (T388722) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:35 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS bullseye
- 14:35 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=0) rolling restart_daemons on A:logstash-collector
- 14:33 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:33 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:32 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Follow-up Ia4b9f65b6: Fix argument order passed to EditCheckFactory#create (T388722)
- 14:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 14:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 14:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:31 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS bullseye
- 14:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:30 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 14:30 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 14:27 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
- 14:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
- 14:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2075.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2075.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:12 moritzm: installing gnutls security updates
- 14:06 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 14:05 effie: restarting parsoid on codfw
- 14:04 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 14:01 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@554407c]: T362615 (duration: 01m 39s)
- 14:00 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@554407c]: T362615
- 13:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS bullseye
- 13:46 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS bullseye
- 13:45 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS bullseye
- 13:44 bking@cumin2002: conftool action : set/pooled=no; selector: service=cloudelastic,name=cloudelastic1012.eqiad.wmnet
- 13:22 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 13:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 13:21 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 12:51 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 12:50 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 12:50 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 12:49 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 12:49 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 12:49 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 12:28 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 12:28 moritzm: installing tiff security updates
- 12:27 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 12:09 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 12:08 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 12:07 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 12:07 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 11:58 cmooney@dns2005: END - running authdns-update
- 11:58 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 11:56 cmooney@dns2005: START - running authdns-update
- 11:56 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 11:56 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 11:56 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 11:56 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:56 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old dns entries for lvs6xxx vlan sub-int IPs - cmooney@cumin1002"
- 11:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old dns entries for lvs6xxx vlan sub-int IPs - cmooney@cumin1002"
- 11:55 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
- 11:50 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 11:48 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:47 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 11:47 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:46 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 11:45 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:45 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 11:44 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:43 effie: rolling restarting mw-api-int
- 11:43 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 11:43 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
- 11:36 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
- 11:35 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
- 11:35 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
- 11:34 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
- 11:28 jiji@deploy2002: scap failed: <KeyError> 'production' (scap version: 4.140.0) (duration: 13m 16s)
- 11:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1045.eqiad.wmnet with OS bullseye
- 11:26 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 11:20 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
- 11:15 jiji@deploy2002: Started scap sync-world: (T383845) mw-(api-int|parsoid|jobrunner): switch all releases to PHP 8.1
- 11:08 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 11:08 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 11:06 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 11:05 stevemunene@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
- 10:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1045.eqiad.wmnet with reason: host reimage
- 10:50 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 10:49 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 10:48 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 10:48 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 10:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1045.eqiad.wmnet with reason: host reimage
- 10:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal_scholarly@eqiad
- 10:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 10:38 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 10:36 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1045.eqiad.wmnet with OS bullseye
- 10:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1044.eqiad.wmnet with OS bullseye
- 10:36 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1034.eqiad.wmnet to cluster eqiad and group D
- 10:36 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 10:34 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal_scholarly@eqiad
- 10:34 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for role: wdqs::internal_scholarly@eqiad
- 10:34 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 10:34 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1034.eqiad.wmnet to cluster eqiad and group D
- 10:31 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal_scholarly@eqiad
- 10:28 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal_scholarly@codfw
- 10:28 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 10:27 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 10:25 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 10:21 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal_scholarly@codfw
- 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
- 10:11 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1044.eqiad.wmnet with reason: host reimage
- 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
- 10:08 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1044.eqiad.wmnet with reason: host reimage
- 09:56 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1044.eqiad.wmnet with OS bullseye
- 09:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1043.eqiad.wmnet with OS bullseye
- 09:56 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 09:53 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 09:42 volans: uploaded cumin_5.1.0 to apt.wikimedia.org bullseye-wikimedia
- 09:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1043.eqiad.wmnet with reason: host reimage
- 09:37 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs6001.drmrs.wmnet} and A:liberica (T384477)
- 09:37 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs6001.drmrs.wmnet} and A:liberica (T384477)
- 09:36 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1043.eqiad.wmnet with reason: host reimage
- 09:24 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1043.eqiad.wmnet with OS bullseye
- 09:22 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1204.eqiad.wmnet
- 09:20 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet
- 09:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 09:12 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 09:10 elukey@cumin2002: START - Cookbook sre.hosts.provision for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 09:06 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1204.eqiad.wmnet
- 09:04 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet
- 09:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6001.drmrs.wmnet with OS bookworm
- 08:51 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
- 08:48 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
- 08:46 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS bookworm
- 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1034.eqiad.wmnet with OS bookworm
- 08:30 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on gerrit2003.wikimedia.org with reason: testing
- 08:28 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1034.eqiad.wmnet with reason: host reimage
- 08:25 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1034.eqiad.wmnet with reason: host reimage
- 08:20 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs6001.drmrs.wmnet with OS bookworm
- 08:15 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1204.eqiad.wmnet
- 08:14 elukey@cumin2002: START - Cookbook sre.hosts.provision for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:14 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet
- 08:12 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1204.eqiad.wmnet
- 08:10 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
- 08:09 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet
- 08:06 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
- 08:03 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1034.eqiad.wmnet with OS bookworm
- 07:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:57 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:50 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS bookworm
- 07:42 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6001.drmrs.wmnet with reason: depooled before reimage
- 07:42 krinkle@deploy2002: Finished scap sync-world: Backport for fatal-error: Ensure action=cache max-age is higher than response time (duration: 11m 28s)
- 07:41 vgutierrez: depool lvs6001 before being reimaged - T384477
- 07:35 krinkle@deploy2002: krinkle: Continuing with sync
- 07:33 krinkle@deploy2002: krinkle: Backport for fatal-error: Ensure action=cache max-age is higher than response time synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:30 krinkle@deploy2002: Started scap sync-world: Backport for fatal-error: Ensure action=cache max-age is higher than response time
- 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74215 and previous config saved to /var/cache/conftool/dbconfig/20250313-072403-root.json
- 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74214 and previous config saved to /var/cache/conftool/dbconfig/20250313-072141-root.json
- 07:19 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1200-1208].eqiad.wmnet
- 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74213 and previous config saved to /var/cache/conftool/dbconfig/20250313-070857-root.json
- 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74212 and previous config saved to /var/cache/conftool/dbconfig/20250313-070636-root.json
- 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74211 and previous config saved to /var/cache/conftool/dbconfig/20250313-065351-root.json
- 06:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74210 and previous config saved to /var/cache/conftool/dbconfig/20250313-065129-root.json
- 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74209 and previous config saved to /var/cache/conftool/dbconfig/20250313-063846-root.json
- 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74208 and previous config saved to /var/cache/conftool/dbconfig/20250313-063624-root.json
- 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74207 and previous config saved to /var/cache/conftool/dbconfig/20250313-062341-root.json
- 06:13 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release
- 05:58 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 05:40 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 05:21 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 03:10 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
- 03:06 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
- 03:00 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release
- 02:58 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
- 02:56 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
- 02:55 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
- 02:53 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
- 02:11 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 02:05 pt1979@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 01:13 Daimona: Manually fixing 5 bad abuse_filter_log rows in mediawikiwiki for T388732
2025-03-12
- 22:34 mforns@deploy2002: Finished deploy [airflow-dags/analytics@868fdba]: deploy CIM allow list update and DEPRECATED tags for Kubernetes migration (duration: 01m 17s)
- 22:33 mforns@deploy2002: Started deploy [airflow-dags/analytics@868fdba]: deploy CIM allow list update and DEPRECATED tags for Kubernetes migration
- 22:24 krinkle@deploy2002: Synchronized w/fatal-error.php: I1c677ca1cf7d (duration: 08m 41s)
- 21:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:14 Reedy: create translate tables on officewiki T380414
- 21:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:08 reedy@deploy2002: Synchronized wmf-config/: Various config changes (duration: 08m 42s)
- 20:57 Reedy: created wikilove tables on foundationwiki T381065
- 20:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:23 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
- 20:23 jdrewniak@deploy2002: Finished scap sync-world: Backport for Fixes event logging for main menu button (T387768), Add donation banner images (T388446) (duration: 14m 42s)
- 20:16 jdrewniak@deploy2002: jdrewniak, jdlrobson: Continuing with sync
- 20:12 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 20:12 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 20:11 jdrewniak@deploy2002: jdrewniak, jdlrobson: Backport for Fixes event logging for main menu button (T387768), Add donation banner images (T388446) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:08 jdrewniak@deploy2002: Started scap sync-world: Backport for Fixes event logging for main menu button (T387768), Add donation banner images (T388446)
- 20:06 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet
- 20:06 brett: Upgrading cp4052 (upload) to Varnish 7 (T378737)
- 20:06 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 20:05 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 20:00 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 20:00 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 20:00 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 19:59 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 19:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp404[5-7].ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 19:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp403[7,9].ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 19:07 ebysans@deploy2002: Finished deploy [analytics/refinery@fe214cf]: Regular analytics weekly train [analytics/refinery@fe214cfb] (duration: 02m 47s)
- 19:05 ebysans@deploy2002: Started deploy [analytics/refinery@fe214cf]: Regular analytics weekly train [analytics/refinery@fe214cfb]
- 19:04 sandraebele: deploying refinery
- 19:02 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp404[0-3].ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 18:48 swfrench-wmf: mw-(api-ext|web): scaled latent 'next' deployments down to 1 pod - T383845
- 18:47 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 18:46 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 18:46 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 18:46 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 18:43 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 18:43 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 18:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 18:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 18:36 Amir1: marking ~3K revisions with bad blobs (T351953)
- 18:35 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp404[0-3].ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 18:32 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4048.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 18:29 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4048.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 18:20 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.20 refs T386215
- 18:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4049.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 18:16 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4049.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 18:07 swfrench-wmf: ran cumin -b8 -s90 'A:cp-text' 'run-puppet-agent -e "merging ATS Lua config change - T383845"'
- 17:44 sandraebele: deploying refinery source as part of weekly deployment train
- 17:37 swfrench-wmf: ran cumin 'A:cp-text' 'disable-puppet "merging ATS Lua config change - T383845"'
- 17:35 swfrench-wmf: mw-(api-ext|web): scaled 'main' releases back to normal size - T383845
- 17:34 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:34 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:34 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:33 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:33 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:32 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:32 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:28 swfrench-wmf: mw-(api-ext|web): reverted all non-cookie-migrated traffic back to 'main' release - T383845
- 17:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4050.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 17:26 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:26 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:25 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:25 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:24 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:24 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:24 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4050.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 17:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:23 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:20 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:19 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:06 swfrench-wmf: mw-(api-ext|web): migrated 100% of residual PHP 7.4 traffic to 8.1 - T383845
- 17:06 swfrench@deploy2002: Finished scap sync-world: helmfile-only deployment to apply remaining 8.1 diffs on mw-(api-ext|web) - T383845 (duration: 05m 03s)
- 17:02 swfrench@deploy2002: Started scap sync-world: helmfile-only deployment to apply remaining 8.1 diffs on mw-(api-ext|web) - T383845
- 16:57 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 16:56 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 16:54 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 16:53 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 16:52 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 16:51 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 16:47 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 16:47 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 16:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 16:44 swfrench@deploy2002: Stopping before sync operations
- 16:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 16:43 swfrench@deploy2002: Started scap sync-world: No-sync scap run to update helmfile release values for mw-(api-ext|web) - T383845
- 16:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 16:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 16:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 16:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 16:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
- 16:39 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1034.eqiad.wmnet
- 16:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
- 16:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
- 16:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 16:36 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
- 16:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 16:34 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
- 16:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
- 16:24 moritzm: installing Redis security updates
- 16:07 godog: bounce mtail on centrallog1002 - hogging the cpu
- 16:06 moritzm: installing qemu security updates
- 16:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs6002.drmrs.wmnet} and A:liberica (T384477)
- 16:00 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs6002.drmrs.wmnet} and A:liberica (T384477)
- 15:55 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1034.eqiad.wmnet
- 15:48 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1034.eqiad.wmnet with reason: remove from cluster for reimage
- 15:44 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump the thumbnail steps ratio to 5% (T360589) (duration: 11m 30s)
- 15:38 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 15:36 ladsgroup@deploy2002: ladsgroup: Backport for Bump the thumbnail steps ratio to 5% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:33 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump the thumbnail steps ratio to 5% (T360589)
- 15:30 mszabo@deploy2002: Finished scap sync-world: Backport for GlobalUserSelectQueryBuilder: Ignore unattached local users (T388125), http: Promote MultiHttpClient warnings to errors (T384717) (duration: 12m 01s)
- 15:24 mszabo@deploy2002: mszabo: Continuing with sync
- 15:22 mszabo@deploy2002: mszabo: Backport for GlobalUserSelectQueryBuilder: Ignore unattached local users (T388125), http: Promote MultiHttpClient warnings to errors (T384717) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
- 15:18 mszabo@deploy2002: Started scap sync-world: Backport for GlobalUserSelectQueryBuilder: Ignore unattached local users (T388125), http: Promote MultiHttpClient warnings to errors (T384717)
- 15:17 Emperor: storcli64 /c0 restart on ms-be1090 T384003
- 15:14 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6002.drmrs.wmnet with OS bookworm
- 15:12 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
- 15:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal_main@eqiad
- 15:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 15:10 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 15:10 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
- 15:06 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal_main@eqiad
- 15:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal_main@codfw
- 15:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 14:59 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 14:55 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1200-1208].eqiad.wmnet
- 14:55 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1187-1199].eqiad.wmnet
- 14:55 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079), Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079) (duration: 12m 58s)
- 14:53 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal_main@codfw
- 14:53 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6002.drmrs.wmnet with reason: host reimage
- 14:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:49 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6002.drmrs.wmnet with reason: host reimage
- 14:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
- 14:45 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079), Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:42 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079), Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079)
- 14:40 tgr@deploy2002: Finished scap sync-world: Backport for Remove Flow as the default talk system (T383569) (duration: 11m 32s)
- 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:33 tgr@deploy2002: zoe, tgr: Continuing with sync
- 14:32 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs6002.drmrs.wmnet with OS bookworm
- 14:31 tgr@deploy2002: zoe, tgr: Backport for Remove Flow as the default talk system (T383569) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:30 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
- 14:29 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
- 14:28 tgr@deploy2002: Started scap sync-world: Backport for Remove Flow as the default talk system (T383569)
- 14:26 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6002.drmrs.wmnet with reason: depooled before reimage
- 14:26 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:26 tgr@deploy2002: Finished scap sync-world: Backport for Add MP event stream for MassDelete workflows (T382147), Enable SUL3 signup for 50% of group 2 users (T384218), [enwiki] Throttle exemption for event (T388637) (duration: 11m 04s)
- 14:26 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:26 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:26 vgutierrez: depooling lvs6002 before getting reimaged - T384477
- 14:24 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:24 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:23 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:20 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:19 tgr@deploy2002: jsn, tgr, superpes: Continuing with sync
- 14:19 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:18 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1187-1199].eqiad.wmnet
- 14:18 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:18 tgr@deploy2002: jsn, tgr, superpes: Backport for Add MP event stream for MassDelete workflows (T382147), Enable SUL3 signup for 50% of group 2 users (T384218), [enwiki] Throttle exemption for event (T388637) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:17 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:17 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:16 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:15 tgr@deploy2002: Started scap sync-world: Backport for Add MP event stream for MassDelete workflows (T382147), Enable SUL3 signup for 50% of group 2 users (T384218), [enwiki] Throttle exemption for event (T388637)
- 14:13 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:13 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 13:44 jiji@deploy2002: Finished scap sync-world: Reverted 1126607 and 1126650 (duration: 04m 57s)
- 13:40 jiji@deploy2002: Started scap sync-world: Reverted 1126607 and 1126650
- 13:37 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:36 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 13:36 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:34 sukhe: upgrade doh2002 to dnsdist 1.9.8
- 13:34 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 13:34 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:34 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 13:34 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:34 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 13:33 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:32 sukhe: upgrade doh1001 to dnsdist 1.9.8
- 13:32 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 13:20 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:20 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:19 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:19 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:01 Emperor: fio testing on ms-be2088 24 disks at once whilst resetting the controller T384003
- 12:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
- 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
- 12:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
- 12:23 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1034.eqiad.wmnet
- 12:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
- 12:10 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6003.drmrs.wmnet with OS bookworm
- 11:55 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal@eqiad
- 11:55 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 11:54 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 11:50 Emperor: fio testing on ms-be2088 24 disks at once T384003
- 11:44 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal@eqiad
- 11:42 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
- 11:39 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
- 11:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal@codfw
- 11:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 11:38 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 11:31 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal@codfw
- 11:21 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS bookworm
- 11:18 vgutierrez: reimage lvs6003 as a liberica instance - T384477
- 11:17 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 11:16 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 11:16 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 11:15 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 11:13 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 11:13 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:11 Emperor: fio testing on ms-be2088 while resetting controller T384003
- 11:05 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1091.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:05 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1091.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:57 jiji@deploy2002: scap failed: <KeyError> 'production' (scap version: 4.140.0) (duration: 13m 54s)
- 10:53 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:48 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:44 jiji@deploy2002: Started scap sync-world: (T383845) mw-(api-int|parsoid|jobrunner): switch all releases to PHP 8.1
- 10:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:42 jynus: removing backup1002, backup2002 dbbackups user @ m1 T387892
- 10:38 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:19 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1037.eqiad.wmnet to cluster eqiad and group C
- 10:18 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1037.eqiad.wmnet to cluster eqiad and group C
- 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
- 10:14 jynus: removing backup1002, backup2002 dump user on es6,es7 T387892
- 10:14 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:13 moritzm: installing systemd bugfix updates from Bookworm point release
- 10:08 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
- 09:53 Emperor: fio testing on ms-be2088 T384003
- 09:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 09:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1125.eqiad.wmnet
- 09:33 marostegui@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:33 marostegui@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1125.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
- 09:32 marostegui@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1125.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
- 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1037.eqiad.wmnet with OS bookworm
- 09:16 marostegui@cumin1002: START - Cookbook sre.dns.netbox
- 09:10 marostegui@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1125.eqiad.wmnet
- 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
- 09:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
- 08:55 oblivian@deploy2002: Finished scap sync-world: Updating k8s chart (duration: 03m 42s)
- 08:52 oblivian@deploy2002: Started scap sync-world: Updating k8s chart
- 08:50 slyngshede@dns1004: END - running authdns-update
- 08:48 slyngshede@dns1004: START - running authdns-update
- 08:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1176.eqiad.wmnet with reason: Maintenance
- 08:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2230.codfw.wmnet,db1125.eqiad.wmnet with reason: Maintenance
- 08:40 oblivian@deploy2002: Finished scap sync-world: Backport for noc/wiki.php: allow showing a single variable in json format (duration: 09m 34s)
- 08:37 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1037.eqiad.wmnet with OS bookworm
- 08:33 oblivian@deploy2002: oblivian: Continuing with sync
- 08:33 oblivian@deploy2002: oblivian: Backport for noc/wiki.php: allow showing a single variable in json format synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:32 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1176.eqiad.wmnet
- 08:30 oblivian@deploy2002: Started scap sync-world: Backport for noc/wiki.php: allow showing a single variable in json format
- 08:28 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1176.eqiad.wmnet
- 08:25 hashar@deploy2002: Finished scap sync-world: Backport for Remove obsolete $wgAllowMicrodataAttributes, Remove wgArticlePlaceholderSearchIntegrationBackend (T207407), Remove obsolete CirrusSearch config, Fix wgCirrusSearchSimilarityProfiles, Remove Cognate legacy settings (T348526), Remove obsolete $wgFlowMai… (gerrit:1125124)
- 08:24 marostegui: Failover m5 from db1176 to db1228 - T388500
- 08:19 hashar@deploy2002: reedy, hashar: Continuing with sync
- 08:16 hashar@deploy2002: reedy, hashar: Backport for Remove obsolete $wgAllowMicrodataAttributes, Remove wgArticlePlaceholderSearchIntegrationBackend (T207407), Remove obsolete CirrusSearch config, Fix wgCirrusSearchSimilarityProfiles, Remove Cognate legacy settings (T348526), Remove obsolete $wgFlowMaintenanceMod… (gerrit:1125124)
- 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1037.eqiad.wmnet
- 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
- 08:12 hashar@deploy2002: Started scap sync-world: Backport for Remove obsolete $wgAllowMicrodataAttributes, Remove wgArticlePlaceholderSearchIntegrationBackend (T207407), Remove obsolete CirrusSearch config, Fix wgCirrusSearchSimilarityProfiles, Remove Cognate legacy settings (T348526), Remove obsolete $wgFlowMain… (gerrit:1125124)
- 08:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
- 08:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2235].codfw.wmnet,db[1176,1217,1228].eqiad.wmnet with reason: m5 master switch T388500
- 07:26 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1037.eqiad.wmnet
- 03:12 eileen: civicrm upgraded from ec20a105 to 14afd1b8
2025-03-11
- 22:28 ejegg: payments-wiki upgraded from 6409fffa to 3d4dfab3
- 21:59 reedy@deploy2002: Synchronized private/: various cleanup (duration: 08m 45s)
- 20:52 dzahn@dns1004: END - running authdns-update
- 20:50 dzahn@dns1004: START - running authdns-update
- 20:49 jhuneidi@deploy2002: Finished scap sync-world: Backport for Silence TRX profiler in deferreds after autocreation (T388165), Silence TRX profiler in deferreds after autocreation (T388165) (duration: 13m 05s)
- 20:42 jhuneidi@deploy2002: jhuneidi, tgr: Continuing with sync
- 20:39 jhuneidi@deploy2002: jhuneidi, tgr: Backport for Silence TRX profiler in deferreds after autocreation (T388165), Silence TRX profiler in deferreds after autocreation (T388165) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:35 jhuneidi@deploy2002: Started scap sync-world: Backport for Silence TRX profiler in deferreds after autocreation (T388165), Silence TRX profiler in deferreds after autocreation (T388165)
- 20:18 jhuneidi@deploy2002: Finished scap sync-world: Backport for Deploy donate banner to test wiki for event logging testing (T387768) (duration: 12m 33s)
- 20:12 jhuneidi@deploy2002: ksarabia, jhuneidi: Continuing with sync
- 20:10 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1012* for ban host prior to reimage - bking@cumin2002 - T387904
- 20:10 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1012* for ban host prior to reimage - bking@cumin2002 - T387904
- 20:09 jhuneidi@deploy2002: ksarabia, jhuneidi: Backport for Deploy donate banner to test wiki for event logging testing (T387768) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:06 jhuneidi@deploy2002: Started scap sync-world: Backport for Deploy donate banner to test wiki for event logging testing (T387768)
- 19:51 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
- 19:51 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
- 19:37 jhuneidi@deploy2002: Finished scap sync-world: Backport for api: guard against undefined prop relations (T384627), api: guard against undefined prop relations (T384627) (duration: 09m 53s)
- 19:30 jhuneidi@deploy2002: reedy, jhuneidi: Continuing with sync
- 19:30 jhuneidi@deploy2002: reedy, jhuneidi: Backport for api: guard against undefined prop relations (T384627), api: guard against undefined prop relations (T384627) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 19:27 jhuneidi@deploy2002: Started scap sync-world: Backport for api: guard against undefined prop relations (T384627), api: guard against undefined prop relations (T384627)
- 19:04 bking@cumin2002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for cloudelastic1011.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
- 19:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1011.eqiad.wmnet with OS bullseye
- 18:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1011.eqiad.wmnet with reason: host reimage
- 18:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1011.eqiad.wmnet with reason: host reimage
- 18:25 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.20 refs T386215
- 18:19 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1011.eqiad.wmnet with OS bullseye
- 17:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1011.eqiad.wmnet with OS bullseye
- 17:48 swfrench-wmf: mw-(api-ext|web): migrated 50% of residual PHP 7.4 traffic to 8.1 - T383845
- 17:46 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:46 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:46 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:46 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:43 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:43 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:43 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:43 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1011.eqiad.wmnet with reason: host reimage
- 17:39 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:35 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1011.eqiad.wmnet with reason: host reimage
- 17:35 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:34 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:34 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:24 swfrench@deploy2002: Finished scap sync-world: Deployment to pick up new php8.1 production image - T386006 (duration: 26m 26s)
- 17:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1011.eqiad.wmnet with OS bullseye
- 17:14 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4051.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 17:11 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4051.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 17:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T371742)', diff saved to https://phabricator.wikimedia.org/P74200 and previous config saved to /var/cache/conftool/dbconfig/20250311-171052-ladsgroup.json
- 16:58 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1011* for ban host prior to reimage - bking@cumin2002 - T387904
- 16:58 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1011* for ban host prior to reimage - bking@cumin2002 - T387904
- 16:58 swfrench@deploy2002: Started scap sync-world: Deployment to pick up new php8.1 production image - T386006
- 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74199 and previous config saved to /var/cache/conftool/dbconfig/20250311-165545-ladsgroup.json
- 16:54 swfrench-wmf: rebuilt php8.1 production images to pick up PCRE2 backport from component/php81 - T386006
- 16:53 vgutierrez: test liberica 0.11 in lvs1013
- 16:53 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:52 vgutierrez: upload liberica 0.11 to bookworm-wikimedia (apt.wm.o)
- 16:51 herron@cumin1002: START - Cookbook sre.dns.netbox
- 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74198 and previous config saved to /var/cache/conftool/dbconfig/20250311-164038-ladsgroup.json
- 16:36 brennen@deploy2002: Finished deploy [phabricator/deployment@714f3c7]: redeploy phab1004 for T309222 (duration: 01m 40s)
- 16:34 brennen@deploy2002: Started deploy [phabricator/deployment@714f3c7]: redeploy phab1004 for T309222
- 16:33 brennen@deploy2002: Finished deploy [phabricator/deployment@714f3c7]: redeploy phab2002 for T309222 (duration: 01m 03s)
- 16:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:32 brennen@deploy2002: Started deploy [phabricator/deployment@714f3c7]: redeploy phab2002 for T309222
- 16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T371742)', diff saved to https://phabricator.wikimedia.org/P74197 and previous config saved to /var/cache/conftool/dbconfig/20250311-162530-ladsgroup.json
- 16:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 16:18 swfrench@deploy2002: Finished scap sync-world: No-op deploy to pick up mediawiki-deployments.yaml changes - T387917 (duration: 02m 42s)
- 16:16 swfrench@deploy2002: Started scap sync-world: No-op deploy to pick up mediawiki-deployments.yaml changes - T387917
- 16:03 brennen@deploy2002: Finished deploy [phabricator/deployment@714f3c7]: deploy phab1004 for T388551 (duration: 01m 02s)
- 16:01 brennen@deploy2002: Started deploy [phabricator/deployment@714f3c7]: deploy phab1004 for T388551
- 16:01 brennen@deploy2002: Finished deploy [phabricator/deployment@714f3c7]: deploy phab2002 for T388551 (duration: 00m 29s)
- 16:01 brennen@deploy2002: Started deploy [phabricator/deployment@714f3c7]: deploy phab2002 for T388551
- 15:59 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: phabricator deploy
- 15:59 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubestagemaster[2003-2005].codfw.wmnet
- 15:59 jelto@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubestagemaster[2003-2005].codfw.wmnet
- 15:59 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: phabricator deploy
- 15:58 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubestage[2001-2004].codfw.wmnet
- 15:58 jelto@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubestage[2001-2004].codfw.wmnet
- 15:58 dzahn@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: phabricator deploy
- 15:58 dzahn@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phabricator.wikimedia.org with reason: phabricator deploy
- 15:52 vgutierrez: upload liberica 0.10 to bookworm-wikimedia (apt.wm.o)
- 15:49 vgutierrez: test liberica 0.10 in lvs1013
- 15:45 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 15:45 jelto@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 15:37 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 15:36 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 15:33 brouberol@deploy2002: Finished scap sync-world: mediawiki: render configmaps when dumps are enabled - T388378 (duration: 02m 18s)
- 15:32 brouberol@deploy2002: Started scap sync-world: mediawiki: render configmaps when dumps are enabled - T388378
- 15:26 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 15:25 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 15:25 Lucas_WMDE: UTC afternoon backport+config window done
- 15:24 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert "ResourceLoader: Enable Less.php math=parens-division" (T388475 T388526), Enable SUL3 signup for 10% of group 2 users (T384218), Disable CX unified dashboard on idwiki (T387820) (duration: 17m 22s)
- 15:21 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 15:20 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 15:18 lucaswerkmeister-wmde@deploy2002: sbisson, tgr, lucaswerkmeister-wmde: Continuing with sync
- 15:10 lucaswerkmeister-wmde@deploy2002: sbisson, tgr, lucaswerkmeister-wmde: Backport for Revert "ResourceLoader: Enable Less.php math=parens-division" (T388475 T388526), Enable SUL3 signup for 10% of group 2 users (T384218), Disable CX unified dashboard on idwiki (T387820) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 15:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 15:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 15:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 15:07 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "ResourceLoader: Enable Less.php math=parens-division" (T388475 T388526), Enable SUL3 signup for 10% of group 2 users (T384218), Disable CX unified dashboard on idwiki (T387820)
- 14:58 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert "Set `$wgCentralAuthLoginWiki` to correct default as documented" (duration: 11m 28s)
- 14:52 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, d3r1ck01: Continuing with sync
- 14:50 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, d3r1ck01: Backport for Revert "Set `$wgCentralAuthLoginWiki` to correct default as documented" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:49 godog: moving k8s-mlstaging off prometheus200[56] completed - T383232
- 14:47 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "Set `$wgCentralAuthLoginWiki` to correct default as documented"
- 14:37 Lucas_WMDE: accidentally Ctrl+C’ed ongoing scap, was last seen at 80% sync-prod-k8s progress
- 14:31 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, abi: Continuing with sync
- 14:31 filippo@cumin1002: conftool action : set/pooled=yes; selector: name=prometheus2006.codfw.wmnet
- 14:24 filippo@cumin1002: conftool action : set/pooled=yes; selector: name=prometheus2008.codfw.wmnet
- 14:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, abi: Backport for EventLogging: Improve handling when suggestions are not present (T388467) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:16 filippo@cumin1002: conftool action : set/weight=10; selector: name=prometheus2008.codfw.wmnet
- 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for EventLogging: Improve handling when suggestions are not present (T388467)
- 14:15 filippo@cumin1002: conftool action : set/pooled=no; selector: name=prometheus2008.codfw.wmnet
- 14:15 filippo@cumin1002: conftool action : set/pooled=no; selector: name=prometheus2006.codfw.wmnet
- 14:15 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Set `$wgCentralAuthLoginWiki` to correct default as documented (T388218) (duration: 11m 35s)
- 14:09 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Continuing with sync
- 14:06 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Backport for Set `$wgCentralAuthLoginWiki` to correct default as documented (T388218) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Set `$wgCentralAuthLoginWiki` to correct default as documented (T388218)
- 13:53 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Remove Wikibase fixed RDF feature flag again (T384344) (duration: 09m 31s)
- 13:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T371742)', diff saved to https://phabricator.wikimedia.org/P74194 and previous config saved to /var/cache/conftool/dbconfig/20250311-135019-ladsgroup.json
- 13:48 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 13:47 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 13:47 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
- 13:47 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Remove Wikibase fixed RDF feature flag again (T384344) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:44 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Remove Wikibase fixed RDF feature flag again (T384344)
- 13:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P74193 and previous config saved to /var/cache/conftool/dbconfig/20250311-133512-ladsgroup.json
- 13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P74192 and previous config saved to /var/cache/conftool/dbconfig/20250311-132005-ladsgroup.json
- 13:16 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 13:10 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 13:10 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 13:09 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 13:09 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 13:08 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 13:07 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T371742)', diff saved to https://phabricator.wikimedia.org/P74191 and previous config saved to /var/cache/conftool/dbconfig/20250311-130458-ladsgroup.json
- 12:57 marostegui: Poweroff db1246 T387673
- 12:57 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
- 12:56 marostegui: Stop MariaDB on db1246 T387673
- 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T371742)', diff saved to https://phabricator.wikimedia.org/P74190 and previous config saved to /var/cache/conftool/dbconfig/20250311-125458-ladsgroup.json
- 12:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T371742)', diff saved to https://phabricator.wikimedia.org/P74189 and previous config saved to /var/cache/conftool/dbconfig/20250311-125007-ladsgroup.json
- 12:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1160.eqiad.wmnet with reason: Maintenance
- 12:42 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-main[1001-1005].eqiad.wmnet
- 12:42 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:42 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-main[1001-1005].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
- 12:41 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-main[1001-1005].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
- 12:40 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1010.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 12:40 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1009.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 12:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
- 12:39 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
- 12:38 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
- 12:38 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
- 12:38 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1037.eqiad.wmnet with reason: remove from cluster for reimage
- 12:37 jiji@cumin1002: START - Cookbook sre.dns.netbox
- 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
- 12:35 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1254 gradually with 4 steps - Pool in for T385141
- 12:35 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
- 12:34 elukey@deploy2002: helmfile [codfw] START helmfile.d/admin 'sync'.
- 12:34 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
- 12:33 elukey@deploy2002: helmfile [eqiad] START helmfile.d/admin 'sync'.
- 12:31 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 12:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 12:23 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 12:23 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 12:18 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump thumbnail steps to 2% (T360589) (duration: 10m 18s)
- 12:16 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2010.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 12:16 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2009.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 12:11 jiji@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main1005.eqiad.wmnet with reason: decom
- 12:11 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:11 jiji@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main1004.eqiad.wmnet with reason: decom
- 12:11 jiji@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main1003.eqiad.wmnet with reason: decom
- 12:11 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kafka-main[1001-1005].eqiad.wmnet
- 12:11 jiji@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main1002.eqiad.wmnet with reason: decom
- 12:10 ladsgroup@deploy2002: ladsgroup: Backport for Bump thumbnail steps to 2% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:09 jiji@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main1001.eqiad.wmnet with reason: decom
- 12:08 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump thumbnail steps to 2% (T360589)
- 12:04 ladsgroup@deploy2002: Finished scap sync-world: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323) (duration: 13m 41s)
- 11:55 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 11:55 ladsgroup@deploy2002: ladsgroup: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:50 ladsgroup@deploy2002: Started scap sync-world: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323)
- 11:50 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1254 gradually with 4 steps - Pool in for T385141
- 11:48 fceratto@cumin1002: dbctl commit (dc=all): 'Preparing db1254 for T385141', diff saved to https://phabricator.wikimedia.org/P74183 and previous config saved to /var/cache/conftool/dbconfig/20250311-114835-fceratto.json
- 11:43 ladsgroup@deploy2002: Finished scap sync-world: Backport for Stop loading the ActiveAbstract extension for dumps (T382069) (duration: 13m 36s)
- 11:41 Amir1: dropping transcache table everywhere (T376627)
- 11:34 ladsgroup@deploy2002: ladsgroup, jforrester: Continuing with sync
- 11:34 ladsgroup@deploy2002: ladsgroup, jforrester: Backport for Stop loading the ActiveAbstract extension for dumps (T382069) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:31 topranks: enable connections from ssw1-e1 and ssw1-f1 to new top-of-rack switches lsw1-e8 and lsw1-f8 in eqiad T382017
- 11:30 ladsgroup@deploy2002: Started scap sync-world: Backport for Stop loading the ActiveAbstract extension for dumps (T382069)
- 11:28 jelto@cumin1002: END (FAIL) - Cookbook sre.k8s.wipe-cluster (exit_code=99) Wipe the K8s cluster staging-codfw: Kubernetes upgrade
- 11:24 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1008.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 11:24 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2008.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 11:24 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new network switches - cmooney@cumin1002 - T382017"
- 11:23 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new network switches - cmooney@cumin1002 - T382017"
- 11:21 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f8-eqiad
- 11:19 MichaelG_WMF: migr@mwmaint2002: ran "time mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=frwiki --db-table --verbose --force 2>&1 | tee ~/frwiki-dbtable.txt"
- 11:19 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
- 11:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
- 11:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
- 11:16 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1253 gradually with 4 steps - Pool in for T385141
- 11:05 ladsgroup@deploy2002: ladsgroup, jforrester: Continuing with sync
- 11:05 ladsgroup@deploy2002: ladsgroup, jforrester: Backport for Stop loading the ActiveAbstract extension for dumps (T382069) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 10:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 10:56 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2007.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 10:56 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1007.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 10:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 10:54 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 10:53 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 10:41 moritzm: installing openjdk 17 security updates on puppet servers (the necessary restarts may cause a few interrupted puppet runs and will be splayed out)
- 10:37 ladsgroup@deploy2002: Started scap sync-world: Backport for Stop loading the ActiveAbstract extension for dumps (T382069)
- 10:36 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:30 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1253 gradually with 4 steps - Pool in for T385141
- 10:22 marostegui: Deploy schema change on x1 commonswiki codfw master with replication dbmaint T385917
- 10:21 marostegui: Deploy schema change on s4 testcommonswiki codfw master with replication dbmaint T385917
- 10:18 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 10:15 dcausse@deploy2002: Finished deploy [airflow-dags/search@c27621d]: publish search artifacts (duration: 00m 29s)
- 10:14 dcausse@deploy2002: Started deploy [airflow-dags/search@c27621d]: publish search artifacts
- 10:06 jelto@cumin1002: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster staging-codfw: Kubernetes upgrade
- 10:05 jelto@cumin1002: END (FAIL) - Cookbook sre.k8s.wipe-cluster (exit_code=99) Wipe the K8s cluster staging-codfw: Kubernetes upgrade
- 10:01 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:58 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 09:57 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:57 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 09:55 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:55 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 09:53 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.liberica-admin (exit_code=1) depooling P{lvs4010.ulsfo.wmnet} and A:liberica
- 09:52 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.liberica-admin depooling P{lvs4010.ulsfo.wmnet} and A:liberica
- 09:48 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:48 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 09:41 kart_: Script run: `mwscript updateCollation.php --wiki=kkwiki --previous-collation=uppercase` (T384395)
- 09:32 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1006.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 09:32 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2006.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 09:28 jelto@cumin1002: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster staging-codfw: Kubernetes upgrade
- 08:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
- 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
- 08:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Maintenance
- 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
- 08:47 kartik@deploy2002: Finished scap sync-world: Backport for Add uca collation for Kazakh (T384395) (duration: 12m 13s)
- 08:41 kartik@deploy2002: kartik, jhsoby: Continuing with sync
- 08:38 kartik@deploy2002: kartik, jhsoby: Backport for Add uca collation for Kazakh (T384395) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 08:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 08:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 08:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 08:35 kartik@deploy2002: Started scap sync-world: Backport for Add uca collation for Kazakh (T384395)
- 08:32 kartik@deploy2002: Finished scap sync-world: Backport for EventLogging: Improve handling when suggestions are not present (T388467) (duration: 26m 56s)
- 08:23 kartik@deploy2002: abi, kartik: Continuing with sync
- 08:12 kartik@deploy2002: abi, kartik: Backport for EventLogging: Improve handling when suggestions are not present (T388467) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:08 moritzm: installing systemd bugfix updates from Bookworm point release
- 08:05 kartik@deploy2002: Started scap sync-world: Backport for EventLogging: Improve handling when suggestions are not present (T388467)
- 08:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: Cloning
- 08:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Maintenance
- 07:59 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
- 07:23 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1228.eqiad.wmnet
- 07:19 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1228.eqiad.wmnet
- 07:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
- 07:13 marostegui: Failover m2 from db1228 to db1164 - T388396
- 07:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2233].codfw.wmnet,db[1164,1217,1228].eqiad.wmnet with reason: Primary switchover m2 T388396
- 06:45 marostegui: Drop rt database from m1 T388437
- 06:45 marostegui: Remove rt grants from m1 T388437
- 04:03 mwpresync@deploy2002: Pruned MediaWiki: 1.44.0-wmf.17 (duration: 03m 02s)
- 03:54 eileen: civicrm upgraded from f2222fcd to ec20a105
- 03:52 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.20 refs T386215 (duration: 49m 13s)
- 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.20 refs T386215
- 00:22 aaron@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 00:21 aaron@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 00:18 aaron@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 00:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 00:08 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:07 pt1979@cumin1002: START - Cookbook sre.hosts.provision for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:07 aaron@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 00:03 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:01 pt1979@cumin1002: START - Cookbook sre.hosts.provision for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
2025-03-10
- 23:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2089
- 23:38 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2089
- 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2089 to codfw - jhancock@cumin2002"
- 23:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2089 to codfw - jhancock@cumin2002"
- 23:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 23:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2089
- 23:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2089
- 21:48 tgr_: UTC late deploys done
- 21:48 tgr@deploy2002: Finished scap sync-world: Backport for Enable SUL3 signup for all of group 1 and 1% of group 2 users (T384007 T384218) (duration: 15m 21s)
- 21:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:41 tgr@deploy2002: tgr: Continuing with sync
- 21:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:35 tgr@deploy2002: tgr: Backport for Enable SUL3 signup for all of group 1 and 1% of group 2 users (T384007 T384218) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:32 tgr@deploy2002: Started scap sync-world: Backport for Enable SUL3 signup for all of group 1 and 1% of group 2 users (T384007 T384218)
- 21:28 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:14 fabfur: installed new benthos version (4.27.0-2 over 4.27.0-1) on cp4037 for testing
- 21:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for restbase - jclark@cumin1002"
- 21:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for restbase - jclark@cumin1002"
- 21:07 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 20:31 dancy@deploy2002: Finished scap sync-world: Backport for CX3 Build 1.0.0+20250310 (T284422 T387036) (duration: 10m 46s)
- 20:25 dancy@deploy2002: sbisson, dancy: Continuing with sync
- 20:23 dancy@deploy2002: sbisson, dancy: Backport for CX3 Build 1.0.0+20250310 (T284422 T387036) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:20 dancy@deploy2002: Started scap sync-world: Backport for CX3 Build 1.0.0+20250310 (T284422 T387036)
- 20:19 dancy@deploy2002: Finished scap sync-world: Backport for Remove $wgAllowAuthenticatedCrossOrigin again (T322944) (duration: 11m 18s)
- 20:13 dancy@deploy2002: lucaswerkmeister, dancy: Continuing with sync
- 20:11 dancy@deploy2002: lucaswerkmeister, dancy: Backport for Remove $wgAllowAuthenticatedCrossOrigin again (T322944) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:08 dancy@deploy2002: Started scap sync-world: Backport for Remove $wgAllowAuthenticatedCrossOrigin again (T322944)
- 20:06 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4038.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 20:03 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4038.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 19:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye
- 19:52 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
- 19:52 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
- 19:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage
- 19:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage
- 19:14 ladsgroup@deploy2002: Finished scap sync-world: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323) (duration: 10m 53s)
- 19:11 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-f8-eqiad.mgmt.eqiad.wmnet
- 19:08 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 19:06 ladsgroup@deploy2002: ladsgroup: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 19:03 ladsgroup@deploy2002: Started scap sync-world: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323)
- 19:03 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye
- 19:02 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-e8-eqiad.mgmt.eqiad.wmnet
- 18:44 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:44 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f8-eqiad - cmooney@cumin1002"
- 18:44 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f8-eqiad - cmooney@cumin1002"
- 18:39 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudelastic1010.eqiad.wmnet with OS bullseye
- 18:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 18:34 cmooney@cumin1002: START - Cookbook sre.network.provision for device lsw1-f8-eqiad.mgmt.eqiad.wmnet
- 18:32 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:32 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e8-eqiad - cmooney@cumin1002"
- 18:27 cmooney@dns2005: END - running authdns-update
- 18:26 cmooney@dns2005: START - running authdns-update
- 18:25 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e8-eqiad - cmooney@cumin1002"
- 18:21 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 18:21 cmooney@cumin1002: START - Cookbook sre.network.provision for device lsw1-e8-eqiad.mgmt.eqiad.wmnet
- 18:17 sukhe: restart pybal on lvs2013: not required, but done to clear possible "no restart" alerts
- 18:16 sukhe: sudo cumin 'A:lvs-codfw' 'run-puppet-agent --enable "adding k8s-ingress-aux codfw"'
- 18:14 sukhe: restart pybal on lvs2014 for reverted aux-k8s change
- 18:12 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
- 18:03 herron@puppetserver1001: conftool action : set/pooled=no; selector: name=aux-k8s-worker2004.codfw.wmnet
- 18:03 herron@puppetserver1001: conftool action : set/pooled=no; selector: name=aux-k8s-worker2002.codfw.wmnet
- 17:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye
- 17:58 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 17:56 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2005.codfw.wmnet
- 17:56 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2003.codfw.wmnet
- 17:55 sukhe: restart pybal on lvs2014
- 17:54 cgoubert@deploy2002: Finished scap sync-world: mw-cron to php 8.1 - T387916 (duration: 02m 49s)
- 17:52 cgoubert@deploy2002: Started scap sync-world: mw-cron to php 8.1 - T387916
- 17:49 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 17:48 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 17:48 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 17:47 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 17:47 swfrench-wmf: mw-(api-ext|web): migrated 25% of residual PHP 7.4 traffic to 8.1 - T383845
- 17:46 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:45 sukhe: sudo cumin 'A:lvs-codfw' 'disable-puppet "adding k8s-ingress-aux codfw"' T381417
- 17:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:45 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:44 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:44 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:44 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:43 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:40 brett: Upgrading cp4044 to Varnish 7 (T378737)
- 17:40 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
- 17:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:36 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:35 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:14 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Revert^2 "CommonSettings.php: Add $wgCentralAuthAutomaticVanishWiki" (duration: 10m 20s)
- 17:12 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
- 17:10 sukhe: sudo cumin 'A:lvs and A:eqiad' 'run-puppet-agent --enable "adding aux-k8s-ctrl codfw"'
- 17:08 sukhe: sudo cumin 'A:lvs and A:codfw' 'run-puppet-agent --enable "adding aux-k8s-ctrl codfw"'
- 17:08 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 17:08 dreamyjazz@deploy2002: dreamyjazz: Backport for Revert^2 "CommonSettings.php: Add $wgCentralAuthAutomaticVanishWiki" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:06 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
- 17:06 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1010.eqiad.wmnet with OS bullseye
- 17:06 sukhe: lvs2013: restart pybal
- 17:04 dreamyjazz@deploy2002: Started scap sync-world: Backport for Revert^2 "CommonSettings.php: Add $wgCentralAuthAutomaticVanishWiki"
- 17:03 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2005.codfw.wmnet
- 17:02 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2004.codfw.wmnet
- 17:02 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2003.codfw.wmnet
- 17:02 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2002.codfw.wmnet
- 17:00 dancy@deploy2002: Installation of scap version "4.140.0" completed for 204 hosts
- 17:00 sukhe: restart pybal on lvs2014
- 16:59 sukhe: enable puppet on lvs2014
- 16:58 sukhe: restart pybal on lvs1020
- 16:55 dancy@deploy2002: Installing scap version "4.140.0" for 204 host(s)
- 16:51 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 16:50 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 16:50 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 16:47 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl2003.codfw.wmnet
- 16:47 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl2002.codfw.wmnet
- 16:47 sukhe: sudo cumin 'A:lvs and (A:eqiad or A:codfw)' 'disable-puppet "adding aux-k8s-ctrl codfw"'
- 16:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye
- 16:43 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 16:43 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 16:43 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 16:43 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 16:42 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 16:42 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 16:42 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 16:39 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
- 16:33 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
- 16:32 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
- 16:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) k8s-ingress-aux.svc.codfw.wmnet on all recursors
- 16:31 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache k8s-ingress-aux.svc.codfw.wmnet on all recursors
- 16:30 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl.svc.codfw.wmnet on all recursors
- 16:30 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl.svc.codfw.wmnet on all recursors
- 16:30 herron@dns1004: END - running authdns-update
- 16:29 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 16:29 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 16:28 herron@dns1004: START - running authdns-update
- 16:17 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
- 16:12 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl.svc.codfw.wmnet on all recursors
- 16:12 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl.svc.codfw.wmnet on all recursors
- 16:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: enabling aux-k8s codfw vips - herron@cumin1002"
- 16:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: enabling aux-k8s codfw vips - herron@cumin1002"
- 16:09 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
- 16:06 herron@cumin1002: START - Cookbook sre.dns.netbox
- 16:04 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 16:04 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 16:01 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=wikikube-worker1.*,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 16:01 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=wikikube-worker2.*,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 16:00 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
- 16:00 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:00 moritzm: imported keepalived 1:2.2.7-1~bpo11+1 to main component of bullseye-wikimedia T383557
- 15:59 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:58 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1010.eqiad.wmnet with OS bullseye
- 15:56 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:56 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:56 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 25s)
- 15:54 swfrench-wmf: reprepro update pcre2_10.42-1~wmf11+1 in component/pcre2 from apt-staging - T386006
- 15:53 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 38s)
- 15:53 fceratto@cumin1002: dbctl commit (dc=all): 'Preparing db1253 T385141', diff saved to https://phabricator.wikimedia.org/P74174 and previous config saved to /var/cache/conftool/dbconfig/20250310-155332-fceratto.json
- 15:36 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye
- 15:30 moritzm: installing systemd bugfix updates from Bookworm point release
- 15:17 godog: repool prometheus200[56] - T383232
- 15:16 filippo@puppetserver1001: conftool action : set/pooled=yes; selector: name=prometheus2005.codfw.wmnet
- 15:16 filippo@puppetserver1001: conftool action : set/pooled=yes; selector: name=prometheus2006.codfw.wmnet
- 15:09 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 15:08 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 15:08 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 15:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 15:06 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 15:05 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:01 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2046
- 15:01 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:01 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:01 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2046
- 15:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:00 tgr_: UTC afternoon deploys done
- 14:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2045
- 14:58 tgr@deploy2002: Finished scap sync-world: Backport for SpecialCentralAutoLogin: Handle nullable wiki ID (T388252), SUL3: Attach SUL mode to the return URL of local wiki (T388067), Log and add user IDs that mismatch in the runtime exception (T388177) (duration: 15m 48s)
- 14:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2045
- 14:55 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
- 14:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
- 14:51 tgr@deploy2002: tgr: Continuing with sync
- 14:50 moritzm: installing pymysql security updates
- 14:50 tgr@deploy2002: tgr: Backport for SpecialCentralAutoLogin: Handle nullable wiki ID (T388252), SUL3: Attach SUL mode to the return URL of local wiki (T388067), Log and add user IDs that mismatch in the runtime exception (T388177) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:48 sukhe: sudo cumin 'P:durum' 'run-puppet-agent'
- 14:48 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:48 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:44 Emperor: restart swift on ms-fe2011
- 14:22 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
- 14:22 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Clean up RDF feature flags again (T384344) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1028.eqiad.wmnet to cluster eqiad and group C
- 14:21 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1028.eqiad.wmnet to cluster eqiad and group C
- 14:19 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Clean up RDF feature flags again (T384344)
- 14:17 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 $ mwscript-k8s --comment=T356620 --follow -- namespaceDupes mnwwiktionary --fix | tee T356620
- 14:17 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable CX unified dashboard on phase 2 wikis (T387820), Disallow editing modules for non-autoconfirmed users on the English Wikivoyage (T388301), mnwwiktionary: add thesaurus namespace (T356620) (duration: 11m 21s)
- 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
- 14:10 lucaswerkmeister-wmde@deploy2002: dreamrimmer, sbisson, anzx, lucaswerkmeister-wmde: Continuing with sync
- 14:08 lucaswerkmeister-wmde@deploy2002: dreamrimmer, sbisson, anzx, lucaswerkmeister-wmde: Backport for Enable CX unified dashboard on phase 2 wikis (T387820), Disallow editing modules for non-autoconfirmed users on the English Wikivoyage (T388301), mnwwiktionary: add thesaurus namespace (T356620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
- 14:05 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable CX unified dashboard on phase 2 wikis (T387820), Disallow editing modules for non-autoconfirmed users on the English Wikivoyage (T388301), mnwwiktionary: add thesaurus namespace (T356620)
- 14:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:58 moritzm: installing libpgjava security updates
- 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1028.eqiad.wmnet with OS bookworm
- 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1028.eqiad.wmnet with reason: host reimage
- 13:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1028.eqiad.wmnet with reason: host reimage
- 13:07 godog: test prometheus2007 as the sole host pooled in pybal - T383232
- 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1028.eqiad.wmnet with OS bookworm
- 12:59 filippo@puppetserver1001: conftool action : set/pooled=no; selector: name=prometheus2006.codfw.wmnet
- 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1028.eqiad.wmnet
- 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
- 12:56 filippo@puppetserver1001: conftool action : set/pooled=yes; selector: name=prometheus2007.codfw.wmnet
- 12:55 moritzm: imported wmf-laptop 1.0.1 to apt.wikimedia.org
- 12:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
- 12:43 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1028.eqiad.wmnet
- 12:43 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ganeti1028.eqiad.wmnet
- 12:34 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on gerrit2003.wikimedia.org with reason: testing
- 12:33 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1028.eqiad.wmnet
- 12:27 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 12:27 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 12:26 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 12:26 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 12:25 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:25 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:24 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:24 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:24 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:24 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:23 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 12:23 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 12:23 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 12:23 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 12:23 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 12:23 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 12:22 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 12:22 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 12:22 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 12:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 12:21 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 12:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 12:18 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 12:18 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 12:18 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
- 12:17 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
- 12:16 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1028.eqiad.wmnet with reason: remove from cluster for reimage
- 12:14 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 12:14 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 12:13 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 12:13 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 12:01 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:01 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:01 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:00 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:00 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:59 moritzm: installing iputils bugfix updates
- 11:59 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 11:59 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 11:58 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 11:58 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 11:57 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 11:56 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:56 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:56 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 11:55 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 11:55 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 11:55 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:55 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
- 11:53 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:52 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:51 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 11:51 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 11:50 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:50 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:49 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:49 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:48 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 11:47 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 11:47 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:47 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:46 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:43 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:42 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 11:42 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 11:41 ladsgroup@deploy2002: Finished scap sync-world: Backport for Set thumbnail steps to 1% of production (T360589) (duration: 10m 27s)
- 11:35 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 11:34 ladsgroup@deploy2002: ladsgroup: Backport for Set thumbnail steps to 1% of production (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:31 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
- 11:31 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
- 11:31 ladsgroup@deploy2002: Started scap sync-world: Backport for Set thumbnail steps to 1% of production (T360589)
- 11:31 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
- 11:30 elukey@deploy2002: helmfile [eqiad] START helmfile.d/admin 'sync'.
- 11:29 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
- 11:29 elukey@deploy2002: helmfile [codfw] START helmfile.d/admin 'sync'.
- 11:27 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
- 11:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2142.codfw.wmnet,db1152.eqiad.wmnet with reason: Setting up
- 11:24 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2144.codfw.wmnet,db[1151-1152].eqiad.wmnet with reason: Setting up
- 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Set ms3 weights to 1 instead of 100', diff saved to https://phabricator.wikimedia.org/P74171 and previous config saved to /var/cache/conftool/dbconfig/20250310-112140-marostegui.json
- 11:20 marostegui@cumin1002: dbctl commit (dc=all): 'Add weight to ms1 hosts T387332', diff saved to https://phabricator.wikimedia.org/P74170 and previous config saved to /var/cache/conftool/dbconfig/20250310-112046-marostegui.json
- 11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Push ms1 config T387332', diff saved to https://phabricator.wikimedia.org/P74169 and previous config saved to /var/cache/conftool/dbconfig/20250310-111742-marostegui.json
- 11:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 8 hosts with reason: Cloning
- 11:07 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
- 11:06 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
- 11:06 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
- 11:03 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::scholarly@eqiad
- 11:03 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 11:02 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 10:57 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd1002.eqiad.wmnet with OS bookworm
- 10:55 moritzm: installing qemu security updates
- 10:45 ladsgroup@deploy2002: Finished deploy [dumps/dumps@afcb740]: Removing Yahoo! abstract dumps code (T382069) (duration: 00m 07s)
- 10:45 ladsgroup@deploy2002: Started deploy [dumps/dumps@afcb740]: Removing Yahoo! abstract dumps code (T382069)
- 10:45 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::scholarly@eqiad
- 10:43 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::scholarly@codfw
- 10:43 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 10:42 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 10:37 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::scholarly@codfw
- 10:37 filippo@puppetserver1001: conftool action : set/pooled=no; selector: name=prometheus2005.codfw.wmnet
- 10:36 filippo@puppetserver1001: conftool action : set/pooled=no; selector: name=prometheus2007.codfw.wmnet
- 10:36 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd1002.eqiad.wmnet with reason: host reimage
- 10:33 filippo@puppetserver1001: conftool action : set/weight=10; selector: name=prometheus2007.codfw.wmnet
- 10:32 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd1002.eqiad.wmnet with reason: host reimage
- 10:22 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd1002.eqiad.wmnet with OS bookworm
- 10:16 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::main@eqiad
- 10:16 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 10:15 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 10:15 godog: test moving k8s-mlstaging from prometheus2005 to prometheus2007 - T383232
- 10:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Cloning
- 10:07 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::main@eqiad
- 10:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Reboot
- 10:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1164.eqiad.wmnet with reason: Reboot
- 10:04 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::main@codfw
- 10:04 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 10:03 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 09:57 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::main@codfw
- 09:39 elukey: run puppetserver.delete() for relforge100[567] and elastic110[456] - certificate requests had been pending for weeks; DSE confirmed those hosts are not in production or otherwise in use.
- 09:33 moritzm: installing exim4 bugfix updates from Bookworm point release
- 09:28 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1164.eqiad.wmnet
- 09:23 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1164.eqiad.wmnet
- 09:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1164.eqiad.wmnet with reason: Reboot
- 09:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc1 T387953', diff saved to https://phabricator.wikimedia.org/P74166 and previous config saved to /var/cache/conftool/dbconfig/20250310-090600-marostegui.json
- 08:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2011.codfw.wmnet,pc1011.eqiad.wmnet with reason: Migration to 10.11
- 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc1 T387953', diff saved to https://phabricator.wikimedia.org/P74164 and previous config saved to /var/cache/conftool/dbconfig/20250310-083746-marostegui.json
- 08:31 awight: UTC morning backports are done
- 08:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 08:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
- 08:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 08:29 awight@deploy2002: Finished scap sync-world: Backport for Disallow editing modules for non-confirmed/non-autoconfirmed users on the English Wikivoyage (T388301) (duration: 24m 08s)
- 08:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
- 08:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
- 08:19 awight@deploy2002: awight, dreamrimmer: Continuing with sync
- 08:18 awight@deploy2002: awight, dreamrimmer: Backport for Disallow editing modules for non-confirmed/non-autoconfirmed users on the English Wikivoyage (T388301) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:09 marostegui: Failover m1 from db1164 to db1250 - T388024
- 08:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2232].codfw.wmnet,db[1164,1217,1250].eqiad.wmnet with reason: Primary switchover m1 T388024
- 08:05 awight@deploy2002: Started scap sync-world: Backport for Disallow editing modules for non-confirmed/non-autoconfirmed users on the English Wikivoyage (T388301)
- 07:38 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jdcc-berkman out of all services on: 1284 hosts
- 07:37 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jdcc-berkman out of all services on: 961 hosts
2025-03-09
- 10:13 elukey@puppetserver1001: conftool action : set/weight=5; selector: name=wikikube-worker2.*,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 10:12 elukey@puppetserver1001: conftool action : set/weight=5; selector: name=wikikube-worker1.*,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 10:12 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 10:12 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
2025-03-08
- 11:33 moritzm: truncated /var/log/syslog on seaborgium and bounced slapd
- 00:34 tzatziki: removing 3 files for legal compliance
2025-03-07
- 22:42 inflatador: bking@cloudelastic1009 exclude `cloudelastic1010` from master voting T387904
- 22:17 ryankemper: [Cloudelastic] Doing a `/_cluster/reroute?retry_failed=true` of all 3 elastic/opensearch clusters
- 22:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye
- 22:02 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
- 22:01 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
- 21:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage
- 21:32 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage
- 21:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye
- 21:12 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1009.eqiad.wmnet with OS bullseye
- 20:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye
- 20:49 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:44 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:42 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:32 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:32 bking@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:32 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:17 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:17 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:17 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:16 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:16 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:08 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
- 20:06 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1009.eqiad.wmnet with OS bullseye
- 19:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye
- 19:53 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1009.eqiad.wmnet with OS bullseye
- 19:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye
- 17:38 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:38 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns additions for eqiad E8/F8 links to new switches - cmooney@cumin1002"
- 17:38 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns additions for eqiad E8/F8 links to new switches - cmooney@cumin1002"
- 17:32 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 17:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1202.eqiad.wmnet with OS bullseye
- 17:26 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns additions for eqiad E8/F8 links to new switches - cmooney@cumin1002"
- 17:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns additions for eqiad E8/F8 links to new switches - cmooney@cumin1002"
- 17:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:06 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 16:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1246', diff saved to https://phabricator.wikimedia.org/P74156 and previous config saved to /var/cache/conftool/dbconfig/20250307-164605-root.json
- 16:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1202.eqiad.wmnet with reason: host reimage
- 16:41 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1202.eqiad.wmnet with reason: host reimage
- 16:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1208.eqiad.wmnet with OS bullseye
- 16:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1207.eqiad.wmnet with OS bullseye
- 16:29 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:28 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
- 16:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1206.eqiad.wmnet with OS bullseye
- 16:25 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:25 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1202.eqiad.wmnet with OS bullseye
- 16:20 sbassett: Deployed security patch for T387691
- 16:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1205.eqiad.wmnet with OS bullseye
- 16:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1208.eqiad.wmnet with reason: host reimage
- 16:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1207.eqiad.wmnet with reason: host reimage
- 16:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1203.eqiad.wmnet with OS bullseye
- 16:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1204.eqiad.wmnet with OS bullseye
- 16:04 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1206.eqiad.wmnet with reason: host reimage
- 15:59 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1207.eqiad.wmnet with reason: host reimage
- 15:58 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 15:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1208.eqiad.wmnet with reason: host reimage
- 15:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1206.eqiad.wmnet with reason: host reimage
- 15:58 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 15:58 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 15:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1205.eqiad.wmnet with reason: host reimage
- 15:49 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "updating for renamed dell switches in eqiad - cmooney@cumin1002"
- 15:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "updating for renamed dell switches in eqiad - cmooney@cumin1002"
- 15:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1201.eqiad.wmnet with OS bullseye
- 15:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:46 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1205.eqiad.wmnet with reason: host reimage
- 15:46 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1207.eqiad.wmnet with OS bullseye
- 15:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1208.eqiad.wmnet with OS bullseye
- 15:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1206.eqiad.wmnet with OS bullseye
- 15:35 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:35 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change dns names for eqiad rack e8 endpoints - cmooney@cumin1002"
- 15:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change dns names for eqiad rack e8 endpoints - cmooney@cumin1002"
- 15:33 swfrench@deploy2002: Finished scap sync-world: helmfile-only deploy to reduce likelihood of deployment timeouts - T383845 (duration: 04m 33s)
- 15:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1204.eqiad.wmnet with reason: host reimage
- 15:31 swfrench@deploy2002: Started scap sync-world: helmfile-only deploy to reduce likelihood of deployment timeouts - T383845
- 15:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1205.eqiad.wmnet with OS bullseye
- 15:30 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:29 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 15:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1200.eqiad.wmnet with OS bullseye
- 15:27 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:27 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1203.eqiad.wmnet with reason: host reimage
- 15:24 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:24 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 15:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1201.eqiad.wmnet with reason: host reimage
- 15:20 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1203.eqiad.wmnet with reason: host reimage
- 15:20 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1204.eqiad.wmnet with reason: host reimage
- 15:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1201.eqiad.wmnet with reason: host reimage
- 15:15 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:13 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 15:12 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 15:08 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:06 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 15:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1203.eqiad.wmnet with OS bullseye
- 15:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1204.eqiad.wmnet with OS bullseye
- 15:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1200.eqiad.wmnet with reason: host reimage
- 15:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
- 15:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1201.eqiad.wmnet with OS bullseye
- 15:01 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1200.eqiad.wmnet with reason: host reimage
- 14:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1200.eqiad.wmnet with OS bullseye
- 14:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1198.eqiad.wmnet with OS bullseye
- 14:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:12 elukey@puppetserver1001: conftool action : set/weight=10:pooled=yes; selector: name=wikikube-worker1.*,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 14:12 elukey@puppetserver1001: conftool action : set/weight=10:pooled=yes; selector: name=wikikube-worker2.*,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 14:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1198.eqiad.wmnet with reason: host reimage
- 13:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1198.eqiad.wmnet with reason: host reimage
- 13:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1198.eqiad.wmnet with OS bullseye
- 13:25 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 13:21 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 13:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 11:58 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2088.codfw.wmnet
- 11:46 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be2088.codfw.wmnet
- 11:27 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2088.codfw.wmnet
- 11:16 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be2088.codfw.wmnet
- 10:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1035.eqiad.wmnet to cluster eqiad and group A
- 10:48 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1035.eqiad.wmnet to cluster eqiad and group A
- 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
- 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
- 10:30 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 10:30 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 10:22 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Halfak out of all services on: 951 hosts
- 10:21 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Halfak out of all services on: 1284 hosts
- 10:13 moritzm: updated pwstore key for btullis
- 09:38 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=wikikube-worker2.*,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 09:37 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=wikikube-worker1.*,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 09:27 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
- 09:21 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
- 09:20 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
- 09:20 elukey@deploy2002: helmfile [codfw] START helmfile.d/admin 'sync'.
- 09:18 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
- 09:12 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 09:09 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 09:08 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 09:07 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
- 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1035.eqiad.wmnet with OS bookworm
- 09:05 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 09:03 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 09:02 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 08:55 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
- 08:52 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
- 08:50 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
- 08:48 jayme: updated helmfile to 0.171.0-5 on deploy* - T387837
- 08:48 jayme: imported helmfile 0.171.0-5 to bullseye-wikimedia and bookworm-wikimedia - T387837
- 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
- 08:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
- 08:43 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
- 08:42 elukey@deploy2002: helmfile [eqiad] START helmfile.d/admin 'sync'.
- 08:40 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
- 08:39 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
- 08:39 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
- 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1035.eqiad.wmnet with OS bookworm
- 08:15 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
- 08:15 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
- 08:15 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 08:15 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 08:12 moritzm: installing Linux 5.10.234 on Bullseye hosts (just the rollout of the new kernels, no immediate reboots involved)
- 08:07 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JJMC89 out of all services on: 2 hosts
- 07:51 moritzm: installing emacs security updates
- 07:36 hashar@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): Upgrade to Jenkins LTS 2.492.2 (duration: 01m 23s)
- 07:35 hashar@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): Upgrade to Jenkins LTS 2.492.2
- 07:31 hashar: Upgrading Jenkins on contint1002
- 01:06 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1201.eqiad.wmnet with OS bullseye
- 00:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1208.eqiad.wmnet with OS bullseye
- 00:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1207.eqiad.wmnet with OS bullseye
- 00:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1206.eqiad.wmnet with OS bullseye
- 00:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1205.eqiad.wmnet with OS bullseye
- 00:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1204.eqiad.wmnet with OS bullseye
- 00:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1203.eqiad.wmnet with OS bullseye
- 00:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
- 00:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1201.eqiad.wmnet with OS bullseye
2025-03-06
- 23:19 joal@deploy2002: Finished deploy [analytics/refinery@64b629d]: emergency deploy for gobblin event_default recenchange memory issue - 2 (duration: 01m 13s)
- 23:18 joal@deploy2002: Started deploy [analytics/refinery@64b629d]: emergency deploy for gobblin event_default recenchange memory issue - 2
- 23:03 tgr@deploy2002: Finished scap sync-world: Backport for Enable SUL3 signup for 50% of group 1 users (T384007) (duration: 20m 55s)
- 22:56 tgr@deploy2002: tgr: Continuing with sync
- 22:45 tgr@deploy2002: tgr: Backport for Enable SUL3 signup for 50% of group 1 users (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:42 tgr@deploy2002: Started scap sync-world: Backport for Enable SUL3 signup for 50% of group 1 users (T384007)
- 22:39 toyofuku@deploy2002: Finished scap sync-world: Backport for Enable Search AB test for en wiki (duration: 18m 27s)
- 22:33 toyofuku@deploy2002: toyofuku, bwang: Continuing with sync
- 22:26 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 22:23 toyofuku@deploy2002: toyofuku, bwang: Backport for Enable Search AB test for en wiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:21 toyofuku@deploy2002: Started scap sync-world: Backport for Enable Search AB test for en wiki
- 22:13 tgr@deploy2002: Finished scap sync-world: Backport for Revert^2 "Fix nested refs with the same name but a different group" (duration: 12m 44s)
- 22:06 tgr@deploy2002: tgr, ssastry: Continuing with sync
- 22:03 tgr@deploy2002: tgr, ssastry: Backport for Revert^2 "Fix nested refs with the same name but a different group" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:00 tgr@deploy2002: Started scap sync-world: Backport for Revert^2 "Fix nested refs with the same name but a different group"
- 21:55 tgr@deploy2002: Finished scap sync-world: Backport for Remove unused $wgDiscussionToolsABTest, Remove unused $wgOATHAuthMultipleDevicesMigrationStage, Deduplicate JsonConfig config (duration: 15m 00s)
- 21:54 otto@deploy2002: Finished deploy [analytics/refinery@ec4c468]: 'emergency deploy for gobblin event_default recenchange memory issue' (duration: 01m 55s)
- 21:53 otto@deploy2002: Started deploy [analytics/refinery@ec4c468]: 'emergency deploy for gobblin event_default recenchange memory issue'
- 21:49 tgr@deploy2002: matmarex, tgr: Continuing with sync
- 21:43 tgr@deploy2002: matmarex, tgr: Backport for Remove unused $wgDiscussionToolsABTest, Remove unused $wgOATHAuthMultipleDevicesMigrationStage, Deduplicate JsonConfig config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:40 tgr@deploy2002: Started scap sync-world: Backport for Remove unused $wgDiscussionToolsABTest, Remove unused $wgOATHAuthMultipleDevicesMigrationStage, Deduplicate JsonConfig config
- 21:32 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1009* for ban host prior to reimage - bking@cumin2002 - T387904
- 21:32 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1009* for ban host prior to reimage - bking@cumin2002 - T387904
- 19:49 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 19:48 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 19:11 ebernhardson: T379002 start reindex of cirrus cebwiki_content index in codfw
- 19:10 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-presto1014.eqiad.wmnet
- 19:09 ebernhardson: T379002 start reindex of cirrus cebwiki_content index in eqiad
- 19:06 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1180.eqiad.wmnet with OS bullseye
- 19:06 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 19:05 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 19:04 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 19:04 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:58 ebernhardson@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:58 ebernhardson@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:45 swfrench-wmf: mw-web: migrated 5% of residual PHP 7.4 traffic to 8.1 - T383845
- 18:45 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 18:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 18:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1180.eqiad.wmnet with reason: host reimage
- 18:40 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1180.eqiad.wmnet with reason: host reimage
- 18:39 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 18:38 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 18:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 18:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 18:30 andrew@dns1004: END - running authdns-update
- 18:28 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 18:28 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 18:28 andrew@dns1004: START - running authdns-update
- 18:27 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1178.eqiad.wmnet with OS bullseye
- 18:27 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 18:26 swfrench-wmf: mw-api-ext: migrated 5% of residual PHP 7.4 traffic to 8.1 - T383845
- 18:26 ebernhardson@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:26 ebernhardson@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1180.eqiad.wmnet with OS bullseye
- 18:25 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 18:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1198.eqiad.wmnet with OS bullseye
- 18:23 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:23 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:23 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:23 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 18:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 18:17 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 18:17 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 18:16 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 18:16 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 18:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1199.eqiad.wmnet with OS bullseye
- 18:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 18:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1200.eqiad.wmnet with OS bullseye
- 18:08 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:08 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:06 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 18:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 18:02 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1178.eqiad.wmnet with reason: host reimage
- 17:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:55 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1178.eqiad.wmnet with reason: host reimage
- 17:51 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 17:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 17:50 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1014.eqiad.wmnet
- 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 17:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1197.eqiad.wmnet with OS bullseye
- 17:42 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:42 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:40 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1178.eqiad.wmnet with OS bullseye
- 17:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1196.eqiad.wmnet with OS bullseye
- 17:38 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:38 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:36 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1035.eqiad.wmnet
- 17:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
- 17:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1199.eqiad.wmnet with reason: host reimage
- 17:30 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1199.eqiad.wmnet with reason: host reimage
- 17:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
- 17:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1189.eqiad.wmnet with OS bullseye
- 17:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1035.eqiad.wmnet
- 17:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:21 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ganeti1035.eqiad.wmnet
- 17:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1197.eqiad.wmnet with reason: host reimage
- 17:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1200.eqiad.wmnet with OS bullseye
- 17:16 moritzm: installing avahi security updates
- 17:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1192.eqiad.wmnet with OS bullseye
- 17:16 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1199.eqiad.wmnet with OS bullseye
- 17:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1196.eqiad.wmnet with reason: host reimage
- 17:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1191.eqiad.wmnet with OS bullseye
- 17:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:14 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1197.eqiad.wmnet with reason: host reimage
- 17:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1196.eqiad.wmnet with reason: host reimage
- 17:10 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 17:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1193.eqiad.wmnet with OS bullseye
- 17:10 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:09 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:07 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 17:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1194.eqiad.wmnet with OS bullseye
- 17:06 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:05 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:03 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1198.eqiad.wmnet with OS bullseye
- 17:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1195.eqiad.wmnet with OS bullseye
- 17:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:02 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:58 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1197.eqiad.wmnet with OS bullseye
- 16:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1190.eqiad.wmnet with OS bullseye
- 16:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1189.eqiad.wmnet with reason: host reimage
- 16:58 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1196.eqiad.wmnet with OS bullseye
- 16:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1188.eqiad.wmnet with OS bullseye
- 16:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:55 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1189.eqiad.wmnet with reason: host reimage
- 16:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1192.eqiad.wmnet with reason: host reimage
- 16:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1191.eqiad.wmnet with reason: host reimage
- 16:48 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1035.eqiad.wmnet
- 16:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1193.eqiad.wmnet with reason: host reimage
- 16:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1194.eqiad.wmnet with reason: host reimage
- 16:41 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1035.eqiad.wmnet with reason: remove from cluster for reimage
- 16:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1195.eqiad.wmnet with reason: host reimage
- 16:38 reedy@deploy2002: Synchronized wmf-config/: Various config cleanup (duration: 08m 31s)
- 16:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1190.eqiad.wmnet with reason: host reimage
- 16:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1188.eqiad.wmnet with reason: host reimage
- 16:29 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1193.eqiad.wmnet with reason: host reimage
- 16:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1194.eqiad.wmnet with reason: host reimage
- 16:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1195.eqiad.wmnet with reason: host reimage
- 16:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1192.eqiad.wmnet with reason: host reimage
- 16:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1191.eqiad.wmnet with reason: host reimage
- 16:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1190.eqiad.wmnet with reason: host reimage
- 16:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1188.eqiad.wmnet with reason: host reimage
- 16:19 tgr_: UTC afternoon deploys done
- 16:17 tgr@deploy2002: Finished scap sync-world: Backport for Enable SUL3 signup for 10% of group 1 users (T384007) (duration: 14m 10s)
- 16:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
- 16:14 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1193.eqiad.wmnet with OS bullseye
- 16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1195.eqiad.wmnet with OS bullseye
- 16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1194.eqiad.wmnet with OS bullseye
- 16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1192.eqiad.wmnet with OS bullseye
- 16:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1191.eqiad.wmnet with OS bullseye
- 16:12 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1190.eqiad.wmnet with OS bullseye
- 16:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1189.eqiad.wmnet with OS bullseye
- 16:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1188.eqiad.wmnet with OS bullseye
- 16:11 tgr@deploy2002: tgr: Continuing with sync
- 16:10 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:09 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:08 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:08 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:06 tgr@deploy2002: tgr: Backport for Enable SUL3 signup for 10% of group 1 users (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:03 tgr@deploy2002: Started scap sync-world: Backport for Enable SUL3 signup for 10% of group 1 users (T384007)
- 15:56 hashar@deploy2002: Finished scap sync-world: Backport for Use namespaced Title class (T388085) (duration: 22m 00s)
- 15:50 hashar@deploy2002: hashar, daimona: Continuing with sync
- 15:39 hashar@deploy2002: hashar, daimona: Backport for Use namespaced Title class (T388085) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:34 hashar@deploy2002: Started scap sync-world: Backport for Use namespaced Title class (T388085)
- 15:31 hashar@deploy2002: Finished scap sync-world: Backport for [Growth] Set default api lookahead size to 10 (T325990), Revert "Let sysops add/remove the event-organizer group by default" (T386738), Remove unused route file from Wikibase REST API configuration (T383774) (duration: 10m 23s)
- 15:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1187.eqiad.wmnet with OS bullseye
- 15:29 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:27 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:24 hashar@deploy2002: itamar, sgimeno, daimona, hashar: Continuing with sync
- 15:24 hashar@deploy2002: itamar, sgimeno, daimona, hashar: Backport for [Growth] Set default api lookahead size to 10 (T325990), Revert "Let sysops add/remove the event-organizer group by default" (T386738), Remove unused route file from Wikibase REST API configuration (T383774) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:22 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::public@eqiad
- 15:22 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 15:21 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 15:20 hashar@deploy2002: Started scap sync-world: Backport for [Growth] Set default api lookahead size to 10 (T325990), Revert "Let sysops add/remove the event-organizer group by default" (T386738), Remove unused route file from Wikibase REST API configuration (T383774)
- 15:09 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::public@eqiad
- 15:03 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::public@codfw
- 15:03 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 15:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1187.eqiad.wmnet with reason: host reimage
- 15:02 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 14:58 hashar@deploy2002: hashar, sgimeno, itamar, daimona: Backport for [Growth] Set default api lookahead size to 10 (T325990), Revert "Let sysops add/remove the event-organizer group by default" (T386738), Remove unused route file from Wikibase REST API configuration (T383774) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1187.eqiad.wmnet with reason: host reimage
- 14:57 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::public@codfw
- 14:56 hashar@deploy2002: Started scap sync-world: Backport for [Growth] Set default api lookahead size to 10 (T325990), Revert "Let sysops add/remove the event-organizer group by default" (T386738), Remove unused route file from Wikibase REST API configuration (T383774)
- 14:53 hashar@deploy2002: Finished scap sync-world: Backport for Revert "Fix nested refs with the same name but a different group", Test new term store config in beta (T385592), Growth: remove unused config wgGENewcomerTasksOresTopicConfigTitle, Drop $wmgCampaignEventsProgramsAndEventsDashboardEnabled (T387025) (duration: 12m 10s)
- 14:47 hashar@deploy2002: ollieshotton, migr, daimona, hashar: Continuing with sync
- 14:47 hashar@deploy2002: ollieshotton, migr, daimona, hashar: Backport for Revert "Fix nested refs with the same name but a different group", Test new term store config in beta (T385592), Growth: remove unused config wgGENewcomerTasksOresTopicConfigTitle, Drop $wmgCampaignEventsProgramsAndEventsDashboardEnabled (T387025) synced to the testservers (h
- 14:17 hashar@deploy2002: Sync cancelled.
- 14:11 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1254.eqiad.wmnet
- 14:02 hashar@deploy2002: hashar, ihurbain: Backport for Fix nested refs with the same name but a different group (T387800) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:59 hashar@deploy2002: Started scap sync-world: Backport for Fix nested refs with the same name but a different group (T387800)
- 13:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1197.eqiad.wmnet
- 13:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1197 gradually with 4 steps - Upgrading db1197
- 13:11 moritzm: installing gst-plugins-base1.0 security updates
- 13:02 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:01 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@fa4513d]: say hello to image suggestions v1.0.0 (duration: 01m 09s)
- 12:56 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@fa4513d]: say hello to image suggestions v1.0.0
- 12:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74134 and previous config saved to /var/cache/conftool/dbconfig/20250306-123017-root.json
- 12:28 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1197 gradually with 4 steps - Upgrading db1197
- 12:24 moritzm: installing krb5 security updates
- 12:21 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1197 - Upgrading db1197
- 12:21 fceratto@cumin1002: START - Cookbook sre.mysql.depool db1197 - Upgrading db1197
- 12:20 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db1197.eqiad.wmnet
- 12:15 moritzm: imported lshw 02.19.git.2021.06.19.996aaad9c7-2~bpo11+1 to component/lshw T383557
- 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74131 and previous config saved to /var/cache/conftool/dbconfig/20250306-121512-root.json
- 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74130 and previous config saved to /var/cache/conftool/dbconfig/20250306-120007-root.json
- 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74129 and previous config saved to /var/cache/conftool/dbconfig/20250306-115357-root.json
- 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74127 and previous config saved to /var/cache/conftool/dbconfig/20250306-114501-root.json
- 11:44 topranks: applying interface-specific arp policer on cr2-magru to IX.BR sub-interface ae0.3347 (T384774)
- 11:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: druid::public::worker@eqiad
- 11:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74125 and previous config saved to /var/cache/conftool/dbconfig/20250306-113852-root.json
- 11:37 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 11:36 hnowlan: Migrating 12 wikis to use mobileapps/pcs without restbase
- 11:34 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2230.codfw.wmnet
- 11:34 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: druid::public::worker@eqiad
- 11:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74124 and previous config saved to /var/cache/conftool/dbconfig/20250306-112955-root.json
- 11:29 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2230.codfw.wmnet
- 11:24 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for role: druid::public::worker@eqiad
- 11:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74121 and previous config saved to /var/cache/conftool/dbconfig/20250306-112346-root.json
- 11:19 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2230.codfw.wmnet
- 11:18 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2230.codfw.wmnet
- 11:18 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: druid::public::worker@eqiad
- 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74118 and previous config saved to /var/cache/conftool/dbconfig/20250306-110841-root.json
- 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74116 and previous config saved to /var/cache/conftool/dbconfig/20250306-105335-root.json
- 10:16 marostegui: Drop phabricator_search.search_documentfield_BKUP T387174
- 10:14 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:14 volans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Unblock others adds an-worker1186 - volans@cumin1002"
- 10:14 volans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Unblock others adds an-worker1186 - volans@cumin1002"
- 10:10 volans@cumin1002: START - Cookbook sre.dns.netbox
- 10:10 volans@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
- 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
- 09:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
- 09:46 volans: disabling iDrac's WebServer.HostHeaderCheck on the remaining hosts that have it - T382416
- 09:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd1003.eqiad.wmnet with OS bookworm
- 09:28 jynus: deploy additional grants to m1 T387892
- 09:22 moritzm: installing openssh security updates
- 09:13 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.19 refs T386214
- 09:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1032.eqiad.wmnet to cluster eqiad and group A
- 09:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1032.eqiad.wmnet to cluster eqiad and group A
- 08:56 volans@cumin1002: START - Cookbook sre.dns.netbox
- 08:56 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: configure wgCirrusSearchLanguageKeywordExtraFields (T271776) (duration: 11m 53s)
- 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
- 08:50 dcausse@deploy2002: dcausse: Continuing with sync
- 08:47 dcausse@deploy2002: dcausse: Backport for cirrus: configure wgCirrusSearchLanguageKeywordExtraFields (T271776) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:44 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: configure wgCirrusSearchLanguageKeywordExtraFields (T271776)
- 08:41 dcausse: adding https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1121666 to the "UTC morning backport window"
- 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
- 08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1032.eqiad.wmnet with OS bookworm
- 07:56 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd1003.eqiad.wmnet with reason: host reimage
- 07:52 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd1003.eqiad.wmnet with reason: host reimage
- 07:51 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1254.eqiad.wmnet
- 07:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1032.eqiad.wmnet with reason: host reimage
- 07:42 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1032.eqiad.wmnet with reason: host reimage
- 07:38 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd1003.eqiad.wmnet with OS bookworm
- 07:25 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1032.eqiad.wmnet with OS bookworm
- 07:24 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd1001.eqiad.wmnet with OS bookworm
- 06:25 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd1001.eqiad.wmnet with reason: host reimage
- 06:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1209.eqiad.wmnet with reason: Index rebuild
- 06:25 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1209.eqiad.wmnet
- 06:22 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd1001.eqiad.wmnet with reason: host reimage
- 06:19 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1209.eqiad.wmnet
- 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1209 T388093', diff saved to https://phabricator.wikimedia.org/P74112 and previous config saved to /var/cache/conftool/dbconfig/20250306-061736-marostegui.json
- 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1193 to s8 primary T388093', diff saved to https://phabricator.wikimedia.org/P74111 and previous config saved to /var/cache/conftool/dbconfig/20250306-061650-marostegui.json
- 06:16 marostegui: Starting s8 eqiad failover from db1209 to db1193 - T388093
- 06:16 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Index rebuild
- 06:15 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2152.codfw.wmnet
- 06:12 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd1001.eqiad.wmnet with OS bookworm
- 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1193 from API/vslow/dump T388093', diff saved to https://phabricator.wikimedia.org/P74110 and previous config saved to /var/cache/conftool/dbconfig/20250306-061133-marostegui.json
- 06:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T388093
- 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1193 with weight 0 T388093', diff saved to https://phabricator.wikimedia.org/P74109 and previous config saved to /var/cache/conftool/dbconfig/20250306-061052-marostegui.json
- 06:08 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2152.codfw.wmnet
- 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2152', diff saved to https://phabricator.wikimedia.org/P74108 and previous config saved to /var/cache/conftool/dbconfig/20250306-060842-marostegui.json
- 05:42 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 05:28 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1180.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 05:25 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 05:20 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 05:19 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 05:14 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 05:13 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 05:13 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1185] - vriley@cumin1002"
- 05:13 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1185] - vriley@cumin1002"
- 05:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 05:08 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1180.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 05:08 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 05:07 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1184
- 05:07 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1184
- 05:06 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 05:06 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1184] - vriley@cumin1002"
- 05:06 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1184] - vriley@cumin1002"
- 05:04 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1180
- 05:02 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1180
- 05:01 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 05:01 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 05:01 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1180] - vriley@cumin1002"
- 05:01 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1180] - vriley@cumin1002"
- 05:00 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1182.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 04:56 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 04:52 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 04:47 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 04:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 04:45 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 04:44 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1182.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 04:42 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 04:42 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1182] - vriley@cumin1002"
- 04:42 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1182] - vriley@cumin1002"
- 04:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 04:38 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1179
- 04:38 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 04:37 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1179
- 04:36 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 04:36 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1179] - vriley@cumin1002"
- 04:36 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1179] - vriley@cumin1002"
- 04:31 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 04:28 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 04:26 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1178
- 04:25 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1178
- 04:24 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 04:24 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1178 - vriley@cumin1002"
- 04:24 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1178 - vriley@cumin1002"
- 04:20 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 04:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:05 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 02:43 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2050
- 02:43 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2049
- 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2048
- 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
- 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2046
- 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2045
- 02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2050
- 02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2049
- 02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2048
- 02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
- 02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2046
- 02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2045
- 02:41 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 02:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2049 to codfw - jhancock@cumin2002"
- 02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2049 to codfw - jhancock@cumin2002"
- 02:39 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.eqiad.wmnet with OS bullseye
- 02:37 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
- 02:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
- 02:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 02:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 02:23 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 01:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1181.eqiad.wmnet with OS bullseye
- 01:19 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 01:19 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 01:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.eqiad.wmnet with OS bullseye
- 00:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1181.eqiad.wmnet with reason: host reimage
- 00:55 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1181.eqiad.wmnet with reason: host reimage
- 00:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
- 00:33 zabe: zabe@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php commonswiki --delete /home/zabe/text_table_cleanup/commonswiki --sleep 0.5 # T183490
- 00:09 tgr_: UTC late deploys done
- 00:08 tgr@deploy2002: Finished scap sync-world: Backport for Roll out SUL3 signup to 1% of users on most group 1 wikis (T384007) (duration: 29m 13s)
- 00:02 tgr@deploy2002: tgr: Continuing with sync
2025-03-05
- 23:42 tgr@deploy2002: tgr: Backport for Roll out SUL3 signup to 1% of users on most group 1 wikis (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 23:39 tgr@deploy2002: Started scap sync-world: Backport for Roll out SUL3 signup to 1% of users on most group 1 wikis (T384007)
- 23:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
- 23:36 tgr@deploy2002: Finished scap sync-world: Backport for Preserve usesul3 flag during autologin (T375788), Preserve usesul3 flag during autologin (T375788), Clean up SUL3 config (T384007) (duration: 18m 53s)
- 23:29 tgr@deploy2002: tgr: Continuing with sync
- 23:20 tgr@deploy2002: tgr: Backport for Preserve usesul3 flag during autologin (T375788), Preserve usesul3 flag during autologin (T375788), Clean up SUL3 config (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 23:17 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.eqiad.wmnet with OS bullseye
- 23:17 tgr@deploy2002: Started scap sync-world: Backport for Preserve usesul3 flag during autologin (T375788), Preserve usesul3 flag during autologin (T375788), Clean up SUL3 config (T384007)
- 23:04 tgr@deploy2002: Finished scap sync-world: Backport for Revert^2 "Invert Parsoid read view wiktionary configs" (duration: 12m 13s)
- 23:03 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:02 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 22:57 tgr@deploy2002: tgr: Continuing with sync
- 22:54 tgr@deploy2002: tgr: Backport for Revert^2 "Invert Parsoid read view wiktionary configs" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:53 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:53 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 22:51 tgr@deploy2002: Started scap sync-world: Backport for Revert^2 "Invert Parsoid read view wiktionary configs"
- 22:29 tgr@deploy2002: Finished scap sync-world: Backport for Revert^2 "Turn on Parsoid Read Views for 44 wiktionaries" (T387505) (duration: 12m 06s)
- 22:24 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:24 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 22:23 tgr@deploy2002: tgr: Continuing with sync
- 22:20 tgr@deploy2002: tgr: Backport for Revert^2 "Turn on Parsoid Read Views for 44 wiktionaries" (T387505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:17 tgr@deploy2002: Started scap sync-world: Backport for Revert^2 "Turn on Parsoid Read Views for 44 wiktionaries" (T387505)
- 21:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.eqiad.wmnet with OS bullseye
- 21:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
- 21:54 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1008.eqiad.wmnet with OS bullseye
- 21:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.eqiad.wmnet with OS bullseye
- 21:50 tgr@deploy2002: Finished scap sync-world: Backport for Revert "Invert Parsoid read view wiktionary configs", Revert "Turn on Parsoid Read Views for 44 wiktionaries" (duration: 09m 30s)
- 21:44 tgr@deploy2002: tgr: Continuing with sync
- 21:44 tgr@deploy2002: tgr: Backport for Revert "Invert Parsoid read view wiktionary configs", Revert "Turn on Parsoid Read Views for 44 wiktionaries" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:41 tgr@deploy2002: Started scap sync-world: Backport for Revert "Invert Parsoid read view wiktionary configs", Revert "Turn on Parsoid Read Views for 44 wiktionaries"
- 21:29 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 21:28 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 21:26 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 21:26 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 21:25 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 21:25 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 21:24 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 21:24 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 21:23 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:22 tgr@deploy2002: Finished scap sync-world: Backport for Turn on Parsoid Read Views for 44 wiktionaries (T387505), Invert Parsoid read view wiktionary configs (duration: 12m 23s)
- 21:16 tgr@deploy2002: tgr, arlolra: Continuing with sync
- 21:13 tgr@deploy2002: tgr, arlolra: Backport for Turn on Parsoid Read Views for 44 wiktionaries (T387505), Invert Parsoid read view wiktionary configs synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:10 tgr@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Read Views for 44 wiktionaries (T387505), Invert Parsoid read view wiktionary configs
- 20:39 swfrench-wmf: right-sized capacity distribution between mw-(api-ext|web) main and next releases - T383845
- 20:38 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 20:38 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 20:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 20:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 20:20 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 20:19 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 20:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74107 and previous config saved to /var/cache/conftool/dbconfig/20250305-201612-root.json
- 20:11 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4052.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 20:09 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4052.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
- 20:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74106 and previous config saved to /var/cache/conftool/dbconfig/20250305-200426-root.json
- 20:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74105 and previous config saved to /var/cache/conftool/dbconfig/20250305-200106-root.json
- 19:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74104 and previous config saved to /var/cache/conftool/dbconfig/20250305-194920-root.json
- 19:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74103 and previous config saved to /var/cache/conftool/dbconfig/20250305-194601-root.json
- 19:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74102 and previous config saved to /var/cache/conftool/dbconfig/20250305-193414-root.json
- 19:31 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1008* for ban host prior to reimage - bking@cumin2002 - T387904
- 19:31 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1008* for ban host prior to reimage - bking@cumin2002 - T387904
- 19:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74101 and previous config saved to /var/cache/conftool/dbconfig/20250305-193056-root.json
- 19:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74100 and previous config saved to /var/cache/conftool/dbconfig/20250305-191909-root.json
- 19:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74099 and previous config saved to /var/cache/conftool/dbconfig/20250305-191550-root.json
- 19:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1167 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74098 and previous config saved to /var/cache/conftool/dbconfig/20250305-190403-root.json
- 18:45 brett: import trafficserver 9.2.9-1wm1 into bullseye-wikimedia (T388035)
- 18:45 brett: import trafficserver 9.2.9-1wm1 into bullseye-wikimedia
- 18:21 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
- 18:21 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
- 18:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 17:50 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 17:50 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 17:50 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 17:50 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 17:50 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 17:50 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 17:41 ladsgroup@deploy2002: Finished scap sync-world: Backport for Enable thumbnail steps in testwiki (T360589) (duration: 13m 04s)
- 17:34 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 17:31 ladsgroup@deploy2002: ladsgroup: Backport for Enable thumbnail steps in testwiki (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:28 ladsgroup@deploy2002: Started scap sync-world: Backport for Enable thumbnail steps in testwiki (T360589)
- 17:20 tgr@deploy2002: Finished scap sync-world: Backport for CentralAuth: Enable SUL3 signup on group 0 (attempt 4) (T384007) (duration: 24m 13s)
- 17:14 tgr@deploy2002: tgr: Continuing with sync
- 16:59 tgr@deploy2002: tgr: Backport for CentralAuth: Enable SUL3 signup on group 0 (attempt 4) (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:56 tgr@deploy2002: Started scap sync-world: Backport for CentralAuth: Enable SUL3 signup on group 0 (attempt 4) (T384007)
- 16:54 tgr@deploy2002: Finished scap sync-world: Backport for CentralAuthIdLookup: Reuse cached object on single-value lookup (T379909 T380500 T387106), CentralAuthIdLookup: Use primary DB after writes (T379909 T380500), Use UserOptionsManager for SUL3 rollout flag (T384549), Make SUL3 global preference optional and simplify logic, (gerrit:1124785) A…
- 16:53 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:53 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:53 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:50 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 16:48 tgr@deploy2002: tgr: Continuing with sync
- 16:45 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wcqs::public@eqiad
- 16:45 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 16:44 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 16:39 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wcqs::public@eqiad
- 16:39 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for role: wcqs::public@eqiad
- 16:34 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wcqs::public@eqiad
- 16:33 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wcqs::public@codfw
- 16:33 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 16:32 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 16:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1007.eqiad.wmnet with reason: host reimage
- 16:26 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wcqs::public@codfw
- 16:24 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1007.eqiad.wmnet with reason: host reimage
- 16:22 tgr@deploy2002: tgr: Backport for CentralAuthIdLookup: Reuse cached object on single-value lookup (T379909 T380500 T387106), CentralAuthIdLookup: Use primary DB after writes (T379909 T380500), Use UserOptionsManager for SUL3 rollout flag (T384549), Make SUL3 global preference optional and simplify logic, (gerrit:1124785) Add passive central do…
- 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2048
- 16:20 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2048
- 16:19 tgr@deploy2002: Started scap sync-world: Backport for CentralAuthIdLookup: Reuse cached object on single-value lookup (T379909 T380500 T387106), CentralAuthIdLookup: Use primary DB after writes (T379909 T380500), Use UserOptionsManager for SUL3 rollout flag (T384549), Make SUL3 global preference optional and simplify logic, (gerrit:1124785) Ad…
- 16:19 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
- 16:19 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2046
- 16:19 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2045
- 16:19 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
- 16:19 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2046
- 16:19 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2045
- 16:19 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2046-48 to codfw - jhancock@cumin2002"
- 16:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2046-48 to codfw - jhancock@cumin2002"
- 16:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:05 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 16:00 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet with OS bookworm
- 15:48 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: cloudelastic1007* for ban host prior to reimage - bking@cumin2002 - T387904
- 15:48 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1007* for ban host prior to reimage - bking@cumin2002 - T387904
- 15:43 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:42 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:42 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:42 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:41 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
- 15:40 jynus: starting es backups on new hosts backup1013, backup2013 T387892
- 15:39 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:38 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:37 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
- 15:35 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
- 15:34 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
- 15:32 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:32 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:31 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:30 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:30 sukhe: upload dnsdist 1.9.8-1~wmf12u1 to apt.wm.org for bookworm
- 15:28 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 15:26 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:26 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-ctrl1002.eqiad.wmnet with OS bookworm
- 15:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 15:23 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/services/mw-debug: apply
- 15:21 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/services/mw-debug: apply
- 15:19 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/services/mw-debug: apply
- 15:18 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:17 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:17 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:16 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:11 moritzm: installing openssh security updates
- 15:10 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:09 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:09 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/services/mw-debug: apply
- 15:08 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet with OS bookworm
- 15:07 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
- 15:07 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
- 15:04 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Use MediaWikiServices hook for push-subscription-manager changes (T275336), Unset unused IP reveal groups in properly (T387205) (duration: 11m 05s)
- 14:57 dreamyjazz@deploy2002: dreamyjazz, pppery: Continuing with sync
- 14:55 dreamyjazz@deploy2002: dreamyjazz, pppery: Backport for Use MediaWikiServices hook for push-subscription-manager changes (T275336), Unset unused IP reveal groups in properly (T387205) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:53 dreamyjazz@deploy2002: Started scap sync-world: Backport for Use MediaWikiServices hook for push-subscription-manager changes (T275336), Unset unused IP reveal groups in properly (T387205)
- 14:52 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.dbctl (exit_code=99)
- 14:52 fceratto@cumin1002: START - Cookbook sre.mysql.dbctl
- 14:52 dreamyjazz@deploy2002: Finished scap sync-world: Backport for metawiki: Enable Chinese variant translation for message bundles (T387230) (duration: 18m 29s)
- 14:51 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1007* for ban host prior to reimage - bking@cumin2002 - T387904
- 14:51 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1007* for ban host prior to reimage - bking@cumin2002 - T387904
- 14:51 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
- 14:48 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
- 14:45 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.dbctl (exit_code=1)
- 14:45 fceratto@cumin1002: START - Cookbook sre.mysql.dbctl
- 14:45 cmooney@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2046
- 14:45 dreamyjazz@deploy2002: abi, dreamyjazz: Continuing with sync
- 14:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: analytics_cluster::datahub::opensearch@eqiad
- 14:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 14:44 cmooney@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2046
- 14:43 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 14:43 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.dbctl (exit_code=2)
- 14:43 fceratto@cumin1002: START - Cookbook sre.mysql.dbctl
- 14:42 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.dbctl (exit_code=0)
- 14:42 fceratto@cumin1002: START - Cookbook sre.mysql.dbctl
- 14:23 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Set Flow to read-only on remaining phase 2a wikis (T378834), Remove unused config parameters from ReadingLists extension., Use namespaced Title and Html classes (T166010 T387938), officewiki: Disable the event-organizer user group (T387943), Temporarily unset tempor… (gerrit:1124768)
- 14:16 dreamyjazz@deploy2002: daimona, zoe, dreamyjazz, dbrant: Continuing with sync
- 14:14 sukhe: restart pybal on lvs2014
- 14:14 sukhe: restart pybal on lvs2013
- 14:12 dreamyjazz@deploy2002: daimona, zoe, dreamyjazz, dbrant: Backport for Set Flow to read-only on remaining phase 2a wikis (T378834), Remove unused config parameters from ReadingLists extension., Use namespaced Title and Html classes (T166010 T387938), officewiki: Disable the event-organizer user group (T387943), Temporarily unse… (gerrit:1124768)
- 14:09 dreamyjazz@deploy2002: Started scap sync-world: Backport for Set Flow to read-only on remaining phase 2a wikis (T378834), Remove unused config parameters from ReadingLists extension., Use namespaced Title and Html classes (T166010 T387938), officewiki: Disable the event-organizer user group (T387943), Temporarily unset tempora… (gerrit:1124768)
- 13:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Index rebuild
- 13:58 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Index rebuild
- 13:58 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2154.codfw.wmnet
- 13:57 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1167.eqiad.wmnet
- 13:53 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1032.eqiad.wmnet
- 13:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
- 13:51 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2154.codfw.wmnet
- 13:51 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1167.eqiad.wmnet
- 13:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Index rebuild
- 13:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2154 db1167', diff saved to https://phabricator.wikimedia.org/P74096 and previous config saved to /var/cache/conftool/dbconfig/20250305-134936-marostegui.json
- 13:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
- 13:28 klausman@deploy2002: conftool action : set/pooled=yes; selector: name=inference-staging
- 13:27 klausman@deploy2002: conftool action : set/pooled=yes; selector: name=inference
- 13:26 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 13:26 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 13:26 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 13:25 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 13:23 ladsgroup@deploy2002: Finished scap sync-world: Backport for maintenance: Also check for utf-8 encoding in findBadBlobs (T351953), maintenance: Also check for utf-8 encoding in findBadBlobs (T351953) (duration: 11m 31s)
- 13:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: Repool eqiad after maintenance, no task ID specified]
- 13:22 elukey@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: Repool eqiad after maintenance, no task ID specified]
- 13:18 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1032.eqiad.wmnet
- 13:16 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 13:15 ladsgroup@deploy2002: ladsgroup: Backport for maintenance: Also check for utf-8 encoding in findBadBlobs (T351953), maintenance: Also check for utf-8 encoding in findBadBlobs (T351953) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:12 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1032.eqiad.wmnet with reason: remove from cluster for reimage
- 13:11 ladsgroup@deploy2002: Started scap sync-world: Backport for maintenance: Also check for utf-8 encoding in findBadBlobs (T351953), maintenance: Also check for utf-8 encoding in findBadBlobs (T351953)
- 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
- 12:50 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:42 ladsgroup@deploy2002: ladsgroup: Backport for maintenance: Also check for utf-8 encoding in findBadBlobs (T351953), maintenance: Also check for utf-8 encoding in findBadBlobs (T351953) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:39 ladsgroup@deploy2002: Started scap sync-world: Backport for maintenance: Also check for utf-8 encoding in findBadBlobs (T351953), maintenance: Also check for utf-8 encoding in findBadBlobs (T351953)
- 12:35 Emperor: restart envoy/swift on ms-fe2010
- 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74095 and previous config saved to /var/cache/conftool/dbconfig/20250305-123149-root.json
- 12:25 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 12:24 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 12:23 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 12:23 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 12:22 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 12:21 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 12:20 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 12:19 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74094 and previous config saved to /var/cache/conftool/dbconfig/20250305-121643-root.json
- 12:14 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 12:13 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 12:13 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 12:13 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 12:13 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 12:12 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 12:10 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 12:10 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 12:05 slyngshede@dns1004: END - running authdns-update
- 12:03 slyngshede@dns1004: START - running authdns-update
- 12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74093 and previous config saved to /var/cache/conftool/dbconfig/20250305-120138-root.json
- 11:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74092 and previous config saved to /var/cache/conftool/dbconfig/20250305-115557-root.json
- 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74091 and previous config saved to /var/cache/conftool/dbconfig/20250305-114632-root.json
- 11:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74090 and previous config saved to /var/cache/conftool/dbconfig/20250305-114051-root.json
- 11:38 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
- 11:35 tappof@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "network_devices: adding device model - tappof@cumin1002 - T387231"
- 11:34 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "network_devices: adding device model - tappof@cumin1002 - T387231"
- 11:32 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
- 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74089 and previous config saved to /var/cache/conftool/dbconfig/20250305-113126-root.json
- 11:29 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 11:29 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74088 and previous config saved to /var/cache/conftool/dbconfig/20250305-112545-root.json
- 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74087 and previous config saved to /var/cache/conftool/dbconfig/20250305-111040-root.json
- 11:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: no reason specified, no task ID specified]
- 11:07 elukey@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: no reason specified, no task ID specified]
- 10:57 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 10:57 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 10:57 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 10:57 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 10:56 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 10:56 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74086 and previous config saved to /var/cache/conftool/dbconfig/20250305-105534-root.json
- 10:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
- 10:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet
- 10:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74085 and previous config saved to /var/cache/conftool/dbconfig/20250305-103316-root.json
- 10:32 elukey: restart kube-apiserver on ml-staging-ctrl200[12] after the move to containerd (some issues registered)
- 10:31 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
- 10:30 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
- 10:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74084 and previous config saved to /var/cache/conftool/dbconfig/20250305-101810-root.json
- 10:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74083 and previous config saved to /var/cache/conftool/dbconfig/20250305-100304-root.json
- 09:58 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 09:58 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 09:58 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 09:57 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 09:55 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1202 gradually with 4 steps - Cloned db1202 to db1253
- 09:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74081 and previous config saved to /var/cache/conftool/dbconfig/20250305-094759-root.json
- 09:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
- 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
- 09:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
- 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to plain
- 09:36 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to plain
- 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
- 09:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
- 09:33 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1202 gradually with 4 steps - Cloned db1202 to db1253
- 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74078 and previous config saved to /var/cache/conftool/dbconfig/20250305-093254-root.json
- 09:32 fceratto@cumin1002: dbctl commit (dc=all): 'Cloned db1202 to db1253', diff saved to https://phabricator.wikimedia.org/P74077 and previous config saved to /var/cache/conftool/dbconfig/20250305-093249-fceratto.json
- 09:31 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1202 gradually with 4 steps - Cloned db1202 to db1253
- 09:30 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1202 gradually with 4 steps - Cloned db1202 to db1253
- 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to drbd
- 09:23 jynus: deploy new backup grants for es2036,es2040 T387892
- 09:18 jynus: deploy new backup grants for es1036,es1040 T387892
- 09:17 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to drbd
- 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd1003.eqiad.wmnet to plain
- 09:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd1003.eqiad.wmnet to plain
- 09:15 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.19 refs T386214
- 09:15 godog: upgrade to karma 0.120 - T353457
- 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
- 09:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
- 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd1003.eqiad.wmnet to drbd
- 09:09 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1202 gradually with 4 steps - Cloned db1202 to db1253
- 09:08 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1202 gradually with 4 steps - Cloned db1202 to db1253
- 09:07 marostegui: Stop db1217:3321 to clone db1250 T385141
- 09:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: cloning
- 09:04 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 09:04 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 08:55 tappof@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "network_devices: adding device model - tappof@cumin1002 - T387231"
- 08:54 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "network_devices: adding device model - tappof@cumin1002 - T387231"
- 08:53 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 08:52 jelto@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 08:51 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 08:50 jelto@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 08:50 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 08:50 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 08:49 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd1003.eqiad.wmnet to drbd
- 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
- 08:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
- 08:33 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 08:20 hashar@deploy2002: Finished scap sync-world: Backport for Lift IP cap for edit-a-thon (Illinois Tech) on March 12, 2025 (T387568), sewikimedia: update wordmark and tagline (T377921) (duration: 12m 02s)
- 08:14 hashar@deploy2002: hashar, anzx: Continuing with sync
- 08:13 hashar@deploy2002: hashar, anzx: Backport for Lift IP cap for edit-a-thon (Illinois Tech) on March 12, 2025 (T387568), sewikimedia: update wordmark and tagline (T377921) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:08 hashar@deploy2002: Started scap sync-world: Backport for Lift IP cap for edit-a-thon (Illinois Tech) on March 12, 2025 (T387568), sewikimedia: update wordmark and tagline (T377921)
- 08:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74075 and previous config saved to /var/cache/conftool/dbconfig/20250305-080343-root.json
- 07:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74074 and previous config saved to /var/cache/conftool/dbconfig/20250305-074838-root.json
- 07:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74073 and previous config saved to /var/cache/conftool/dbconfig/20250305-073333-root.json
- 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74072 and previous config saved to /var/cache/conftool/dbconfig/20250305-071827-root.json
- 07:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74071 and previous config saved to /var/cache/conftool/dbconfig/20250305-070321-root.json
- 06:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1160.eqiad.wmnet with reason: Rebuilding index
- 06:42 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1160.eqiad.wmnet
- 06:35 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1160.eqiad.wmnet
- 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160 T387816', diff saved to https://phabricator.wikimedia.org/P74070 and previous config saved to /var/cache/conftool/dbconfig/20250305-063216-marostegui.json
- 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1244 to s4 primary T387816', diff saved to https://phabricator.wikimedia.org/P74069 and previous config saved to /var/cache/conftool/dbconfig/20250305-063124-marostegui.json
- 06:30 marostegui: Starting s4 eqiad failover from db1160 to db1244 - T387816
- 06:30 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Index rebuild
- 06:30 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1226.eqiad.wmnet with reason: Index rebuild
- 06:30 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2166.codfw.wmnet
- 06:29 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1226.eqiad.wmnet
- 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1244 from API/vslow/dump T387816', diff saved to https://phabricator.wikimedia.org/P74068 and previous config saved to /var/cache/conftool/dbconfig/20250305-062629-marostegui.json
- 06:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T387816
- 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1244 with weight 0 T387816', diff saved to https://phabricator.wikimedia.org/P74067 and previous config saved to /var/cache/conftool/dbconfig/20250305-062554-marostegui.json
- 06:24 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1226.eqiad.wmnet
- 06:24 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2166.codfw.wmnet
- 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2166 db1226', diff saved to https://phabricator.wikimedia.org/P74066 and previous config saved to /var/cache/conftool/dbconfig/20250305-062402-marostegui.json
- 03:42 ejegg: donorwiki upgraded from 05f7d8cc to 1b6c275a
- 02:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2014.codfw.wmnet with OS bookworm
- 02:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 01:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2014.codfw.wmnet with reason: host reimage
- 01:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2014.codfw.wmnet with reason: host reimage
- 01:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2014.codfw.wmnet with OS bookworm
- 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 01:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 01:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2013.codfw.wmnet with OS bookworm
- 01:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 00:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 00:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2013.codfw.wmnet with reason: host reimage
- 00:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2013.codfw.wmnet with reason: host reimage
- 00:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2013.codfw.wmnet with OS bookworm
- 00:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup2013']
- 00:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2013']
- 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 00:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 00:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2014.codfw.wmnet with OS bookworm
2025-03-04
- 23:53 swfrench-wmf: started shellbox-media PHP 8.1 pilot with increased logging and display_startup_errors fix - T377038
- 23:51 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 23:51 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 23:49 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 23:49 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 23:11 tgr_: UTC very late deploys done
- 23:08 tgr@deploy2002: Finished scap sync-world: Backport for Revert "CentralAuth: Enable SUL3 signup on group 0 (attempt 3)" (duration: 11m 36s)
- 23:01 tgr@deploy2002: trainbranchbot, tgr: Continuing with sync
- 22:59 tgr@deploy2002: trainbranchbot, tgr: Backport for Revert "CentralAuth: Enable SUL3 signup on group 0 (attempt 3)" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:56 tgr@deploy2002: Started scap sync-world: Backport for Revert "CentralAuth: Enable SUL3 signup on group 0 (attempt 3)"
- 22:50 tgr@deploy2002: Sync cancelled.
- 22:35 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:34 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 22:29 tgr@deploy2002: tgr: Backport for CentralAuth: Enable SUL3 signup on group 0 (attempt 3) (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:26 tgr@deploy2002: Started scap sync-world: Backport for CentralAuth: Enable SUL3 signup on group 0 (attempt 3) (T384007)
- 22:21 jdrewniak@deploy2002: Finished scap sync-world: Backport for Deploy Search AB test to everywhere but English wiki (T386849) (duration: 13m 34s)
- 22:19 Amir1: clearing user_real_name in group0 wikis (T387212)
- 22:15 jdrewniak@deploy2002: jdrewniak, bwang: Continuing with sync
- 22:11 jdrewniak@deploy2002: jdrewniak, bwang: Backport for Deploy Search AB test to everywhere but English wiki (T386849) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:08 jdrewniak@deploy2002: Started scap sync-world: Backport for Deploy Search AB test to everywhere but English wiki (T386849)
- 21:53 jforrester@deploy2002: Finished scap sync-world: Backport for IS: Stop setting wgParserConf, unused since MW 1.36, CS: Stop setting wgTmhWebPlayer, unused since TMH REL1_39, CS: Stop setting wgBabelUseDatabase, unused since Babel REL1_39, CS-labs: Stop setting wgUrlShortenerDB*, unused since UrlShortener REL1_41, [Growth] Enab… (gerrit:1120505)
- 21:47 jforrester@deploy2002: jforrester, sgimeno: Continuing with sync
- 21:45 jforrester@deploy2002: jforrester, sgimeno: Backport for IS: Stop setting wgParserConf, unused since MW 1.36, CS: Stop setting wgTmhWebPlayer, unused since TMH REL1_39, CS: Stop setting wgBabelUseDatabase, unused since Babel REL1_39, CS-labs: Stop setting wgUrlShortenerDB*, unused since UrlShortener REL1_41, [Growth] Enable su… (gerrit:1120505)
- 21:42 jforrester@deploy2002: Started scap sync-world: Backport for IS: Stop setting wgParserConf, unused since MW 1.36, CS: Stop setting wgTmhWebPlayer, unused since TMH REL1_39, CS: Stop setting wgBabelUseDatabase, unused since Babel REL1_39, CS-labs: Stop setting wgUrlShortenerDB*, unused since UrlShortener REL1_41, [Growth] Enabl… (gerrit:1120505)
- 21:41 jforrester@deploy2002: Finished scap sync-world: Backport for fix(surfacing): don't show highlights on protected pages, fix(surfacing): don't show highlights on protected pages, analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to event data (T387286), analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to e… (gerrit:1124494)
- 21:34 jforrester@deploy2002: sgimeno, jforrester, migr: Continuing with sync
- 21:32 jforrester@deploy2002: sgimeno, jforrester, migr: Backport for fix(surfacing): don't show highlights on protected pages, fix(surfacing): don't show highlights on protected pages, analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to event data (T387286), analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to… (gerrit:1124494)
- 21:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:28 jforrester@deploy2002: Started scap sync-world: Backport for fix(surfacing): don't show highlights on protected pages, fix(surfacing): don't show highlights on protected pages, analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to event data (T387286), analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to ev… (gerrit:1124494)
- 21:25 jforrester@deploy2002: Finished scap sync-world: Backport for Revert "styles: Remove transparent PNG fallback for `.vector-icon`" (T358910 T387351) (duration: 10m 13s)
- 21:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2045
- 21:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2013.codfw.wmnet with OS bookworm
- 21:23 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2045
- 21:18 jforrester@deploy2002: jforrester, jdlrobson: Continuing with sync
- 21:18 jforrester@deploy2002: jforrester, jdlrobson: Backport for Revert "styles: Remove transparent PNG fallback for `.vector-icon`" (T358910 T387351) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:15 jforrester@deploy2002: Started scap sync-world: Backport for Revert "styles: Remove transparent PNG fallback for `.vector-icon`" (T358910 T387351)
- 21:14 jforrester@deploy2002: Finished scap sync-world: Backport for docroot: Enable Chrome credential sharing on all open SUL wikis (T385520) (duration: 10m 33s)
- 21:08 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:08 jforrester@deploy2002: jforrester, krinkle: Continuing with sync
- 21:07 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:07 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:07 jforrester@deploy2002: jforrester, krinkle: Backport for docroot: Enable Chrome credential sharing on all open SUL wikis (T385520) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:06 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:06 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:04 jforrester@deploy2002: Started scap sync-world: Backport for docroot: Enable Chrome credential sharing on all open SUL wikis (T385520)
- 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2013.codfw.wmnet with OS bookworm
- 20:05 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:05 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns names for test servers nokia lab - cmooney@cumin1002"
- 20:04 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns names for test servers nokia lab - cmooney@cumin1002"
- 20:01 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 19:33 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@10615c9]: Deploy latest DAGs for analytics Airflow instance. T387906. (duration: 00m 34s)
- 19:32 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@10615c9]: Deploy latest DAGs for analytics Airflow instance. T387906.
- 19:20 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.19 refs T386214
- 18:51 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 18:43 hashar@deploy2002: Finished scap sync-world: Backport for Fix typo in wgTrackGlobalJsonLinksNamespaces (T387843 T385917) (duration: 14m 51s)
- 18:41 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2003.codfw.wmnet with OS bookworm
- 18:36 hashar@deploy2002: hashar, bvibber: Continuing with sync
- 18:36 hashar@deploy2002: hashar, bvibber: Backport for Fix typo in wgTrackGlobalJsonLinksNamespaces (T387843 T385917) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:28 hashar@deploy2002: Started scap sync-world: Backport for Fix typo in wgTrackGlobalJsonLinksNamespaces (T387843 T385917)
- 18:26 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
- 18:23 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
- 18:20 swfrench-wmf: serving 25% of mw-api-int traffic on PHP 8.1 - T383845
- 18:19 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 18:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 18:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 18:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 18:16 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 18:16 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 18:15 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 18:15 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 18:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-staging2003
- 18:10 klausman@cumin2002: START - Cookbook sre.hosts.move-vlan for host ml-staging2003
- 18:09 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
- 18:05 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2001.codfw.wmnet with OS bookworm
- 17:59 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 17:58 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 17:58 swfrench@deploy2002: Finished scap sync-world: Use latest php8.1 images - T377038 (duration: 24m 53s)
- 17:56 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 17:55 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 17:53 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 17:53 ejegg: donorwiki upgraded from 98027151 to 05f7d8cc
- 17:52 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 17:50 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 17:49 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 17:48 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 17:47 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 17:46 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2001.codfw.wmnet with reason: host reimage
- 17:42 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2001.codfw.wmnet with reason: host reimage
- 17:33 swfrench@deploy2002: Started scap sync-world: Use latest php8.1 images - T377038
- 17:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74064 and previous config saved to /var/cache/conftool/dbconfig/20250304-173228-root.json
- 17:31 swfrench-wmf: built php8.1 production images with 'php8.1: Set display_startup_errors consistent with display_errors' - T377038
- 17:25 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2001.codfw.wmnet with OS bookworm
- 17:24 klausman@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-staging2001.codfw.wmnet with OS bookworm
- 17:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74062 and previous config saved to /var/cache/conftool/dbconfig/20250304-171722-root.json
- 17:12 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 17:11 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 17:09 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 17:08 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 17:03 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 17:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1214 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74061 and previous config saved to /var/cache/conftool/dbconfig/20250304-170223-root.json
- 17:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74060 and previous config saved to /var/cache/conftool/dbconfig/20250304-170217-root.json
- 17:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 17:02 ryankemper@dns1004: END - running authdns-update
- 17:00 claime: Closing UTC afternoon backport window
- 16:59 ryankemper@dns1004: START - running authdns-update
- 16:58 cgoubert@deploy2002: Finished scap sync-world: Backport for Revert^2 "When executing cli scripts, wait for the service mesh" (T387208) (duration: 10m 42s)
- 16:57 jgiannelos@deploy2002: Finished deploy [restbase/deploy@3eb0316]: Add new wikis. Enable prometheus metrics. (duration: 21m 25s)
- 16:55 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-staging2001
- 16:55 klausman@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-staging2001
- 16:55 klausman@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-staging2001
- 16:55 klausman@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-staging2001.codfw.wmnet 201.0.192.10.in-addr.arpa 1.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:55 klausman@cumin2002: START - Cookbook sre.dns.wipe-cache ml-staging2001.codfw.wmnet 201.0.192.10.in-addr.arpa 1.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:55 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:55 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-staging2001 - klausman@cumin2002"
- 16:55 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-staging2001 - klausman@cumin2002"
- 16:51 cgoubert@deploy2002: cgoubert, oblivian: Continuing with sync
- 16:51 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 16:50 klausman@cumin2002: START - Cookbook sre.hosts.move-vlan for host ml-staging2001
- 16:50 cgoubert@deploy2002: cgoubert, oblivian: Backport for Revert^2 "When executing cli scripts, wait for the service mesh" (T387208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:49 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2001.codfw.wmnet with OS bookworm
- 16:48 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2002.codfw.wmnet with OS bookworm
- 16:47 cgoubert@deploy2002: Started scap sync-world: Backport for Revert^2 "When executing cli scripts, wait for the service mesh" (T387208)
- 16:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1214 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74059 and previous config saved to /var/cache/conftool/dbconfig/20250304-164718-root.json
- 16:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74058 and previous config saved to /var/cache/conftool/dbconfig/20250304-164712-root.json
- 16:46 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:45 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:43 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:43 cgoubert@deploy2002: Finished scap sync-world: Backport for Enable $wgCampaignEventsSeparateOngoingEvents by default (T386427) (duration: 21m 28s)
- 16:42 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:41 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:39 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1031.eqiad.wmnet to cluster eqiad and group A
- 16:38 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1031.eqiad.wmnet to cluster eqiad and group A
- 16:36 jgiannelos@deploy2002: Started deploy [restbase/deploy@3eb0316]: Add new wikis. Enable prometheus metrics.
- 16:34 cgoubert@deploy2002: daimona, cgoubert: Continuing with sync
- 16:32 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1214 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74057 and previous config saved to /var/cache/conftool/dbconfig/20250304-163212-root.json
- 16:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74056 and previous config saved to /var/cache/conftool/dbconfig/20250304-163207-root.json
- 16:31 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
- 16:29 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2002.codfw.wmnet with reason: host reimage
- 16:29 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 16:28 cgoubert@deploy2002: daimona, cgoubert: Backport for Enable $wgCampaignEventsSeparateOngoingEvents by default (T386427) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:27 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:27 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 16:25 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2002.codfw.wmnet with reason: host reimage
- 16:24 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:24 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 16:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
- 16:22 cgoubert@deploy2002: Started scap sync-world: Backport for Enable $wgCampaignEventsSeparateOngoingEvents by default (T386427)
- 16:20 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1031.eqiad.wmnet
- 16:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
- 16:17 cgoubert@deploy2002: Finished scap sync-world: Move image forward (duration: 09m 16s)
- 16:15 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
- 16:15 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
- 16:11 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
- 16:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
- 16:08 cgoubert@deploy2002: Started scap sync-world: Move image forward
- 16:07 cgoubert@deploy2002: Finished scap sync-world: Shrink -next releases (duration: 02m 35s)
- 16:07 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2002.codfw.wmnet with OS bookworm
- 16:05 cgoubert@deploy2002: Started scap sync-world: Shrink -next releases
- 16:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 16:04 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
- 16:04 brennen@deploy2002: Finished deploy [phabricator/deployment@5d2302b]: deploy phab1004 for T387873 (duration: 00m 51s)
- 16:03 brennen@deploy2002: Started deploy [phabricator/deployment@5d2302b]: deploy phab1004 for T387873
- 16:03 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-staging2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 16:03 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
- 16:03 brennen@deploy2002: Finished deploy [phabricator/deployment@5d2302b]: test deploy phab2002 for T387873 (duration: 00m 29s)
- 16:02 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
- 16:02 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
- 16:02 brennen@deploy2002: Started deploy [phabricator/deployment@5d2302b]: test deploy phab2002 for T387873
- 16:02 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
- 16:02 jelto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deploy
- 16:01 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
- 16:01 jelto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator deploy
- 16:01 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
- 16:01 ottomata: eventgate-logging-external: rolling back to pre-Node 20 due to a bug likely caused by T382173 - T387850, T383814
- 15:51 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
- 15:51 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 15:51 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 15:50 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
- 15:50 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
- 15:49 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
- 15:46 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1031.eqiad.wmnet
- 15:46 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 15:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1214 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74054 and previous config saved to /var/cache/conftool/dbconfig/20250304-154537-root.json
- 15:42 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
- 15:41 vgutierrez: repooling lvs5004 running liberica - T384477
- 15:35 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 15:35 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 15:34 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 15:33 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 15:33 klausman@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-staging2002.codfw.wmnet with OS bookworm
- 15:32 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
- 15:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1214 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74052 and previous config saved to /var/cache/conftool/dbconfig/20250304-153031-root.json
- 15:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5004.eqsin.wmnet with OS bookworm
- 15:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1031.eqiad.wmnet with OS bookworm
- 14:59 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-staging2002
- 14:59 klausman@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-staging2002
- 14:58 klausman@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-staging2002
- 14:58 klausman@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-staging2002.codfw.wmnet 174.48.192.10.in-addr.arpa 4.7.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:58 klausman@cumin2002: START - Cookbook sre.dns.wipe-cache ml-staging2002.codfw.wmnet 174.48.192.10.in-addr.arpa 4.7.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:58 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:58 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-staging2002 - klausman@cumin2002"
- 14:58 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-staging2002 - klausman@cumin2002"
- 14:57 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp7005.*} or P{cp7009.*} or P{cp[7011-7014]*} or P{cp7016.*} and A:cp for 9.2.6-1wm2
- 14:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1031.eqiad.wmnet with reason: host reimage
- 14:56 cgoubert@deploy2002: Started scap sync-world: Deploying Revert "php8.1: Set display_startup_errors consistent with display_errors" (gerrit:1124444)
- 14:54 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 14:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1031.eqiad.wmnet with reason: host reimage
- 14:53 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
- 14:51 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: eventschemas::service@eqiad
- 14:51 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 14:51 klausman@cumin2002: START - Cookbook sre.hosts.move-vlan for host ml-staging2002
- 14:50 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 14:50 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2002.codfw.wmnet with OS bookworm
- 14:50 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
- 14:46 moritzm: restarting r/w slapds to pick up libtasn1 updates
- 14:23 dreamyjazz@deploy2002: Started scap sync-world: Backport for Create temporary-account-viewer group (T387205)
- 14:20 moritzm: installing libtasn1-6 security updates
- 14:17 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5004.eqsin.wmnet with reason: depooled before reimage
- 14:16 vgutierrez: depooling lvs5004 before reimaging - T384477
- 14:03 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp7005.*} or P{cp7009.*} or P{cp[7011-7014]*} or P{cp7016.*} and A:cp for 9.2.6-1wm2
- 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1003.wikimedia.org
- 13:56 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1003.wikimedia.org
- 13:54 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 13:54 dreamyjazz@deploy2002: dreamyjazz: Backport for Create temporary-account-viewer group (T387205) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:48 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 13:46 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 13:46 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 13:46 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 13:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1004.wikimedia.org
- 13:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Rebuilding index
- 13:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1004.wikimedia.org
- 13:30 dreamyjazz@deploy2002: Started scap sync-world: Backport for Create temporary-account-viewer group (T387205)
- 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2005.wikimedia.org
- 12:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2005.wikimedia.org
- 12:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2006.wikimedia.org
- 12:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2006.wikimedia.org
- 12:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Rebuilding index
- 12:27 jiji@deploy2002: Finished scap sync-world: Deploy php 8.1.34-1-s3 image (duration: 04m 59s)
- 12:23 jiji@deploy2002: Started scap sync-world: Deploy php 8.1.34-1-s3 image
- 12:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74048 and previous config saved to /var/cache/conftool/dbconfig/20250304-122057-root.json
- 12:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74047 and previous config saved to /var/cache/conftool/dbconfig/20250304-120552-root.json
- 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74046 and previous config saved to /var/cache/conftool/dbconfig/20250304-115047-root.json
- 11:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74045 and previous config saved to /var/cache/conftool/dbconfig/20250304-114358-root.json
- 11:43 claime: Deleting obsolete puppet certs for eventstreams.discovery.wmnet and eventgate-analytics-external.discovery.wmnet
- 11:43 jiji@deploy2002: Started scap sync-world: Deploy php 8.1.34-1-s3 image
- 11:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74044 and previous config saved to /var/cache/conftool/dbconfig/20250304-113541-root.json
- 11:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74043 and previous config saved to /var/cache/conftool/dbconfig/20250304-112852-root.json
- 11:28 vgutierrez: repooling lvs5005 running liberica - T384477
- 11:23 joal@deploy2002: Finished deploy [airflow-dags/analytics@9a0b051]: Regular analytics weekly train [airflow-dags/analytics@9a0b0519] (duration: 00m 35s)
- 11:22 joal@deploy2002: Started deploy [airflow-dags/analytics@9a0b051]: Regular analytics weekly train [airflow-dags/analytics@9a0b0519]
- 11:22 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
- 11:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74039 and previous config saved to /var/cache/conftool/dbconfig/20250304-112035-root.json
- 11:20 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 11:16 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 11:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74038 and previous config saved to /var/cache/conftool/dbconfig/20250304-111347-root.json
- 11:12 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5005.eqsin.wmnet with OS bookworm
- 11:11 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
- 11:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74037 and previous config saved to /var/cache/conftool/dbconfig/20250304-111146-root.json
- 11:10 hashar@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.19 refs T386214 (duration: 11m 09s)
- 11:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Rebuilding index
- 11:08 joal@deploy2002: Finished deploy [analytics/refinery@dbcd265] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@dbcd2652] (duration: 00m 35s)
- 11:07 joal@deploy2002: Started deploy [analytics/refinery@dbcd265] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@dbcd2652]
- 11:07 joal@deploy2002: Finished deploy [analytics/refinery@dbcd265] (thin): Regular analytics weekly train THIN [analytics/refinery@dbcd2652] (duration: 00m 55s)
- 11:06 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Index rebuild
- 11:06 joal@deploy2002: Started deploy [analytics/refinery@dbcd265] (thin): Regular analytics weekly train THIN [analytics/refinery@dbcd2652]
- 11:05 joal@deploy2002: Finished deploy [analytics/refinery@dbcd265]: Regular analytics weekly train [analytics/refinery@dbcd2652] (duration: 02m 58s)
- 11:05 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2161.codfw.wmnet
- 11:05 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1214.eqiad.wmnet with reason: Index rebuild
- 11:04 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1214.eqiad.wmnet
- 11:02 joal@deploy2002: Started deploy [analytics/refinery@dbcd265]: Regular analytics weekly train [analytics/refinery@dbcd2652]
- 10:59 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.19 refs T386214
- 10:58 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1214.eqiad.wmnet
- 10:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74036 and previous config saved to /var/cache/conftool/dbconfig/20250304-105842-root.json
- 10:58 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2161.codfw.wmnet
- 10:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2161 db1214', diff saved to https://phabricator.wikimedia.org/P74035 and previous config saved to /var/cache/conftool/dbconfig/20250304-105814-marostegui.json
- 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74034 and previous config saved to /var/cache/conftool/dbconfig/20250304-105640-root.json
- 10:52 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
- 10:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1019.eqiad.wmnet with reason: Rebuilding index
- 10:48 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
- 10:43 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74033 and previous config saved to /var/cache/conftool/dbconfig/20250304-104336-root.json
- 10:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74032 and previous config saved to /var/cache/conftool/dbconfig/20250304-104135-root.json
- 10:35 xSavitar: T387789 Ran mwscript-k8s --comment="T387789" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'JamesVilla44' 'DartsF4' --ignorestatus
- 10:27 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs5005.eqsin.wmnet with OS bookworm
- 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74031 and previous config saved to /var/cache/conftool/dbconfig/20250304-102630-root.json
- 10:21 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.19 refs T386214
- 10:20 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5005.eqsin.wmnet with reason: depooled before reimage
- 10:19 vgutierrez: depooling lvs5005 before reimaging - T384477
- 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74030 and previous config saved to /var/cache/conftool/dbconfig/20250304-101124-root.json
- 10:00 dcausse: wdqs: reconciled Q27151108 on both eqiad & codfw wdqs endpoints (T386998)
- 09:52 aborrero@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudcontrol1005.eqiad.wmnet
- 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74029 and previous config saved to /var/cache/conftool/dbconfig/20250304-095228-root.json
- 09:41 elukey@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 09:39 elukey@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 09:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74028 and previous config saved to /var/cache/conftool/dbconfig/20250304-093723-root.json
- 09:34 elukey@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 09:33 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 09:33 elukey@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 09:32 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5006.eqsin.wmnet with OS bookworm
- 09:32 sgimeno@deploy2002: Finished scap sync-world: Backport for analytics(HomepageHooks,BeforePageDisplayHandler): log experiment_enrollment interaction on new accounts (T387286) (duration: 12m 01s)
- 09:28 elukey@cumin1002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74027 and previous config saved to /var/cache/conftool/dbconfig/20250304-092839-root.json
- 09:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Rebuilding index
- 09:25 sgimeno@deploy2002: sgimeno: Continuing with sync
- 09:23 sgimeno@deploy2002: sgimeno: Backport for analytics(HomepageHooks,BeforePageDisplayHandler): log experiment_enrollment interaction on new accounts (T387286) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:23 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 09:23 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
- 09:22 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74026 and previous config saved to /var/cache/conftool/dbconfig/20250304-092217-root.json
- 09:21 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 09:21 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 09:20 sgimeno@deploy2002: Started scap sync-world: Backport for analytics(HomepageHooks,BeforePageDisplayHandler): log experiment_enrollment interaction on new accounts (T387286)
- 09:19 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 09:16 sgimeno@deploy2002: Finished scap sync-world: Backport for [Growth] Add mediawiki.product_metrics.growth_product_interaction stream config (T387286) (duration: 16m 01s)
- 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74025 and previous config saved to /var/cache/conftool/dbconfig/20250304-091334-root.json
- 09:08 sgimeno@deploy2002: sgimeno: Continuing with sync
- 09:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74024 and previous config saved to /var/cache/conftool/dbconfig/20250304-090712-root.json
- 09:05 sgimeno@deploy2002: sgimeno: Backport for [Growth] Add mediawiki.product_metrics.growth_product_interaction stream config (T387286) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:00 sgimeno@deploy2002: Started scap sync-world: Backport for [Growth] Add mediawiki.product_metrics.growth_product_interaction stream config (T387286)
- 09:00 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
- 08:59 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
- 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74023 and previous config saved to /var/cache/conftool/dbconfig/20250304-085829-root.json
- 08:58 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.eqiad.wmnet
- 08:57 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
- 08:56 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
- 08:55 dcausse: restarting eventgate-main to pick up the new streams (T375821)
- 08:54 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: add v1 stream for the search update pipeline (T375821) (duration: 41m 17s)
- 08:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74022 and previous config saved to /var/cache/conftool/dbconfig/20250304-085207-root.json
- 08:45 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 08:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 08:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 08:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
- 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74021 and previous config saved to /var/cache/conftool/dbconfig/20250304-084325-root.json
- 08:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
- 08:40 dcausse@deploy2002: dcausse: Continuing with sync
- 08:29 dcausse@deploy2002: dcausse: Backport for cirrus: add v1 stream for the search update pipeline (T375821) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74020 and previous config saved to /var/cache/conftool/dbconfig/20250304-082819-root.json
- 08:24 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1015.eqiad.wmnet with reason: Rebuilding index
- 08:17 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs5006.eqsin.wmnet with OS bookworm
- 08:13 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: add v1 stream for the search update pipeline (T375821)
- 08:08 hashar@deploy2002: sync-world aborted: testwikis to 1.44.0-wmf.19 refs T386214 (duration: 05m 10s)
- 08:03 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.19 refs T386214
- 08:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
- 07:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster1005.eqiad.wmnet to plain
- 07:57 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster1005.eqiad.wmnet to plain
- 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
- 07:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
- 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster1005.eqiad.wmnet to drbd
- 07:35 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster1005.eqiad.wmnet to drbd
- 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
- 07:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
- 06:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Index rebuild
- 06:41 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1184.eqiad.wmnet
- 06:35 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1184.eqiad.wmnet
- 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1184 T387552', diff saved to https://phabricator.wikimedia.org/P74019 and previous config saved to /var/cache/conftool/dbconfig/20250304-063320-marostegui.json
- 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1163 to s1 primary T387552', diff saved to https://phabricator.wikimedia.org/P74018 and previous config saved to /var/cache/conftool/dbconfig/20250304-063222-marostegui.json
- 06:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s1 T387552
- 06:27 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1163 from API/vslow/dump T387552', diff saved to https://phabricator.wikimedia.org/P74017 and previous config saved to /var/cache/conftool/dbconfig/20250304-062717-marostegui.json
- 06:27 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1163 with weight 0 T387552', diff saved to https://phabricator.wikimedia.org/P74016 and previous config saved to /var/cache/conftool/dbconfig/20250304-062702-marostegui.json
- 06:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Index rebuild
- 06:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1211.eqiad.wmnet with reason: Index rebuild
- 06:17 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2167.codfw.wmnet
- 06:17 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1211.eqiad.wmnet
- 06:17 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Index rebuild
- 06:17 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1221.eqiad.wmnet with reason: Index rebuild
- 06:17 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2147.codfw.wmnet
- 06:16 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1221.eqiad.wmnet
- 06:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Rebuilding index
- 06:12 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1211.eqiad.wmnet
- 06:12 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2167.codfw.wmnet
- 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2167 db1211', diff saved to https://phabricator.wikimedia.org/P74015 and previous config saved to /var/cache/conftool/dbconfig/20250304-061152-marostegui.json
- 06:10 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1221.eqiad.wmnet
- 06:10 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2147.codfw.wmnet
- 06:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebuilding index
- 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1221 db2147', diff saved to https://phabricator.wikimedia.org/P74014 and previous config saved to /var/cache/conftool/dbconfig/20250304-060927-marostegui.json
- 05:53 kart_: Updated cxserver to 2025-03-03-041049-production (T369815, T387037)
- 05:52 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 05:51 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 05:51 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 05:50 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 05:43 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 05:43 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 05:05 mwpresync@deploy2002: Pruned MediaWiki: 1.44.0-wmf.16 (duration: 05m 56s)
- 04:02 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.19 refs T386214
- 00:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2046-8 to codfw - jhancock@cumin2002"
- 00:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2046-8 to codfw - jhancock@cumin2002"
- 00:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
- 00:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
- 00:39 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 00:34 dduvall: deleting older mw-multiversion images on deploy2002 to free space (T387796)
- 00:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1110.eqiad.wmnet with OS bullseye
2025-03-03
- 23:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1109.eqiad.wmnet with OS bullseye
- 23:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1108.eqiad.wmnet with OS bullseye
- 23:50 Amir1: deleted local user_password from labswiki database (T104500 and T161859)
- 23:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1110.eqiad.wmnet with reason: host reimage
- 23:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1109.eqiad.wmnet with reason: host reimage
- 23:38 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1110.eqiad.wmnet with reason: host reimage
- 23:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1108.eqiad.wmnet with reason: host reimage
- 23:32 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1109.eqiad.wmnet with reason: host reimage
- 23:32 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1108.eqiad.wmnet with reason: host reimage
- 23:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2014.codfw.wmnet with OS bookworm
- 23:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1110.eqiad.wmnet with OS bullseye
- 23:22 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1110.eqiad.wmnet with OS bullseye
- 23:17 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1109.eqiad.wmnet with OS bullseye
- 23:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1108.eqiad.wmnet with OS bullseye
- 23:02 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:02 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:01 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1109.eqiad.wmnet with OS bullseye
- 22:56 ryankemper: T384422 Deploying backend.yaml routing patch; after it's deployed we should theoretically be able to see a UI at https://query-legacy-full.wikidata.org/
- 22:52 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1108.eqiad.wmnet with OS bullseye
- 22:52 tgr_: late UTC deploys done
- 22:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:51 tgr@deploy2002: Finished scap sync-world: Backport for feat(Surfacing): Add Change Tag for surfaced Add a Link (T387160) (duration: 31m 28s)
- 22:49 ryankemper@dns1004: END - running authdns-update
- 22:47 ryankemper@dns1004: START - running authdns-update
- 22:47 ryankemper: T384422 Merging DNS patch now https://gerrit.wikimedia.org/r/c/operations/dns/+/1122676
- 22:46 ryankemper: T384422 k8s deployment of `wikidata-query-legacy-full-gui` release in codfw looks fine, proceeding to eqiad
- 22:46 ryankemper@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 22:45 ryankemper@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 22:41 tgr@deploy2002: migr, tgr: Continuing with sync
- 22:39 ryankemper@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 22:39 ryankemper@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 22:35 tgr@deploy2002: migr, tgr: Backport for feat(Surfacing): Add Change Tag for surfaced Add a Link (T387160) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:32 ryankemper@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 22:32 ryankemper@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 22:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1110.eqiad.wmnet with OS bullseye
- 22:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from relforge1007 to elastic1110
- 22:20 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1110
- 22:20 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1110
- 22:20 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:20 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1007 to elastic1110 - bking@cumin2002"
- 22:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1007 to elastic1110 - bking@cumin2002"
- 22:19 tgr@deploy2002: Started scap sync-world: Backport for feat(Surfacing): Add Change Tag for surfaced Add a Link (T387160)
- 22:15 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:13 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1181
- 22:13 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1181
- 22:12 bking@cumin2002: START - Cookbook sre.dns.netbox
- 22:12 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:12 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1181 - vriley@cumin1002"
- 22:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1181 - vriley@cumin1002"
- 22:08 bking@cumin2002: START - Cookbook sre.hosts.rename from relforge1007 to elastic1110
- 22:07 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 22:07 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1181
- 22:07 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1181
- 22:06 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2014.codfw.wmnet with OS bookworm
- 22:04 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 22:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1109.eqiad.wmnet with OS bullseye
- 22:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from relforge1006 to elastic1109
- 22:01 tgr@deploy2002: Finished scap sync-world: Backport for Use session storage for session tick events (T387400), Update experiment name for Search AB test french wiki (T387400) (duration: 26m 04s)
- 22:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1109
- 22:00 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1109
- 22:00 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:00 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1006 to elastic1109 - bking@cumin2002"
- 22:00 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on relforge[1003-1004,1006-1007].eqiad.wmnet with reason: T387782
- 21:59 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1006 to elastic1109 - bking@cumin2002"
- 21:56 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:55 bking@cumin2002: START - Cookbook sre.hosts.rename from relforge1006 to elastic1109
- 21:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2013.codfw.wmnet with OS bookworm
- 21:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1108.eqiad.wmnet with OS bullseye
- 21:53 tgr@deploy2002: bwang, tgr: Continuing with sync
- 21:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from relforge1005 to elastic1108
- 21:51 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1108
- 21:51 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1108
- 21:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:51 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1005 to elastic1108 - bking@cumin2002"
- 21:50 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1005 to elastic1108 - bking@cumin2002"
- 21:45 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:44 bking@cumin2002: START - Cookbook sre.hosts.rename from relforge1005 to elastic1108
- 21:44 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from relforge1006 to elastic1109
- 21:44 bking@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 21:43 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from relforge1005 to elastic1108
- 21:43 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 21:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:42 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:41 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:41 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:39 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 21:39 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 21:38 bking@cumin2002: START - Cookbook sre.hosts.rename from relforge1006 to elastic1109
- 21:37 tgr@deploy2002: bwang, tgr: Backport for Use session storage for session tick events (T387400), Update experiment name for Search AB test french wiki (T387400) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:37 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:36 bking@cumin2002: START - Cookbook sre.hosts.rename from relforge1005 to elastic1108
- 21:35 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1181
- 21:35 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1181
- 21:35 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1181
- 21:35 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1181
- 21:34 tgr@deploy2002: Started scap sync-world: Backport for Use session storage for session tick events (T387400), Update experiment name for Search AB test french wiki (T387400)
- 21:34 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:34 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1181 - vriley@cumin1002"
- 21:34 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1181 - vriley@cumin1002"
- 21:30 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 21:30 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 21:30 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 21:27 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 21:26 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 21:26 tgr@deploy2002: Finished scap sync-world: Backport for Remove unused config variable $wgJsonConfigInterwikiPrefix, Fix inconsistent definitions for $wmgLocalServices['chart-renderer'], Set $wgCentralAuthSharedDomainCallback (T387357) (duration: 10m 06s)
- 21:21 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 21:21 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 21:19 tgr@deploy2002: matmarex, tgr: Continuing with sync
- 21:19 tgr@deploy2002: matmarex, tgr: Backport for Remove unused config variable $wgJsonConfigInterwikiPrefix, Fix inconsistent definitions for $wmgLocalServices['chart-renderer'], Set $wgCentralAuthSharedDomainCallback (T387357) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:16 tgr@deploy2002: Started scap sync-world: Backport for Remove unused config variable $wgJsonConfigInterwikiPrefix, Fix inconsistent definitions for $wmgLocalServices['chart-renderer'], Set $wgCentralAuthSharedDomainCallback (T387357)
- 21:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2013.codfw.wmnet with OS bookworm
- 20:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74012 and previous config saved to /var/cache/conftool/dbconfig/20250303-203158-root.json
- 20:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74011 and previous config saved to /var/cache/conftool/dbconfig/20250303-203100-root.json
- 20:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74010 and previous config saved to /var/cache/conftool/dbconfig/20250303-201652-root.json
- 20:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74009 and previous config saved to /var/cache/conftool/dbconfig/20250303-201554-root.json
- 20:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74008 and previous config saved to /var/cache/conftool/dbconfig/20250303-200146-root.json
- 20:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74007 and previous config saved to /var/cache/conftool/dbconfig/20250303-200048-root.json
- 19:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74006 and previous config saved to /var/cache/conftool/dbconfig/20250303-194641-root.json
- 19:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74005 and previous config saved to /var/cache/conftool/dbconfig/20250303-194543-root.json
- 19:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74004 and previous config saved to /var/cache/conftool/dbconfig/20250303-193742-root.json
- 19:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74003 and previous config saved to /var/cache/conftool/dbconfig/20250303-193136-root.json
- 19:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74002 and previous config saved to /var/cache/conftool/dbconfig/20250303-193038-root.json
- 19:26 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: reforge1005*,relforge1006*,relforge1007* for ban hosts prior to revert - bking@cumin2002 - T387176
- 19:26 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: reforge1005*,relforge1006*,relforge1007* for ban hosts prior to revert - bking@cumin2002 - T387176
- 19:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74001 and previous config saved to /var/cache/conftool/dbconfig/20250303-192237-root.json
- 19:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74000 and previous config saved to /var/cache/conftool/dbconfig/20250303-191513-root.json
- 19:08 swfrench-wmf: serving 10% of mw-api-int traffic on PHP 8.1 - T383845
- 19:07 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 19:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73999 and previous config saved to /var/cache/conftool/dbconfig/20250303-190732-root.json
- 19:07 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 19:07 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 19:07 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 19:06 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 19:06 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 19:05 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 19:05 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 19:05 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 19:05 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 19:03 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 19:02 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 19:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73998 and previous config saved to /var/cache/conftool/dbconfig/20250303-190007-root.json
- 18:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73997 and previous config saved to /var/cache/conftool/dbconfig/20250303-185227-root.json
- 18:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73996 and previous config saved to /var/cache/conftool/dbconfig/20250303-184501-root.json
- 18:44 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:44 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:44 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:43 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73995 and previous config saved to /var/cache/conftool/dbconfig/20250303-183721-root.json
- 18:33 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 100% of client sessions in PHP 8.1 (T383845) (duration: 11m 03s)
- 18:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73994 and previous config saved to /var/cache/conftool/dbconfig/20250303-182956-root.json
- 18:26 swfrench@deploy2002: swfrench: Continuing with sync
- 18:24 swfrench@deploy2002: swfrench: Backport for Enroll 100% of client sessions in PHP 8.1 (T383845) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:22 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 100% of client sessions in PHP 8.1 (T383845)
- 18:17 swfrench-wmf: scaled mw-(api-ext|web) next deployments to 40% of main size - T383845
- 18:16 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 18:16 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 18:15 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 18:15 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 18:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73993 and previous config saved to /var/cache/conftool/dbconfig/20250303-181451-root.json
- 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 18:13 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 18:12 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 18:12 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 18:03 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 18:02 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 18:02 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 18:01 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 18:01 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 18:00 moritzm: repool maps2009 T387431
- 17:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:54 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:42 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 17:40 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 17:16 dancy@deploy2002: Installation of scap version "4.139.0" completed for 204 hosts
- 17:11 dancy@deploy2002: Installing scap version "4.139.0" for 204 host(s)
- 16:48 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
- 16:47 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:43 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
- 16:42 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
- 16:42 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
- 16:38 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
- 16:38 ottomata: deploying eventgate-logging-external to ACTUALLY bump to node20 - T383814
- 16:37 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
- 16:34 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1248.eqiad.wmnet onto db1252.eqiad.wmnet
- 16:26 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2167 gradually with 4 steps - Cloned db2166 to db2167
- 16:20 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2166 gradually with 4 steps - Cloned db2166 to db2167
- 16:18 moritzm: depool maps2009 T387431
- 16:10 swfrench-wmf: finished shellbox-media PHP 8.1 pilot - T377038
- 16:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 16:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 16:10 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 16:10 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 16:01 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Index rebuild
- 16:01 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1247.eqiad.wmnet with reason: Index rebuild
- 16:01 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2155.codfw.wmnet
- 16:00 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1247.eqiad.wmnet
- 15:58 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1252 gradually with 4 steps - Cloned db124 to db1252
- 15:58 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1252 gradually with 4 steps - Cloned db124 to db1252
- 15:58 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1252 gradually with 4 steps - Cloned db124 to db1252
- 15:58 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1252 gradually with 4 steps - Cloned db124 to db1252
- 15:55 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2155.codfw.wmnet
- 15:54 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1247.eqiad.wmnet
- 15:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1247 db2155', diff saved to https://phabricator.wikimedia.org/P73985 and previous config saved to /var/cache/conftool/dbconfig/20250303-155447-marostegui.json
- 15:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2155,2187].codfw.wmnet with reason: Rebuilding indexes
- 15:51 swfrench-wmf: started shellbox-media PHP 8.1 pilot with increased logging - T377038
- 15:50 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 15:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 15:47 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 15:47 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 15:42 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
- 15:41 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
- 15:41 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1248 gradually with 4 steps - Cloning db1252.eqiad.wmnet completed
- 15:40 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2167 gradually with 4 steps - Cloned db2166 to db2167
- 15:38 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
- 15:37 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
- 15:36 ottomata: deploying eventgate-logging-external to bump to node20 - T383814
- 15:36 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
- 15:36 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
- 15:35 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2166 gradually with 4 steps - Cloned db2166 to db2167
- 15:24 ihurbain: UTC afternoon deploys done
- 15:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver2004.codfw.wmnet with OS bookworm
- 15:11 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1248 gradually with 4 steps - Cloning db1252.eqiad.wmnet completed
- 15:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73978 and previous config saved to /var/cache/conftool/dbconfig/20250303-151113-root.json
- 15:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1249 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73977 and previous config saved to /var/cache/conftool/dbconfig/20250303-151107-root.json
- 15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Pooling in after cloning to db1252 T385141', diff saved to https://phabricator.wikimedia.org/P73976 and previous config saved to /var/cache/conftool/dbconfig/20250303-151103-fceratto.json
- 15:09 ihurbain@deploy2002: Finished scap sync-world: Backport for Remove $wmgUseGraphWithJsonNamespace (T124748) (duration: 11m 55s)
- 15:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver2004.codfw.wmnet with reason: host reimage
- 15:02 ihurbain@deploy2002: matmarex, ihurbain: Continuing with sync
- 15:00 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1248 gradually with 4 steps - Cloning db1252.eqiad.wmnet completed
- 15:00 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1248 gradually with 4 steps - Cloning db1252.eqiad.wmnet completed
- 14:59 ihurbain@deploy2002: matmarex, ihurbain: Backport for Remove $wmgUseGraphWithJsonNamespace (T124748) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:57 ihurbain@deploy2002: Started scap sync-world: Backport for Remove $wmgUseGraphWithJsonNamespace (T124748)
- 14:56 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver2004.codfw.wmnet with reason: host reimage
- 14:54 ihurbain@deploy2002: Finished scap sync-world: Backport for Change license for Russian Wikinews to CC-BY-4.0 (T387279), Revert "Turn on Parsoid fragment support everywhere" (T387608) (duration: 11m 39s)
- 14:52 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 14:52 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73974 and previous config saved to /var/cache/conftool/dbconfig/20250303-144930-root.json
- 14:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1249 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73973 and previous config saved to /var/cache/conftool/dbconfig/20250303-144805-root.json
- 14:48 ihurbain@deploy2002: matmarex, ssastry, ihurbain: Continuing with sync
- 14:46 ihurbain@deploy2002: matmarex, ssastry, ihurbain: Backport for Change license for Russian Wikinews to CC-BY-4.0 (T387279), Revert "Turn on Parsoid fragment support everywhere" (T387608) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1066.eqiad.wmnet
- 14:44 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host puppetserver2004.codfw.wmnet with OS bookworm
- 14:42 ihurbain@deploy2002: Started scap sync-world: Backport for Change license for Russian Wikinews to CC-BY-4.0 (T387279), Revert "Turn on Parsoid fragment support everywhere" (T387608)
- 14:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:22 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable fixed Wikibase RDF everywhere (T384344)
- 14:21 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1066* for ban elastic1066 to hopefully stop rejections - bking@cumin2002 - T387176
- 14:21 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1066* for ban elastic1066 to hopefully stop rejections - bking@cumin2002 - T387176
- 14:21 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Set Transwiki namespace on zhwikivoyage and zhwikiversity (T387055) (duration: 14m 02s)
- 14:19 marostegui@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73970 and previous config saved to /var/cache/conftool/dbconfig/20250303-141919-root.json
- 14:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1249 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73969 and previous config saved to /var/cache/conftool/dbconfig/20250303-141754-root.json
- 14:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Index rebuild
- 14:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1203.eqiad.wmnet with reason: Index rebuild
- 14:14 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2162.codfw.wmnet
- 14:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2164 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73968 and previous config saved to /var/cache/conftool/dbconfig/20250303-141350-root.json
- 14:13 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1203.eqiad.wmnet
- 14:13 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, sdhehua: Continuing with sync
- 14:12 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, sdhehua: Backport for Set Transwiki namespace on zhwikivoyage and zhwikiversity (T387055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Rebuilding indexes
- 14:07 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Set Transwiki namespace on zhwikivoyage and zhwikiversity (T387055)
- 14:07 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2162.codfw.wmnet
- 14:06 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1203.eqiad.wmnet
- 14:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2162 db1203', diff saved to https://phabricator.wikimedia.org/P73966 and previous config saved to /var/cache/conftool/dbconfig/20250303-140638-marostegui.json
- 14:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73965 and previous config saved to /var/cache/conftool/dbconfig/20250303-140414-root.json
- 14:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73964 and previous config saved to /var/cache/conftool/dbconfig/20250303-140309-root.json
- 14:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1249 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73963 and previous config saved to /var/cache/conftool/dbconfig/20250303-140249-root.json
- 13:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2164 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73962 and previous config saved to /var/cache/conftool/dbconfig/20250303-135845-root.json
- 13:56 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 13:55 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 13:54 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2166.codfw.wmnet onto db2167.codfw.wmnet
- 13:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73961 and previous config saved to /var/cache/conftool/dbconfig/20250303-134804-root.json
- 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2164 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73960 and previous config saved to /var/cache/conftool/dbconfig/20250303-134340-root.json
- 13:37 cgoubert@deploy2002: Finished scap sync-world: Deploying 1116800 1122563 (duration: 02m 15s)
- 13:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1220 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73959 and previous config saved to /var/cache/conftool/dbconfig/20250303-133713-root.json
- 13:35 cgoubert@deploy2002: Started scap sync-world: Deploying 1116800 1122563
- 13:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73958 and previous config saved to /var/cache/conftool/dbconfig/20250303-133258-root.json
- 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2164 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73957 and previous config saved to /var/cache/conftool/dbconfig/20250303-132834-root.json
- 13:24 moritzm: failover Ganeti master in eqiad to ganeti1048 T382507
- 13:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1220 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73956 and previous config saved to /var/cache/conftool/dbconfig/20250303-132207-root.json
- 13:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73955 and previous config saved to /var/cache/conftool/dbconfig/20250303-131752-root.json
- 13:17 tgr_: undid arbcom_ruwiki block of CirrusSearch_Streaming_Updater via blockUser.php
- 13:15 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 13:14 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 13:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2164 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73954 and previous config saved to /var/cache/conftool/dbconfig/20250303-131329-root.json
- 13:12 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 13:12 cgoubert@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 13:10 cgoubert@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 13:10 cgoubert@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 13:07 cgoubert@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 13:07 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1220 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73953 and previous config saved to /var/cache/conftool/dbconfig/20250303-130702-root.json
- 13:06 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1172 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73952 and previous config saved to /var/cache/conftool/dbconfig/20250303-130247-root.json
- 13:01 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 13:01 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 12:58 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:58 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 12:56 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:55 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 12:55 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 12:53 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1220 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73951 and previous config saved to /var/cache/conftool/dbconfig/20250303-125156-root.json
- 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1220 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73950 and previous config saved to /var/cache/conftool/dbconfig/20250303-123651-root.json
- 12:33 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1220.eqiad.wmnet
- 12:29 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1220.eqiad.wmnet
- 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73949 and previous config saved to /var/cache/conftool/dbconfig/20250303-122609-root.json
- 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1220 T387557', diff saved to https://phabricator.wikimedia.org/P73948 and previous config saved to /var/cache/conftool/dbconfig/20250303-122437-marostegui.json
- 12:23 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1237 to x1 primary T387557', diff saved to https://phabricator.wikimedia.org/P73947 and previous config saved to /var/cache/conftool/dbconfig/20250303-122304-root.json
- 12:22 marostegui: Starting x1 eqiad failover from db1220 to db1237 - T387557
- 12:17 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: restbase::production@eqiad
- 12:17 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 12:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Primary switchover x1 T387557
- 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1237 with weight 0 T387557', diff saved to https://phabricator.wikimedia.org/P73946 and previous config saved to /var/cache/conftool/dbconfig/20250303-121623-root.json
- 12:16 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73945 and previous config saved to /var/cache/conftool/dbconfig/20250303-121104-root.json
- 12:09 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: restbase::production@eqiad
- 12:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wmcs::openstack::eqiad1::cloudweb@eqiad
- 12:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 12:07 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 12:03 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wmcs::openstack::eqiad1::cloudweb@eqiad
- 12:02 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 12:01 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 12:01 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 12:00 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 12:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: restbase::production@codfw
- 12:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 12:00 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 11:59 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 11:59 jayme: Imported helmfile 0.171.0-2 and helm-diff 3.10.0-1 to bullseye-wikimedia and bookworm-wikimedia - T341984 T387376
- 11:58 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 11:57 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 11:56 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 11:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73944 and previous config saved to /var/cache/conftool/dbconfig/20250303-115559-root.json
- 11:52 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 11:52 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: restbase::production@codfw
- 11:49 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 11:49 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2206.codfw.wmnet with reason: Index rebuild
- 11:48 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1249.eqiad.wmnet with reason: Index rebuild
- 11:48 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
- 11:48 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db2166.codfw.wmnet onto db2167.codfw.wmnet
- 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73943 and previous config saved to /var/cache/conftool/dbconfig/20250303-114500-root.json
- 11:43 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2166.codfw.wmnet onto db2167.codfw.wmnet
- 11:42 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db2166.codfw.wmnet onto db2167.codfw.wmnet
- 11:42 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: mediawiki::jobrunner@eqiad
- 11:42 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 11:41 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 11:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73942 and previous config saved to /var/cache/conftool/dbconfig/20250303-114054-root.json
- 11:38 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1249.eqiad.wmnet
- 11:38 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2206.codfw.wmnet
- 11:37 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: mediawiki::jobrunner@eqiad
- 11:32 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1249.eqiad.wmnet
- 11:32 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2206.codfw.wmnet
- 11:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2206 db1249', diff saved to https://phabricator.wikimedia.org/P73941 and previous config saved to /var/cache/conftool/dbconfig/20250303-113225-root.json
- 11:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73940 and previous config saved to /var/cache/conftool/dbconfig/20250303-112954-root.json
- 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73939 and previous config saved to /var/cache/conftool/dbconfig/20250303-112548-root.json
- 11:18 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2166.codfw.wmnet onto db2167.codfw.wmnet
- 11:17 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 11:17 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73938 and previous config saved to /var/cache/conftool/dbconfig/20250303-111448-root.json
- 11:12 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 11:11 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 11:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: mediawiki::jobrunner@codfw
- 11:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 11:10 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 11:09 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2210 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73937 and previous config saved to /var/cache/conftool/dbconfig/20250303-110830-root.json
- 11:05 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 11:01 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: mediawiki::jobrunner@codfw
- 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73936 and previous config saved to /var/cache/conftool/dbconfig/20250303-105943-root.json
- 10:58 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2166 - catching up replication
- 10:58 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2166 - catching up replication
- 10:54 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2210 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73935 and previous config saved to /var/cache/conftool/dbconfig/20250303-105325-root.json
- 10:52 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:51 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73934 and previous config saved to /var/cache/conftool/dbconfig/20250303-104438-root.json
- 10:40 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 10:40 ayounsi@cumin1002: START - Cookbook sre.network.cf
- 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2210 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73933 and previous config saved to /var/cache/conftool/dbconfig/20250303-103820-root.json
- 10:34 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1248.eqiad.wmnet onto db1252.eqiad.wmnet
- 10:28 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.clone (exit_code=97) of db1248.eqiad.wmnet onto db1252.eqiad.wmnet
- 10:26 hashar: Upgraded scap to 4.139.0 # T303828
- 10:26 hashar@deploy2002: Installation of scap version "4.139.0" completed for 204 hosts
- 10:21 hashar@deploy2002: Installing scap version "4.139.0" for 204 host(s)
- 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2210 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73931 and previous config saved to /var/cache/conftool/dbconfig/20250303-102109-root.json
- 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2210 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73930 and previous config saved to /var/cache/conftool/dbconfig/20250303-100603-root.json
- 09:46 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ms-be1080.eqiad.wmnet with reason: disk failed
- 09:45 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: docker_registry_ha::registry@codfw
- 09:45 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 09:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1030.eqiad.wmnet to cluster eqiad and group A
- 09:44 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1030.eqiad.wmnet to cluster eqiad and group A
- 09:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1027.eqiad.wmnet to cluster eqiad and group A
- 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1027.eqiad.wmnet to cluster eqiad and group A
- 09:38 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: docker_registry_ha::registry@codfw
- 09:28 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 09:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: docker_registry_ha::registry@eqiad
- 09:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 09:10 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 09:07 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: docker_registry_ha::registry@eqiad
- 08:55 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 08:55 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 08:30 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db2166.codfw.wmnet onto db2167.codfw.wmnet
- 08:29 kartik@deploy2002: Finished scap sync-world: Backport for Enable CX unified dashboard on sqwiki (T386719) (duration: 25m 32s)
- 08:20 kartik@deploy2002: sbisson, kartik: Continuing with sync
- 08:16 kartik@deploy2002: sbisson, kartik: Backport for Enable CX unified dashboard on sqwiki (T386719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1190.eqiad.wmnet with reason: Index rebuild
- 08:04 kartik@deploy2002: Started scap sync-world: Backport for Enable CX unified dashboard on sqwiki (T386719)
- 07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1190.eqiad.wmnet
- 07:53 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2210.codfw.wmnet with reason: Index rebuild
- 07:53 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2210.codfw.wmnet
- 07:52 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Index rebuild
- 07:52 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Index rebuild
- 07:52 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2164.codfw.wmnet
- 07:52 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1172.eqiad.wmnet
- 07:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1027.eqiad.wmnet to cluster eqiad and group C
- 07:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1027.eqiad.wmnet to cluster eqiad and group C
- 07:48 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2210.codfw.wmnet
- 07:48 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1190.eqiad.wmnet
- 07:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2210 db1190', diff saved to https://phabricator.wikimedia.org/P73926 and previous config saved to /var/cache/conftool/dbconfig/20250303-074804-marostegui.json
- 07:46 Ammar: T387658 Ran mwscript-k8s --comment="T387658" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bawiki --logwiki=metawiki 'Əkrəm Cəfər' 'Əkrəm'
- 07:45 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2164.codfw.wmnet
- 07:45 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1172.eqiad.wmnet
- 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1172 db2164', diff saved to https://phabricator.wikimedia.org/P73925 and previous config saved to /var/cache/conftool/dbconfig/20250303-074525-marostegui.json
- 07:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2164,2186].codfw.wmnet,db1172.eqiad.wmnet with reason: Rebuilding indexes
- 07:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1233', diff saved to https://phabricator.wikimedia.org/P73923 and previous config saved to /var/cache/conftool/dbconfig/20250303-073358-root.json
- 07:18 moritzm: installing Linux 6.1.128 on Bookworm hosts
2025-03-02
- 22:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73921 and previous config saved to /var/cache/conftool/dbconfig/20250302-220727-root.json
- 21:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73920 and previous config saved to /var/cache/conftool/dbconfig/20250302-215221-root.json
- 21:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73919 and previous config saved to /var/cache/conftool/dbconfig/20250302-213716-root.json
- 21:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73918 and previous config saved to /var/cache/conftool/dbconfig/20250302-212211-root.json
- 21:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73917 and previous config saved to /var/cache/conftool/dbconfig/20250302-210705-root.json
- 20:52 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1246.eqiad.wmnet with reason: crashed
- 20:51 mvernon@cumin1002: dbctl commit (dc=all): 'Depool db1246', diff saved to https://phabricator.wikimedia.org/P73916 and previous config saved to /var/cache/conftool/dbconfig/20250302-205123-mvernon.json
- 16:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2163 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73915 and previous config saved to /var/cache/conftool/dbconfig/20250302-162421-root.json
- 16:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2163 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73914 and previous config saved to /var/cache/conftool/dbconfig/20250302-160915-root.json
- 15:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2163 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73913 and previous config saved to /var/cache/conftool/dbconfig/20250302-155410-root.json
- 15:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2163 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73912 and previous config saved to /var/cache/conftool/dbconfig/20250302-153904-root.json
- 15:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2163 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73911 and previous config saved to /var/cache/conftool/dbconfig/20250302-152359-root.json
- 10:17 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1248.eqiad.wmnet with reason: Index rebuild
- 10:17 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1248.eqiad.wmnet
- 10:11 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1248.eqiad.wmnet
- 10:11 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Index rebuild
- 10:08 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2163.codfw.wmnet
- 10:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Setup
- 10:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2167', diff saved to https://phabricator.wikimedia.org/P73910 and previous config saved to /var/cache/conftool/dbconfig/20250302-100324-marostegui.json
- 10:00 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2163.codfw.wmnet
- 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2163', diff saved to https://phabricator.wikimedia.org/P73909 and previous config saved to /var/cache/conftool/dbconfig/20250302-095839-root.json
- 06:04 _joe_: started replication on db2167
- 05:44 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 05:44 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 00:32 reedy@deploy2002: Finished scap sync-world: Backport for UserGroupsHookHandler: Return early if performer is false (T387523) (duration: 10m 33s)
- 00:25 reedy@deploy2002: reedy, dreamyjazz: Continuing with sync
- 00:25 reedy@deploy2002: reedy, dreamyjazz: Backport for UserGroupsHookHandler: Return early if performer is false (T387523) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 00:21 reedy@deploy2002: Started scap sync-world: Backport for UserGroupsHookHandler: Return early if performer is false (T387523)
2025-03-01
- 23:59 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 23:59 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 18:37 dcausse: disabling the saneitizer on the cirrus streaming updater for consumer-search@eqiad & consumer-cloudelastic (pre-emptive hotfix for T387625)
- 18:37 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:37 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:36 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:35 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:30 dcausse: disabling the saneitizer on the cirrus streaming updater in codfw (hotfix for T387625)
- 18:29 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:29 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 17:47 godog: bounce mtail on centrallog2002
- 17:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 17:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 14:00 andrewbogott: rebooting wikitech-static; the entire server was intermittently locking up