16:57 mutante: - puppetmaster-1003 reachable again but service fails to start and puppetserver-deploy-code fails
16:50 mutante: - rebooted unreachable puppetmaster-1003 - was "no route to host" - but is back now, log had a "/dev/sdb: Can't open blockdev" as well
2024-04-12
17:55 mutante: - changed both puppetmaster and puppetdb hiera setting back to puppetmaster-1001 for instance deploy-1004
17:49 mutante: - deploy-1004 itself is on buster, and buster and puppet 7 don't mix well - testing deployment role on bookworm, creating deploy-1005
17:48 mutante: - deploy-1004 has a puppet problem when talking to new puppetmaster-1003 that goes away when switching back to puppetmaster-1001
2024-04-11
20:08 mutante: zuul-1001 - switching to new puppetmaster-1003 in puppet.conf manually, switched project defaults in repo too
19:58 mutante: manually editing puppet.conf to use puppetmaster-1003 instead of -1001 because you can't switch the puppetmaster via puppet if puppet is already broken :)
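For reference, the manual switch amounts to pointing the agent's `server` setting at the new master. A minimal sketch on a scratch copy (on the instance the real file is /etc/puppet/puppet.conf and needs sudo; the hostnames match the log, the config layout is a standard Debian puppet agent assumption):

```shell
# Scratch copy of the agent config; the real file lives at
# /etc/puppet/puppet.conf on the instance.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[agent]
server = puppetmaster-1001.devtools.eqiad.wmflabs
EOF

# Point the agent at the new puppetmaster instead of -1001.
sed -i 's/puppetmaster-1001/puppetmaster-1003/' "$conf"
grep '^server' "$conf"
```

After an edit like this, a manual `puppet agent --test` run would confirm the agent can actually reach the new master.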
19:41 mutante: switching gitlab-runner-1005 from puppetmaster-1001 to puppetmaster-1003 via web Hiera
19:03 mutante: - deleting instance contint-bullseye which was only used by me for a test before we created contint1003 in prod T334517 T361224
18:38 mutante: - attempting to fix puppet run on vrts-1001 related to switching prod to cfssl for SSL certs
18:23 mutante: - shutting down puppetmaster-1001 on buster - should now be replaced by puppetmaster-1003 on bookworm (thanks brennen) T360964 T360470
18:02 mutante: - shutting down instance devtools-puppetdb1001 - which is on buster - basically to see what breaks or complains, if anything
17:27 mutante: - added profile::pki::client::ensure: present to instance hiera for etherpad-bookworm - fixing broken puppet run
17:25 mutante: - attempting to fix puppet on instance etherpad-bookworm but SSL provider cfssl doesn't appear to work in cloud
2024-02-09
19:08 mutante: deleting instance phabricator-prod-1001 (shut down a couple days ago, buster instance replaced by phabricator-bullseye instance) T356530
22:31 mutante: - phabricator-bullseye, instance hiera: setting phabricator_domain to phabricator.wmcloud.org and phabricator_altdomain to phab-usercontent.wmcloud.org T356530
22:27 mutante: - deleted proxies phab.wmflabs.org, phab-prod-usercontent.wmflabs.org, phabricator.wmflabs.org - created proxies phabricator.wmcloud.org, phab-usercontent.wmcloud.org - wmflabs names are legacy and should migrate T356530
20:59 mutante: - changing phabricator domain in instance Hiera of phabricator-bullseye to phab.wmflabs.org and running puppet to update apache config/rewrite rules T356530
20:41 mutante: shutting down instance phabricator-prod-1001 (buster), replaced by phab-bullseye (bullseye) T356530
20:38 mutante: deleting proxy phab-bull.wmcloud.org after previous proxy names are switched to bullseye backend T356530
20:17 mutante: editing proxies phab.wmflabs.org and phab-prod-usercontent.wmflabs.org to point to bullseye instance instead of buster T356530
20:13 mutante: running "scap deploy" in /srv/deployment/phabricator/deployment on deploy-1004 which deploys to phabricator-bullseye and phabricator-prod-1001 T356530
19:29 mutante: deleting web proxy phabricator-prod.wmflabs.org which pointed to port 443 on the buster instance and timed out T356530
19:12 mutante: deleting web proxy phorge.wmcloud.org which pointed to 172.16.7.98 which doesn't exist anymore T356530
2024-01-05
21:30 mutante: contint-bullseye - sudo /usr/sbin/a2dismod mpm_event ; sudo /usr/sbin/a2enmod php74 - the usual issue we have had for years
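For context, a2dismod/a2enmod only manage symlinks under /etc/apache2/mods-enabled; mod_php needs the prefork MPM, which conflicts with mpm_event. A sketch of that mechanism on scratch directories - the real commands need sudo on the instance, and the exact PHP module name (php74 as logged, vs php7.4) is an assumption taken from the entry above:

```shell
# Stand-ins for /etc/apache2/mods-available and mods-enabled.
avail=$(mktemp -d); enabled=$(mktemp -d)
touch "$avail/mpm_event.load" "$avail/mpm_prefork.load" "$avail/php74.load"
ln -s "$avail/mpm_event.load" "$enabled/mpm_event.load"

# Equivalent of: a2dismod mpm_event
rm "$enabled/mpm_event.load"
# Equivalent of: a2enmod mpm_prefork ; a2enmod php74
ln -s "$avail/mpm_prefork.load" "$enabled/mpm_prefork.load"
ln -s "$avail/php74.load" "$enabled/php74.load"

ls "$enabled"
```

On a real host this would be followed by `systemctl restart apache2` for the MPM change to take effect.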
2023-12-18
21:53 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99)
23:49 mutante: starting manual gitlab upgrade process on gitlab-prod-1002
2023-11-30
21:03 mutante: - phabricator-bullseye - created user app_user and granted privileges in mysql for user from 127.0.0.1, ran ./phabricator/bin/storage upgrade --force; set 'phabricator_domain: phab-bull.wmcloud.org' in web Hiera
20:33 mutante: - phabricator-bullseye - running 'mariadb-secure-installation' interactive script - this fixed mysql shell which previously exited with "bash: /nonexistent: No such file or directory"
20:27 mutante: - phabricator-bullseye - attempting to fix mariadb/mysql server, apt-get remove mariadb-server, running puppet, debugging why it won't start
2023-11-21
21:52 mutante: - commit fake key for phabricator-bullseye host in git /var/lib/git/labs/private/modules/secret/secrets/ssl on puppetmaster-1001.devtools T327068
21:41 mutante: - cert issue on new machine related to having local puppetmaster, like T349937#9288547 except "rm -rf /var/lib/puppet/ssl" was enough since puppetmaster did auto-sign new CSR - T327068
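The fix above works because the agent-side SSL state is disposable: wiping it makes the next puppet run generate a fresh key and CSR, which the auto-signing puppetmaster then signs. A sketch on a scratch directory standing in for /var/lib/puppet/ssl (the real command needs sudo on the instance):

```shell
# Scratch stand-in for /var/lib/puppet/ssl on the instance.
ssl_dir=$(mktemp -d)
mkdir -p "$ssl_dir/certs"
touch "$ssl_dir/certs/stale-self-signed.pem"  # the cert puppet refused to verify

# Equivalent of: sudo rm -rf /var/lib/puppet/ssl
# (the next 'puppet agent --test' run recreates the directory,
# generates a new key pair and submits a fresh CSR)
rm -rf "$ssl_dir"
[ -e "$ssl_dir" ] || echo "ssl state cleared"
```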
21:24 mutante: - initial puppet run on newly created VM fails with "SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: puppetmaster-1001.devtools.eqiad.wmflabs" T327068
21:00 mutante: - deleted instance phorge-1001 to get quota back and allow for creating new phabricator-on-bullseye instance T328595 T327068
2023-04-12
19:26 mutante: - vrts-1001 - editing /etc/my.cnf to set mariadb datadir to /var/lib/mysql instead of /srv/sqldata and restart service, issue like T329571
2023-03-11
00:20 mutante: - on phorge1001, enable general query log in mysql (mariadb), to learn about the database schema; don't forget to turn that off so the VM doesn't run out of disk (SET GLOBAL general_log=1;) T328595
2023-03-07
23:34 mutante: - phorge-1001 - MariaDB [(none)]> SET GLOBAL max_allowed_packet=33554432;
23:33 mutante: - phorge-1001 - MariaDB [(none)]> SET GLOBAL local_infile=0;
23:31 mutante: - phorge-1001 - MariaDB [(none)]> SET GLOBAL sql_mode = "STRICT_ALL_TABLES,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION";
22:18 mutante: install package python3-certbot-apache on gerrit-prod-1001 - T329444
22:03 mutante: - re-activating disabled puppet on gerrit-prod-1001 (reason given was 'gerrit deploy' but it was about 17 days ago)
21:58 mutante: rebooting instance gerrit-prod-1001 which can't be reached T329444
2023-01-31
22:39 mutante: remove role::gitlab from gitlab-prod-1001. to be replaced with gitlab-prod-1002. T318521
2023-01-28
16:26 taavi: adjust gitlab-prod-1002 network port settings to allow adding the secondary IP, requested in T318521
2023-01-24
09:47 wm-bot2: Increased quotas by 1 instances (T327750) - cookbook ran by arturo@nostromo
2022-11-30
15:57 wm-bot2: Increased quotas by 1 floating-ips (T323986) - cookbook ran by dcaro@vulcanus
2022-11-17
18:18 andrewbogott: committed a local puppet change on puppetprimary to fix upstream syncs
2022-10-28
20:24 mutante: - removing from Horizon / project-wide hiera: profile::phabricator::main::manage_scap_user: true (set in the repo)
20:23 mutante: - removing from Horizon / project-wide hiera: profile::keyholder::server::require_encrypted_keys: 'no' (set in the repo)
20:19 mutante: - removing from Horizon / project-wide hiera: profile::gerrit::daemon_user: gerrit2, profile::gerrit::manage_scap_user: true, profile::gerrit::scap_user: gerrit-deploy (all of these are set in the repo)
2022-10-19
19:21 mutante: - on puppetmaster-1001.devtools created /var/lib/puppet/volatile/GeoIP directory - to fix puppet error on deploy-1004.devtools - reacting to puppet-broken-nagging-emails
20:30 mutante: - created gitlab-runner-1002 - applied puppet role - attached cinder volume "docker" - running puppet again
17:06 mutante: deleting instance gitlab-runner-1001 which just disconnects people. Gut feeling is it has to do with the fact that a previous instance name was reused
2022-06-14
23:29 mutante: - creating instance gitlab-runner-1001 since we did not have a test machine for gitlab-runners but need one to test things like gerrit:791655 before hitting prod T308271
2022-04-29
20:59 mutante: - restarting instance gitlab-prod-1001 - No route to host
20:55 mutante: - attempting to soft reboot instance deploy1004 (got the puppet fail mail and it wasn't reachable by ssh). This happened recently to gitlab-prod-1001 as well - same project, different instance - but this time it doesn't just come back right away
2022-04-20
17:03 mutante: soft rebooting gitlab-prod-1001 which was sending "failed puppet" reports and was unreachable, just like the other day.
2022-04-18
19:08 mutante: - gitlab-prod-1001 is indeed back after soft rebooting the instance. uptime 1 min T297411
19:07 mutante: - gitlab-prod-1001 randomly stopped working. we got the "puppet failed" mails without having made changes and can't ssh to the instance anymore when trying to check out why. trying soft reboot via Horizon T297411
17:03 mutante: - not sure if possible (for me) to create a bullseye deployment server in cloud, using scap: failed: Execution of '/usr/bin/scap deploy --init', missing PHP packages, missing prometheus-mcrouter-exporter and more T306069
17:02 mutante: - not sure if possible (for me) to create a deployment server in cloud, using scap: failed: Execution of '/usr/bin/scap deploy --init'
16:40 mutante: creating deploy1003 to replace deploy1002 T306069
16:36 mutante: deleting instance gitlab-runner-1001 - was just for testing, real runners are upgraded in their own project
2022-03-02
22:22 mutante: - creating gitlab-runner-1001 on bullseye - purely test for T297659
2022-03-01
18:16 taavi: allocated secondary IP for gitlab-prod-1001 per request on T302803
2022-02-15
16:08 taavi: created devtools.wmcloud.org dns zone for the devtools project T301793
2022-01-26
17:26 arturo: bump quota, floating IP from 1 to 2 (T299561)
15:56 arturo: bump quota, RAM from 32 to 40, cores from 16 to 20 (T299561)
2022-01-21
22:12 mutante: - created new instance gitlab-prod-1001 T297411
22:11 mutante: - created new instance gitlab-prod-1001 T297411
21:57 mutante: - deleted instances "doc" and "doc1002" to make room for gitlab instance T299561 - T297411
2022-01-19
17:36 mutante: - added brennen, aokoth and jelto as users and projectadmins (T297411)
2021-11-10
19:49 mutante: - removing manually added things in Horizon Hiera that were already in the repo, please don't keep adding in web UI, we don't want to repeat the same thing we did in deployment-prep
08:04 mutante: - broken puppet again from prod changes. this time: deploy-1002 - '[]' is not applicable to an Undef Value. mediawiki/mcrouter_wancache.pp, line: 19
2020-04-13
10:00 mutante: - phabricator-stage-1001: replace deployment-tin.deployment-prep with deploy-1002.devtools in deployment-cache/.config
09:40 mutante: set missing (and new) profile::tlsproxy::envoy::capitalize_headers: true to fix puppet errors
09:35 mutante: set phabricator::vcs::address::v6 to fe80 local address to fix puppet error on phabricator-stage-1001
2020-01-16
00:53 mutante: deploy-1002 - become 'trebuchet' user and ssh to phabricator scap targets, to fix ssh host key verification issue on first deploy
00:30 mutante: deploy-1002 live hack /srv/deployment/phabricator/deployment/scap/phabricator-targets and replace prod server with cloud instances; scap deploy in phabricator repo
2020-01-15
23:51 paladox: deploy-1002 rm -rf /srv/deployment
23:44 mutante: deploy-1002 sudo git init in /srv/deployment ; scap deploy --init (now fails with 'fatal: Not a valid object name HEAD')
23:42 mutante: deploy-1002 mkdir /srv/deployment/.git ; chown trebuchet:wikidev .git ; manually run "scap deploy --init" as trebuchet user in an attempt to fix initial puppet run on deployment_server
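The 'fatal: Not a valid object name HEAD' error in the entries above is what git reports for a repository with no commits yet - scap was trying to resolve HEAD in a freshly initialised /srv/deployment. A minimal reproduction with a throwaway repo:

```shell
repo=$(mktemp -d)
git -C "$repo" init -q

# No commits yet, so HEAD does not resolve - the state that broke
# 'scap deploy --init' here.
git -C "$repo" rev-parse --verify -q HEAD >/dev/null || echo "no valid HEAD yet"

# After a first commit, HEAD is a real object name again.
git -C "$repo" -c user.email=x@example.org -c user.name=tmp \
    commit -q --allow-empty -m 'initial commit'
git -C "$repo" rev-parse --verify -q HEAD >/dev/null && echo "HEAD resolves"
```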
16:01 bstorm_: moving vm puppetmaster-1001 from cloudvirt1024 to cloudvirt1009 due to hardware error T241884
2020-01-03
22:37 mutante: - sudo vi /srv/deployment/phabricator/deployment-cache/.config on both phabricator instances to fix deployment server (remove deployment-tin (!))
21:50 mutante: assigned 172.16.0.198/32 on eth0 on phabricator-prod-1001
21:50 jeh: add secondary interface to phabricator-prod-1001
21:32 mutante: configure 172.16.0.189 as "vcs" address v4 for phabricator-stage-1001
21:24 jeh: add secondary interface to phabricator-stage-1001