Nova Resource:Account-creation-assistance/SAL

16:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
16:05 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_project_to_ovs

11:16 dcaro: removing custom mx hosts, as the global names are now resolvable again (T271322)

14:53 dcaro: manually configured mx servers to use wikimedia.cloud domain on project hiera (T271322)

12:05 arturo: VM instance accounts-appserver5 was stopped briefly due to an issue in the hypervisor (T215012)
12:04 arturo: VM instance accounts-appserver5 was stopped briefly due to an issue in the hypervisor (T215012)

01:34 FastLizard4: Looks like the problem was caused by puppet not having been run in days, so the ldap config was out of date (along with the puppet config)
01:25 FastLizard4: SSH on accounts-database is acting up again, will reboot and investigate

06:47 FastLizard4: Altered security group 'default' config, changed rule 'from 80 to 80 proto tcp ip 10.4.0.54/32' to 'from 80 to 80 proto tcp ip *10.4.0.0/21*' to fix issues with accessing web services on accounts-application
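A minimal sketch of why widening the rule helps, using Python's standard `ipaddress` module. The two CIDR blocks come from the entry above; the client address `10.4.3.17` and the helper `allowed` are hypothetical, chosen only for illustration, and this is not the actual security group implementation.

```python
import ipaddress

# CIDR blocks taken from the log entry above.
old_rule = ipaddress.ip_network("10.4.0.54/32")  # a single host
new_rule = ipaddress.ip_network("10.4.0.0/21")   # 10.4.0.0 through 10.4.7.255


def allowed(rule, source):
    """Return True if the source address falls inside the rule's CIDR block."""
    return ipaddress.ip_address(source) in rule


# Hypothetical client address, not taken from the log.
client = "10.4.3.17"

print(old_rule.num_addresses, new_rule.num_addresses)
print(new_rule[0], new_rule[-1])
print(allowed(old_rule, client), allowed(new_rule, client))
```

Under the /32 rule only 10.4.0.54 itself matches, so any other host in the range would be refused on port 80; the /21 rule matches every address from 10.4.0.0 to 10.4.7.255.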

12:03 FastLizard4: Operations log, supplemental: This was a triumph; I'm making a note here, huge success, and that turning it off and on again still works!
11:57 FastLizard4: Operations log, stardate 45650.54: -application is now properly reporting to Icinga, but -database isn't. Giving -database the (re)boot to see if that helps.

23:55 FastLizard4|away: Supplemental: Reboot has fixed connectivity issues and application is back up. DHCP is probable culprit.
23:54 FastLizard4|away: Supplemental: accounts-application is the instance that is unexpectedly down
23:53 FastLizard4|away: is unexpectedly down, attempting a reboot to recover. Will investigate further when I get home.
09:36 FastLizard4: Fixed Icinga reporting by `sudo cp -R /etc/nagios/* /etc/icinga/ && sudo killall nrpe && sudo /etc/init.d/nagios-nrpe-server start` on -application and -database.
08:47 FastLizard4: Nope, reboots did not fix the errors. Interestingly, only application and database are showing errors (for load, disk, RAM, numprocs, and dpkg): "CHECK_NRPE: Error - Could not complete SSL handshake." Puppetmaster is reading all-green.
08:35 FastLizard4: Going to try rebooting all systems to see if that fixes the NRPE problems.

10:11 FastLizard4: All instances need reboots after `apt-get upgrade`. Initiating now.

03:48 FastLizard4: Scratch last log entry, error fixed by using `sudo puppetca --sign "i-000005c5.pmtpa.wmflabs"`
03:41 FastLizard4: Getting strange puppet errors related to SSL on accounts-puppetmaster, going to try rebooting.

23:45 FastLizard4: Installing package updates and rebooting accounts-application. Also got puppet running again by removing generic::packages::git-core class.
23:23 FastLizard4: Setting mysql_server_bind_address to * for accounts-database
23:03 FastLizard4: Restarting accounts-puppetmaster to apply updates to packages