- 20:35 Izhidez: apache2 restarted
- 20:33 Izhidez: apache2 shutdown to deal with spam (742 line DB change)
- 12:05 arturo: VM instance accounts-appserver5 was stopped briefly due to an issue in the hypervisor (T215012)
- 12:04 arturo: VM instance accounts-appserver5 was stopped briefly due to an issue in the hypervisor (T215012)
- 14:00 andrewbogott: moving accounts-mwoauth to virt1005
- 08:23 andrewbogott: rebooted accounts-puppetmaster
- 06:33 FastLizard4: running apt-get updates on accounts-application and rebooting it
- 01:28 FastLizard4: `apt-get update && apt-get dist-upgrade -y` on all instances
- 01:34 FastLizard4: Looks like the problem was caused by puppet not having been run in days, so the ldap config was out of date (along with the puppet config)
- 01:25 FastLizard4: SSH on accounts-database is acting up again, will reboot and investigate
- 00:24 FastLizard4: accounts-application requesting reboot, fulfilling
- 00:23 FastLizard4: `apt-get update && apt-get upgrade -y` on all accounts-*
- 00:18 FastLizard4: Unable to SSH in to accounts-database, issuing a reboot
- 06:47 FastLizard4: Altered security group 'default' config, changed rule 'from 80 to 80 proto tcp ip 10.4.0.54/32' to 'from 80 to 80 proto tcp ip *10.4.0.0/21*' to fix issues with accessing web services on accounts-application
- 08:08 FastLizard4: updated packages on all instances
- 02:22 FastLizard4: apt-get update/upgrade on all instances
- 12:03 FastLizard4: Operations log, supplemental: This was a triumph; I'm making a note here, huge success, and that turning it off and on again still works!
- 11:57 FastLizard4: Operations log, stardate 45650.54: -application is now properly reporting to Icinga, but -database isn't. Giving -database the (re)boot to see if that helps.
- 23:55 FastLizard4|away: Supplemental: Reboot has fixed connectivity issues and application is back up. DHCP is probable culprit.
- 23:54 FastLizard4|away: Supplemental: accounts-application is the instance that is unexpectedly down
- 23:53 FastLizard4|away: is unexpectedly down, attempting a reboot to recover. Will investigate further when I get home.
- 09:36 FastLizard4: Fixed Icinga reporting by `sudo cp -R /etc/nagios/* /etc/icinga/ && sudo killall nrpe && sudo /etc/init.d/nagios-nrpe-server start` on -application and -database.
- 08:47 FastLizard4: Nope, reboots did not fix the errors. Interestingly, only application and database are showing errors (for load, disk, RAM, numprocs, and dpkg): "CHECK_NRPE: Error - Could not complete SSL handshake." Puppetmaster is reading all-green.
- 08:35 FastLizard4: Going to try rebooting all systems to see if that fixes the NRPE problems.
- 03:11 FastLizard4: All servers requesting reboots, will provide
- 03:06 FastLizard4: Running apt-get upgrade on all servers
- 05:48 FastLizard4: The servers demand reboots, and reboots they shall receive
- 10:11 FastLizard4: All instances need reboots after `apt-get` upgrade. Initiating now.
- 03:48 FastLizard4: Scratch last log entry, error fixed by using `sudo puppetca --sign "i-000005c5.pmtpa.wmflabs"`
- 03:41 FastLizard4: Getting strange puppet errors related to SSL on accounts-puppetmaster, going to try rebooting.
- 10:43 FastLizard4: `apt-get update && apt-get upgrade` run on all instances
- 07:01 FastLizard4: `apt-get upgrade` and reboot on accounts-database
- 23:45 FastLizard4: Installing package updates and rebooting accounts-application. Also got puppet running again by removing generic::packages::git-core class.
- 23:23 FastLizard4: Setting mysql_server_bind_address to * for accounts-database
- 23:03 FastLizard4: Restarting accounts-puppetmaster to apply updates to packages
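
The 06:47 security-group entry above widens the port 80 rule from the single host `10.4.0.54/32` to the range `10.4.0.0/21`. As a rough illustration of what that change admits, the sketch below uses Python's standard `ipaddress` module to compare the two CIDR blocks. The helper name `allowed` and the sample source addresses `10.4.1.10` and `10.4.8.1` are hypothetical and do not come from the log; only the two CIDR values are taken from the entry.

```python
import ipaddress

# The two rule sources quoted in the log entry.
old_rule = ipaddress.ip_network("10.4.0.54/32")
new_rule = ipaddress.ip_network("10.4.0.0/21")


def allowed(source_ip: str, rule: ipaddress.IPv4Network) -> bool:
    """Return True if source_ip falls inside the rule's CIDR block (hypothetical helper)."""
    return ipaddress.ip_address(source_ip) in rule


# A /32 matches exactly one address; a /21 leaves 11 host bits, i.e. 2048 addresses.
print(old_rule.num_addresses, new_rule.num_addresses)

# First and last address covered by the widened rule.
print(new_rule[0], new_rule[-1])

# The old single-host rule is fully contained in the new range.
print(old_rule.subnet_of(new_rule))

# Hypothetical clients: one inside the /21 but not the /32, one outside both.
for ip in ("10.4.0.54", "10.4.1.10", "10.4.8.1"):
    print(ip, allowed(ip, old_rule), allowed(ip, new_rule))
```

Under these assumptions, any client in `10.4.0.0` through `10.4.7.255` would match the new rule, which fits the entry's stated goal of fixing access to web services on accounts-application from more than one host.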