Decom script
A real-life example of how to decommission a host using the latest method: a Spicerack cookbook that replaced "wmf-decommission-host".
1) SSH to one of the cumin masters: cumin1002.eqiad.wmnet or cumin2002.codfw.wmnet
2) Run the example command as a dry run:
sudo cookbook -d sre.hosts.decommission analytics1032.eqiad.wmnet -t T233080
3) Replace the host name and ticket ID, and remove the "-d" to actually run it.
Example output of the dry run:
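The three steps above can be sketched as a small helper that assembles the command line, so the dry-run flag is the only thing that changes between the rehearsal and the real run. This is an illustrative sketch only: the function name `build_decommission_command` is hypothetical and is not part of Spicerack or the cookbook. It uses only the flags shown in the example command ("-d" and "-t").

```python
import shlex


def build_decommission_command(host, task_id, dry_run=True):
    """Return the argv list for the decommission cookbook (hypothetical helper)."""
    argv = ["sudo", "cookbook"]
    if dry_run:
        # "-d" is the dry-run flag from step 2; drop it for the real run (step 3).
        argv.append("-d")
    argv += ["sre.hosts.decommission", host, "-t", task_id]
    return argv


if __name__ == "__main__":
    # Print the dry-run command for the host and ticket used in the example.
    cmd = build_decommission_command("analytics1032.eqiad.wmnet", "T233080")
    print(shlex.join(cmd))
```

Defaulting `dry_run` to `True` means the destructive form has to be requested explicitly, which mirrors the order of the steps above.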
elukey@cumin1001:~$ sudo cookbook -d sre.hosts.decommission analytics1032.eqiad.wmnet -t T233080
DRY-RUN: Executing cookbook sre.hosts.decommission with args: ['analytics1032.eqiad.wmnet', '-t', 'T233080']
DRY-RUN: START - Cookbook sre.hosts.decommission
ATTENTION: destructive action for 1 hosts: analytics1032.eqiad.wmnet
Are you sure to proceed?
Type "done" to proceed
> done
DRY-RUN: Resolved CNAME record for 300 IN CNAME
DRY-RUN: MGMT_PASSWORD environment variable not found
Management Password:
DRY-RUN: Scheduling downtime on Icinga server for hosts: ['analytics1032.eqiad.wmnet']
DRY-RUN: Executing commands ['icinga-downtime -h "analytics1032" -d 14400 -r "Host decommission - elukey@cumin1001 - T233080"'] on 1 hosts:
DRY-RUN: Downtimed host on Icinga
DRY-RUN: Resolved A record for analytics1032.mgmt.eqiad.wmnet: analytics1032.mgmt.eqiad.wmnet. 3600 IN A
DRY-RUN: Management FQDN for analytics1032.eqiad.wmnet is analytics1032.mgmt.eqiad.wmnet
DRY-RUN: Scheduling downtime on Icinga server for hosts: ['analytics1032.mgmt.eqiad.wmnet']
DRY-RUN: Executing commands ['icinga-downtime -h "analytics1032" -d 14400 -r "Host decommission - elukey@cumin1001 - T233080"'] on 1 hosts:
DRY-RUN: Downtimed management interface on Icinga
DRY-RUN: Executing commands ['true'] on 1 hosts: analytics1032.eqiad.wmnet
DRY-RUN: Executing commands ["lsblk --all --output 'NAME,TYPE' --paths | awk '/^\\/.* disk$/{ print $1 }' | xargs -I % bash -c '/sbin/wipefs --all --force %*'"] on 1 hosts: analytics1032.eqiad.wmnet
DRY-RUN: Wiped bootloaders
DRY-RUN: Running IPMI command: ipmitool -I lanplus -H analytics1032.mgmt.eqiad.wmnet -U root -E chassis power off
DRY-RUN: Powered off
DRY-RUN: skipping host status write due to dry-run mode for analytics1032 Active -> Decommissioning
DRY-RUN: Set Netbox status to Decommissioning
DRY-RUN: Skip removing host analytics1032.eqiad.wmnet from Debmonitor in DRY-RUN
DRY-RUN: Removed from DebMonitor
DRY-RUN: Executing commands ['puppet node clean analytics1032.eqiad.wmnet', 'puppet node deactivate analytics1032.eqiad.wmnet'] on 1 hosts: puppetmaster1001.eqiad.wmnet
DRY-RUN: Removed from Puppet master and PuppetDB
DRY-RUN: Skip updating Phabricator task T233080 in DRY-RUN with comment: cookbooks.sre.hosts.decommission executed by elukey@cumin1001 for hosts: `analytics1032.eqiad.wmnet`
- analytics1032.eqiad.wmnet (**PASS**)
  - Downtimed host on Icinga
  - Downtimed management interface on Icinga
  - Wiped bootloaders
  - Powered off
  - Set Netbox status to Decommissioning
  - Removed from DebMonitor
  - Removed from Puppet master and PuppetDB
DRY-RUN: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
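The bootloader wipe step in the dry-run output picks its targets with an awk filter over lsblk output: only lines that start with "/" and end in " disk" are kept, and the first column (the device path) is passed on. The Python sketch below reproduces just that selection logic so it can be inspected safely; it wipes nothing. The sample lsblk output is made up for illustration, and the names `SAMPLE_LSBLK` and `whole_disks` are hypothetical.

```python
import re

# Hypothetical lsblk output (NAME,TYPE with full paths), invented for this example.
SAMPLE_LSBLK = """\
NAME        TYPE
/dev/sda    disk
├─/dev/sda1 part
└─/dev/sda2 part
/dev/sdb    disk
"""

# Same pattern as the awk filter in the log: starts with "/", ends with " disk".
DISK_LINE = re.compile(r"^/.* disk$")


def whole_disks(lsblk_output):
    """Return the device paths of whole disks, mirroring awk's `print $1`."""
    return [
        line.split()[0]
        for line in lsblk_output.splitlines()
        if DISK_LINE.match(line)
    ]


if __name__ == "__main__":
    print(whole_disks(SAMPLE_LSBLK))
```

Partition lines are skipped because they are prefixed with tree-drawing characters rather than "/", and the header line is skipped because it does not end in " disk".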
Example output of the actual run:
[cumin1001:~] $ sudo cookbook sre.hosts.decommission -t T218021
START - Cookbook sre.hosts.decommission
Scheduling downtime on Icinga server for hosts: ['']
Scheduling downtime on Icinga server for hosts: ['labtestcontrol2001.mgmt.codfw.wmnet']
Removed host from Debmonitor
Updated Phabricator task T218021
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
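Both the dry run and the actual run finish with an "END (...) - Cookbook ... (exit_code=...)" line, which is the quickest thing to check when reviewing saved output. The following is a minimal sketch of pulling the cookbook name, status, and exit code out of that line. The function name `parse_result` is hypothetical, and the pattern is inferred only from the two example outputs on this page, so status values other than PASS and the exact formatting in other versions are assumptions.

```python
import re

# Pattern inferred from the example output; not an official log format definition.
END_LINE = re.compile(
    r"END \((?P<status>\w+)\) - Cookbook (?P<name>\S+) \(exit_code=(?P<code>\d+)\)"
)


def parse_result(output):
    """Return (cookbook_name, status, exit_code) from the last END line, or None."""
    for line in reversed(output.splitlines()):
        match = END_LINE.search(line)
        if match:
            return match["name"], match["status"], int(match["code"])
    return None


if __name__ == "__main__":
    log = "START - Cookbook sre.hosts.decommission\nEND (PASS) - Cookbook sre.hosts.decommission (exit_code=0)"
    print(parse_result(log))
```

Using `search` rather than `match` lets the same pattern handle the "DRY-RUN: " prefix that appears on the dry-run variant of the line.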
You should see logmsgbot and stashbot talk about it on #wikimedia-operations and your Phabricator ticket should be automatically updated.
An example of how the result looks on a Phabricator ticket:
Also see: