Decom script

From Wikitech

A real-life example of how to decommission a host using the current method, a Spicerack cookbook that replaced "wmf-decommission-host".

1) SSH to a Cumin master, currently cumin1001.eqiad.wmnet.

2) Run the cookbook as a dry run first (the "-d" flag), for example:

sudo cookbook -d sre.hosts.decommission labtestcontrol2001.wikimedia.org -t T218021

3) Replace the host name and ticket ID with your own, and remove the "-d" to actually run it.
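
The three steps above can be sketched as a small shell helper that prints the command for a given host and ticket, defaulting to the dry-run form. The function name decom_cmd and its argument order are hypothetical and not part of Spicerack; only the "cookbook [-d] sre.hosts.decommission HOST -t TASK" form is taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) the decommission command.
# Usage: decom_cmd HOST TASK [run]
#   HOST  fully qualified host name, e.g. labtestcontrol2001.wikimedia.org
#   TASK  Phabricator task ID, e.g. T218021
#   run   optional literal "run"; omit it to get the dry-run ("-d") form
decom_cmd() {
    host=${1:-}
    task=${2:-}
    mode=${3:-dry-run}

    # Refuse to build a command with a missing host or ticket.
    if [ -z "$host" ] || [ -z "$task" ]; then
        echo "usage: decom_cmd HOST TASK [run]" >&2
        return 1
    fi

    if [ "$mode" = "run" ]; then
        echo "sudo cookbook sre.hosts.decommission $host -t $task"
    else
        echo "sudo cookbook -d sre.hosts.decommission $host -t $task"
    fi
}
```

Calling decom_cmd labtestcontrol2001.wikimedia.org T218021 prints the dry-run command from step 2; appending "run" prints the same command without "-d". Printing rather than executing keeps the real run a deliberate copy-and-paste step.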

Example output of the dry run:

[cumin1001:~] $ sudo cookbook -d sre.hosts.decommission labtestcontrol2001.wikimedia.org -t T218021
DRY-RUN: Executing cookbook sre.hosts.decommission with args: ['labtestcontrol2001.wikimedia.org', '-t', 'T218021']
DRY-RUN: START - Cookbook sre.hosts.decommission
DRY-RUN: Resolved CNAME record for icinga.wikimedia.org: icinga.wikimedia.org. 300 IN CNAME icinga1001.wikimedia.org.
DRY-RUN: Executing commands ['puppet node clean labtestcontrol2001.wikimedia.org', 'puppet node deactivate labtestcontrol2001.wikimedia.org'] on 1 hosts: puppetmaster1001.eqiad.wmnet
DRY-RUN: Scheduling downtime on Icinga server icinga1001.wikimedia.org for hosts: ['labtestcontrol2001.wikimedia.org']
DRY-RUN: Executing commands ['icinga-downtime -h "labtestcontrol2001" -d 14400 -r "Host decommission - dzahn@cumin1001 - T218021"'] on 1 hosts: icinga1001.wikimedia.org
DRY-RUN: Resolved A record for labtestcontrol2001.mgmt.codfw.wmnet: labtestcontrol2001.mgmt.codfw.wmnet. 3600 IN A 10.193.2.1
DRY-RUN: Management FQDN for labtestcontrol2001.wikimedia.org is labtestcontrol2001.mgmt.codfw.wmnet
DRY-RUN: Scheduling downtime on Icinga server icinga1001.wikimedia.org for hosts: ['labtestcontrol2001.mgmt.codfw.wmnet']
DRY-RUN: Executing commands ['icinga-downtime -h "labtestcontrol2001" -d 14400 -r "Host decommission - dzahn@cumin1001 - T218021"'] on 1 hosts: icinga1001.wikimedia.org
DRY-RUN: Skip removing host labtestcontrol2001.wikimedia.org from Debmonitor in DRY-RUN
DRY-RUN: Skip updating Phabricator task T218021 in DRY-RUN with comment: cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `labtestcontrol2001.wikimedia.org`
-  labtestcontrol2001.wikimedia.org
  - Removed from Puppet master and PuppetDB
  - Downtimed host on Icinga
  - Downtimed management interface on Icinga
  - Removed from DebMonitor
DRY-RUN: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
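
Before removing the "-d", it can help to confirm that the dry run ended with the PASS line shown in the last line of the output above. A minimal sketch, assuming the cookbook output has been captured or piped to standard input; the function name cookbook_passed is hypothetical, and the matched string is copied from the example output rather than from any documented format guarantee.

```shell
#!/bin/sh
# Hypothetical helper: reads cookbook output on stdin and succeeds only
# if it contains the final PASS line for sre.hosts.decommission.
cookbook_passed() {
    # -q: print nothing, report the result via the exit status
    # -F: treat the pattern as a fixed string, so "(" and ")" are literal
    grep -qF 'END (PASS) - Cookbook sre.hosts.decommission'
}
```

For example, "sudo cookbook -d sre.hosts.decommission HOST -t TASK 2>&1 | cookbook_passed && echo ok" would print "ok" only when that line is present. The "2>&1" is there because this sketch does not assume which stream the cookbook logs to.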

Example output of the actual run:

[cumin1001:~] $ sudo cookbook sre.hosts.decommission labtestcontrol2001.wikimedia.org -t T218021
START - Cookbook sre.hosts.decommission
Scheduling downtime on Icinga server icinga1001.wikimedia.org for hosts: ['labtestcontrol2001.wikimedia.org']
Scheduling downtime on Icinga server icinga1001.wikimedia.org for hosts: ['labtestcontrol2001.mgmt.codfw.wmnet']
Removed host labtestcontrol2001.wikimedia.org from Debmonitor
Updated Phabricator task T218021
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)

You should see logmsgbot and stashbot report the run on #wikimedia-operations, and your Phabricator ticket should be updated automatically.

For an example of what the result looks like on a Phabricator ticket, see https://phabricator.wikimedia.org/T218021#5107910

See also: https://doc.wikimedia.org/spicerack/master/cookbook.html