MariaDB/Decommissioning a DB Host

From Wikitech

Prerequisites:

  • SSH access to one of the cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet) to depool the host and run the decommissioning script

Decommissioning workflow:

Create a tracking ticket

  1. Create a decommission ticket with the following template: https://phabricator.wikimedia.org/maniphest/task/edit/form/52/
  2. If there are hardware problems, specify them in the ticket so that DCOps can label the hardware and broken parts are not reused.

Depool the host

  1. SSH to one of the cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet)
  2. dbctl instance HOSTNAME depool && dbctl config commit -m "Depool HOSTNAME TASKNUMBER"
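The depool step above can be sketched as a small dry-run helper that fills in the HOSTNAME and TASKNUMBER placeholders and prints the commands for review before they are pasted on the cumin host. The function name depool_commands and the task number T000000 are hypothetical and used for illustration only; the two dbctl invocations are the ones given in the step above.

```shell
# Dry-run helper: prints the depool commands instead of running them.
# depool_commands is a hypothetical name, not part of dbctl.
depool_commands() {
    host="$1"
    task="$2"
    # Refuse to build a command from empty placeholders.
    if [ -z "$host" ] || [ -z "$task" ]; then
        echo "usage: depool_commands HOSTNAME TASKNUMBER" >&2
        return 1
    fi
    printf 'dbctl instance %s depool\n' "$host"
    printf 'dbctl config commit -m "Depool %s %s"\n' "$host" "$task"
}

# Example with a hypothetical task number.
depool_commands db1091 T000000
```

Printing rather than executing keeps the helper safe to run anywhere and makes a leftover placeholder or a wrong hostname easy to spot before the change is committed.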

Remove the host from dbctl

  1. Create a puppet patch (example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/638343)
  2. SSH to puppetmaster1001
  3. sudo puppet-merge - if you see any changes other than yours here, contact the owners to see if these are ok to merge
  4. SSH to one of the cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet)
  5. sudo dbctl config commit -m "Remove HOSTNAME from dbctl TASKNUMBER"

Remove all other puppet entries

  1. Create a puppet patch (example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/638352)
    1. DHCP changes are no longer needed, so there is no need to edit linux-host-entries.ttyS1-115200
  2. DO NOT merge the patch yet

Run the decommissioning script

  1. SSH to one of the cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet)
  2. Start a screen or tmux session
  3. sudo cookbook sre.hosts.decommission -t TASKNUMBER HOSTNAME.DC.wmnet
  4. Enter console password from Pwstore
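The cookbook takes the host's FQDN in the HOSTNAME.DC.wmnet form shown above. As a minimal sketch, the command line can be assembled from its parts so that a mistyped datacenter is caught early. The function name decommission_command and the task number T000000 are hypothetical, and restricting DC to eqiad and codfw is an assumption based only on the two datacenters named on this page.

```shell
# Builds the decommission command line and prints it for review.
# decommission_command is a hypothetical helper, not part of the cookbook.
decommission_command() {
    host="$1"
    dc="$2"
    task="$3"
    # Assumption: only the two datacenters named on this page are valid.
    case "$dc" in
        eqiad|codfw) ;;
        *) echo "unknown DC: $dc" >&2; return 1 ;;
    esac
    printf 'sudo cookbook sre.hosts.decommission -t %s %s.%s.wmnet\n' \
        "$task" "$host" "$dc"
}

# Example with a hypothetical task number.
decommission_command db1091 eqiad T000000
```

The printed line is meant to be pasted inside the screen or tmux session, so the run survives a dropped SSH connection.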

Merge puppet change

  1. SSH to puppetmaster1001
  2. sudo puppet-merge - if you see any changes other than yours here, contact the owners to see if these are ok to merge

Remove host from zarcillo

  1. Log the action in IRC (#wikimedia-operations) - !log Removing HOSTNAME from zarcillo TASKNUMBER
  2. SSH to one of the cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet)
  3. sudo -i
  4. Zarcillo
    1. db-mysql db1215 -A zarcillo
    2. Execute the following queries in the MySQL prompt (remember the trailing semicolon):
      1. set binlog_format='ROW';
      2. delete from servers where hostname like 'HOSTNAME%';
      3. delete from instances where name like 'HOSTNAME%'; (instance names are normally HOSTNAME or HOSTNAME:PORT)
      4. delete from section_instances where instance like 'HOSTNAME%';
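Because the delete statements above match on a 'HOSTNAME%' prefix, a truncated or mistyped hostname could match more rows than intended. One possible precaution, sketched here as a suggestion rather than as part of the documented procedure, is to run read-only select statements with the same patterns first and check the rows they return. The function name zarcillo_preview_sql is hypothetical; the table and column names are the ones used in the delete statements above.

```shell
# Prints read-only preview queries for the three tables named above.
# zarcillo_preview_sql is a hypothetical helper name.
zarcillo_preview_sql() {
    host="$1"
    [ -n "$host" ] || { echo "usage: zarcillo_preview_sql HOSTNAME" >&2; return 1; }
    # Same LIKE patterns as the delete statements, but nothing is modified.
    cat <<EOF
select hostname from servers where hostname like '${host}%';
select name from instances where name like '${host}%';
select instance from section_instances where instance like '${host}%';
EOF
}

zarcillo_preview_sql db1091
```

The printed queries can be pasted into the MySQL prompt before the deletes; if they return rows for any other host, the pattern needs to be tightened.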

Remove host from orchestrator

Orchestrator purges the host automatically within 1-2 weeks, but to avoid that delay it should be removed manually, in one of two ways:
  1. From the GUI (admin users only)
  2. From the CLI:
    1. Log the action in IRC (#wikimedia-operations) - !log Removing HOSTNAME from orchestrator TASKNUMBER
    2. SSH to dborch1001.wikimedia.org
    3. Single-instance host: sudo orchestrator -c forget -i HOSTNAME:3306 (use the FQDN for the HOSTNAME)
    4. Multi-instance host: sudo orchestrator -c forget -i HOSTNAME:PORT for each HOSTNAME:PORT combination (use the FQDN for the HOSTNAME)
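For a multi-instance host, the forget command above has to be repeated once per port. A minimal sketch of that loop, printing the commands for review rather than running them, is shown here. The function name orchestrator_forget_commands, the example FQDN, and the ports 3311 and 3312 are hypothetical; the real ports depend on the instances configured on the host.

```shell
# Prints one "forget" command per port for the given FQDN.
# orchestrator_forget_commands is a hypothetical helper name.
orchestrator_forget_commands() {
    fqdn="$1"
    shift
    # Remaining arguments are the instance ports on that host.
    for port in "$@"; do
        printf 'sudo orchestrator -c forget -i %s:%s\n' "$fqdn" "$port"
    done
}

# Hypothetical multi-instance host with two instances.
orchestrator_forget_commands db1091.eqiad.wmnet 3311 3312
```

Passing the single port 3306 produces the single-instance form of the command.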

Update the task and send it to dcops

  1. Mark all the steps listed under "step for service owners" as done on: https://phabricator.wikimedia.org/T267088
  2. Reassign:
    • for eqiad to wiki_willy
    • for codfw to wiki_willy
  3. Remove #DBA tag and add #dc-ops and #ops-eqiad OR #ops-codfw.
  4. Add the following comment: "This host is ready for DC-Ops to decommission".