Portal:Cloud VPS/Admin/Instance backups

As of 2020-09-01, most VMs hosted on Ceph have their main disk backed up for a few days. This is a last-resort system put in place to guard against a total Ceph cluster collapse; any large-scale restore will be extremely labor-intensive.

We do not provide self-serve snapshots, backup, or restore services for openstack VMs.

Architecture

We are using Backy2 to back up Ceph volumes. Backy2 is designed for fast, incremental backups of rbd volumes and uses deduplication to minimize disk space.

Backups are currently stored on cloudvirts. Cloudvirt hosts with numbers lower than 1031 were built out to support local VM storage; since those VMs now live on Ceph, their SSDs are being reused for non-redundant backup storage. To determine which VMs and projects are backed up on which cloudvirts, consult the file "modules/profile/templates/wmcs/backy2/wmcs_backup_instances.yaml.erb" in the puppet repo.

The backup agent (wmcs-backup-instances) is run daily by a systemd timer. Another systemd timer runs a cleanup script (wmcs-purge-backups) which deletes expired backups. The intended lifespan of a given backup is set when that backup is created.
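
The timers and their logs can be inspected with the standard systemd tools on whichever host runs the backups (cloudbackup1003 here, matching the examples below). The unit names are assumed to match the script names; adjust as needed:

root@cloudbackup1003:~# systemctl list-timers | grep -i backup
root@cloudbackup1003:~# journalctl -u wmcs-backup-instances
root@cloudbackup1003:~# journalctl -u wmcs-purge-backups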

The specific backup process for a given VM is handled by the python library rbd2backy2.py. The steps for each VM (sketched as manual commands after this list) are:

  1. make a new snapshot
  2. collect a diff between today's snapshot and yesterday's snapshot
  3. delete yesterday's snapshot
  4. back up today's snapshot, using the diff as a hint for Backy2 so it knows what to ignore
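
For orientation, the manual equivalent of that loop looks roughly like the following. This is only a sketch: the snapshot names, diff file path, and exact flags are assumptions, and the actual logic (including error handling) lives in rbd2backy2.py:

root@cloudbackup1003:~# rbd snap create eqiad1-compute/<instance_id>_disk@<today>
root@cloudbackup1003:~# rbd diff --whole-object --from-snap <yesterday> --format=json eqiad1-compute/<instance_id>_disk@<today> > /tmp/<instance_id>.diff
root@cloudbackup1003:~# rbd snap rm eqiad1-compute/<instance_id>_disk@<yesterday>
root@cloudbackup1003:~# backy2 backup -s <today> -r /tmp/<instance_id>.diff rbd://eqiad1-compute/<instance_id>_disk@<today> <instance_id>_disk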

What is backed up

Each backup is of a complete VM disk image. Backups do not include any openstack metadata (base image, flavor, etc.), so a given restore is likely to work only within the same openstack installation where it was captured.

Our backup agent has a simple regexp-based filter (in /etc/wmcs_backup_instances.yaml) that excludes some VMs from backup; a hypothetical example of that file appears below. Prime candidates for exclusion are:

  • Cattle: VMs that can be trivially reproduced from scratch from puppet with no data loss. Kubernetes worker nodes are the most obvious example of this.
  • Hogs: VMs that, by special request, have ENORMOUS disk drives for temporary processing work. Typically, if a user requests a quota exception for a VM like this, they should be warned that the VM will not be eligible for backup, and their project should be added to the exclusion list when the project is created. Example: encodingXX.video.eqiad.wmflabs
  • Mayflies: Some internal-use projects are created to run a single test or experiment and then destroyed. Obvious examples of this are VMs in the admin-monitoring project or the sre-sandbox project.

We do not have the capacity to back up everything!
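
For illustration only, the filter file is YAML containing regular expressions that match VM names per project. The keys and patterns below are hypothetical; the authoritative schema is the puppet template mentioned above:

root@cloudbackup1003:~# cat /etc/wmcs_backup_instances.yaml
# hypothetical excerpt -- check the real file / puppet template for the actual schema
exclude_servers:
  toolsbeta:
    - "toolsbeta-test-k8s-worker-.*"
  admin-monitoring:
    - ".*"
  sre-sandbox:
    - ".*"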

Restoring

Backy2 can restore volumes straight into the Ceph pool. To exercise the restore process (or to roll back to a previous backup):

  1. Stop the VM:
    root@cloudcontrol1003:~# openstack server stop ee8bd285-73ab-4981-a1f1-498b79b50e2a
    
  2. Delete (or move) the existing Ceph image; this prevents a filename conflict when restoring. (An example of moving the image aside with rbd mv appears after this list.)
    • If you get a complaint about the volume still having "watchers", check whether the shutdown of the VM really completed.
    • If the command complains about snapshots, use the commands in #Handy commands below to remove the snapshots of that volume.
    root@cloudcontrol1005:~# rbd rm eqiad1-compute/ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk
    Removing image: 100% complete...done.
    
  3. Find the backup you want to restore:
    root@cloudbackup1003:~# backy2 ls ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk
        INFO: [backy2.logging] $ /usr/bin/backy2 ls ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk
    +---------------------+-------------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
    |         date        | name                                      | snapshot_name       | size |  size_bytes |                 uid                  | valid | protected | tags                       |        expire       |
    +---------------------+-------------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
    | 2020-08-19 01:37:12 | ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk | 2020-08-19T01:37:11 | 5120 | 21474836480 | 8136f1c6-e1bc-11ea-94a2-b02628295df0 |   1   |     0     | b_daily,b_monthly,b_weekly | 2020-08-22 00:00:00 |
    | 2020-08-19 02:00:51 | ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk | 2020-08-19T02:00:50 | 5120 | 21474836480 | cedd7db6-e1bf-11ea-b5bb-b02628295df0 |   1   |     0     |                            | 2020-08-22 00:00:00 |
    | 2020-08-20 02:00:49 | ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk | 2020-08-20T02:00:48 | 5120 | 21474836480 | f8395878-e288-11ea-b5a0-b02628295df0 |   1   |     0     | b_daily                    | 2020-08-27 00:00:00 |
    | 2020-08-20 18:43:04 | ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk | 2020-08-20T18:43:03 | 5120 | 21474836480 | fb87dc0c-e314-11ea-a855-b02628295df0 |   1   |     0     |                            | 2020-08-27 00:00:00 |
    | 2020-08-20 18:43:31 | ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk | 2020-08-20T18:43:30 | 5120 | 21474836480 | 0b9046ac-e315-11ea-83c9-b02628295df0 |   1   |     0     |                            | 2020-08-27 00:00:00 |
    | 2020-08-20 18:45:19 | ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk | 2020-08-20T18:45:18 | 5120 | 21474836480 | 4ba4d38e-e315-11ea-9d18-b02628295df0 |   1   |     0     |                            | 2020-08-27 00:00:00 |
    | 2020-08-20 18:51:33 | ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk | 2020-08-20T18:51:31 | 5120 | 21474836480 | 2ac24c4a-e316-11ea-939c-b02628295df0 |   1   |     0     |                            | 2020-08-27 00:00:00 |
    +---------------------+-------------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
        INFO: [backy2.logging] Backy complete.
    
  4. Note the UID of the desired backup and restore it:
    root@cloudbackup1003:~# backy2 restore 2ac24c4a-e316-11ea-939c-b02628295df0 rbd://eqiad1-compute/ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk
        INFO: [backy2.logging] $ /usr/bin/backy2 restore 2ac24c4a-e316-11ea-939c-b02628295df0 rbd://eqiad1-compute/ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-compute/ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk Read Queue [          ] Write Queue [          ] (0.0% 0.0MB/sØ ETA 2m56s) 
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-compute/ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk Read Queue [==========] Write Queue [==========] (23.0% 1121.1MB/sØ ETA 3s) 
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-compute/ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk Read Queue [==========] Write Queue [==========] (28.2% 1092.6MB/sØ ETA 5s) 
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-compute/ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk Read Queue [==========] Write Queue [==========] (34.1% 1087.6MB/sØ ETA 6s) 
    <etc>
    
  5. Start the restored VM:
    root@cloudcontrol1003:~# openstack server start ee8bd285-73ab-4981-a1f1-498b79b50e2a
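
If you would rather keep the old image around than delete it in step 2, rbd can rename it out of the way instead; the '.bak' suffix below is arbitrary:

root@cloudcontrol1005:~# rbd mv eqiad1-compute/ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk eqiad1-compute/ee8bd285-73ab-4981-a1f1-498b79b50e2a_disk.bak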
    

Restoring after 'openstack server delete'

We have rescued at least one VM from the void after an accidental deletion. The process involves creating a new 'host' VM with the same name (so that DNS, Neutron, etc. are hooked up properly) and then overlaying the disk image of the new host with the restored backup.

This may be possible, with the following caveats:

  • Backups are only preserved for 7 days, so if the deletion is noticed weeks or months later it is probably too late.
  • The restored VM will lose much of its openstack state: it will have a new IP address, forget its security groups, and most likely need its puppet config replaced in Horizon.
  • If the VM predated the move from .eqiad.wmflabs to .eqiad1.wikimedia.cloud, the new VM will only be present under the new domain, eqiad1.wikimedia.cloud.

Here are the steps for rescue:

  1. Locate the VM in the nova database:
    # mysql -u root nova_eqiad1
    [nova_eqiad1]> SELECT hostname, id, image_ref, instance_type_id FROM instances WHERE hostname LIKE "<hostname>";
    
  2. Locate the flavor in the nova api database:
    # mysql -u root nova_api_eqiad1
    [nova_api_eqiad1]> SELECT name, ID FROM flavors WHERE id='<instance_type_id from above>';
    
  3. Create the new host VM:
    # OS_PROJECT_ID=<project> openstack server create --nic net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 --flavor <flavor_id_from_above> --image <image_ref_from_above> <hostname>
    
  4. Proceed with the #Restoring steps from above
  5. Confirm that puppet runs on the restored VM
  6. Add security groups, floating IPs, etc. as needed in Horizon
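
The net-id passed to 'openstack server create' above is hard-coded; if in doubt about which network UUID to use, the available networks can be listed with the standard client:

root@cloudcontrol1003:~# openstack network list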

Restoring a lost Glance image

Glance images are backed up on cloudcontrol nodes: each image is backed up on every node. Restoring is similar to the process for instances, but Glance accesses a snapshot rather than the primary rbd image, so there is an extra step. In this example we are restoring an image with id '06cf27ba-bed2-48c7-af2b-2abdfa65463c'.

  1. Find the backup you want to restore:
    root@cloudcontrol1005:~# backy2 ls 06cf27ba-bed2-48c7-af2b-2abdfa65463c
        INFO: [backy2.logging] $ /usr/bin/backy2 ls 06cf27ba-bed2-48c7-af2b-2abdfa65463c
    +---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
    |         date        | name                                 | snapshot_name       | size |  size_bytes |                 uid                  | valid | protected | tags                       |        expire       |
    +---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
    | 2020-10-20 16:00:03 | 06cf27ba-bed2-48c7-af2b-2abdfa65463c | 2020-10-20T16:00:02 | 4864 | 20401094656 | 508686ba-12ed-11eb-a7f5-4cd98fc4a649 |   1   |     0     | b_daily,b_monthly,b_weekly | 2020-10-27 00:00:00 |
    +---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
        INFO: [backy2.logging] Backy complete.
    
  2. Note the UID of the desired backup and restore it:
    root@cloudcontrol1005:~# backy2 restore 508686ba-12ed-11eb-a7f5-4cd98fc4a649 rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c
        INFO: [backy2.logging] $ /usr/bin/backy2 restore 508686ba-12ed-11eb-a7f5-4cd98fc4a649 rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c Read Queue [          ] Write Queue [          ] (0.0% 0.0MB/sØ ETA 2m57s) 
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c Read Queue [==========] Write Queue [==========] (9.4% 244.1MB/sØ ETA 11s) 
    <etc>
    
  3. Create a snapshot named 'snap' for Glance to access:
    root@cloudcontrol1005:~# rbd snap create eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c@snap
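
Once the image and its 'snap' snapshot are back in place, it is worth confirming that everything is visible again; a quick check, reusing the example image id:

root@cloudcontrol1005:~# rbd snap ls eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c
root@cloudcontrol1005:~# openstack image show 06cf27ba-bed2-48c7-af2b-2abdfa65463c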
    


Handy commands

List every backup on a backup server:

root@cloudbackup1003:~# backy2 ls

List every backup for a given VM:

root@cloudbackup1003:~# backy2 ls <instance_id>_disk

List all rbd snapshots for a given VM:

root@cloudcontrol1005:~# rbd --pool eqiad1-compute snap ls <instance_id>_disk

Note: The backup job should leave at most one rbd snapshot for any given VM. If there are several, something has gone wrong and we are probably leaking Ceph storage space rapidly.

Purge all rbd snapshots for a given VM:

root@cloudcontrol1005:~# rbd --pool eqiad1-compute snap purge <instance_id>_disk

Note: Purging snapshots is not especially disruptive, but it will force the next backup to read the entire volume rather than only the changed blocks, which will slow that backup down considerably.

Future concerns

  • Until we have more experience with large-scale Ceph use, any time and space estimates are approximate. If it turns out to take more than 24 hours to back up the whole cluster, or cloudvirt1024 doesn't have space for all the backups, there are a few options:
    • Reduce the number of daily backups we retain. Even if we go as low as 2 days, these backups will still be valuable.
    • Split backup jobs across more hosts.
    • Exclude more projects or VM types from backup.
  • To support incremental backups, every rbd image is accompanied at all times by yesterday's snapshot. Depending on how snapshots are stored, that may turn out to consume a massive amount of precious Ceph storage space. If this turns out to be an issue we may need to abandon incremental backups, or use some convoluted process like restoring yesterday's backup into an image, doing a diff, and then removing yesterday's backup. It should become clearer what tradeoffs to make as usage increases.
  • Running backup jobs on cloudvirt1024 may interfere with performance of VMs hosted there. Ideally we'd have some way to allocate some cores for backups and other cores for virtualization.
  • Some users of Backy2 report that taking snapshots on an active Ceph cluster causes performance lags during snapshotting. We need to keep an eye out for such problems.
  • The restoration process documented here is too cumbersome for mass restoration. Probably a restore feature should be added to rbd2backy2.py.