Portal:Cloud VPS/Admin/Cinder backups
Most cinder volumes and glance images are backed up every day or two, with backups preserved for a few days.
Architecture
We are using Backy2 to back up cinder volumes. Volumes are backed up (off-site) on cloudbackup2003 and cloudbackup2004.
The backup agent (wmcs-backup volumes) is run daily by a systemd timer. The intended lifespan of a given backup is set when that backup is created.
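To see when the next run is scheduled (and which unit drives it), you can list the timers on a backup host; the exact unit name is not documented here, so the grep below is a hedged guess:
root@cloudbackup2003:~# systemctl list-timers --all | grep -i backup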
What is backed up
Each cloudbackup host has a config file, /etc/wmcs_backup_volumes.yaml, which determines which volumes are and aren't backed up. New projects are backed up by default (because of the ALLOTHERS keyword).
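To check whether a particular project is explicitly listed or only covered by the catch-all, grep the config on the backup host (the project name below is a placeholder):
root@cloudbackup2003:~# grep -nE 'ALLOTHERS|<project>' /etc/wmcs_backup_volumes.yaml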
Restoring
Backy2 can restore volumes straight into the Ceph pool, and a restore can replace an existing cinder volume. If you're restoring into an existing cinder volume, it is highly recommended to unmount and detach the volume before restoring.
Some of these steps are run on a cloudcontrol node and some on the backup node that holds the backup. Restored volumes should keep the same rbd id that they have in cinder.
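For example, a volume can be unmounted inside the guest and then detached with the openstack client from a cloudcontrol node (a hedged sketch; the mount point, server and volume IDs are placeholders):
root@some-instance:~# umount /srv/example
root@cloudcontrol1007:~# OS_PROJECT_ID=<project> openstack server remove volume <server_id> <volume_id>
Next, locate the volume's rbd image in the cinder pool: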
root@cloudcontrol1007:~# ceph osd pool ls
eqiad1-compute
eqiad1-glance-images
eqiad1-cinder <- it's this one
device_health_metrics
.rgw.root
default.rgw.log
default.rgw.control
default.rgw.meta
default.rgw.buckets.index
default.rgw.buckets.data
default.rgw.buckets.non-ec
root@cloudcontrol1007:~# rbd --pool eqiad1-cinder list | grep bff48003-a672-47f6-997a-462422a1a719
volume-bff48003-a672-47f6-997a-462422a1a719
- If the rbd rm command complains about snapshots, use the commands below to remove the snapshots of that volume first.
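For example (a hedged sketch using standard rbd subcommands; note that 'snap purge' deletes every snapshot of the image, and protected snapshots must be unprotected first):
root@cloudcontrol1005:~# rbd snap ls eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
root@cloudcontrol1005:~# rbd snap purge eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719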
root@cloudcontrol1005:~# rbd rm eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
Removing image: 100% complete...done.
root@cloudbackup2003:~# backy2 ls volume-bff48003-a672-47f6-997a-462422a1a719
INFO: [backy2.logging] $ /usr/bin/backy2 ls volume-bff48003-a672-47f6-997a-462422a1a719
+---------------------+---------------------------------------------+-------------------------------------+------+------------+--------------------------------------+-------+-----------+---------------------+---------------------+
| date | name | snapshot_name | size | size_bytes | uid | valid | protected | tags | expire |
+---------------------+---------------------------------------------+-------------------------------------+------+------------+--------------------------------------+-------+-----------+---------------------+---------------------+
| 2024-09-06 02:17:31 | volume-bff48003-a672-47f6-997a-462422a1a719 | 2024-09-06T02:17:24_cloudbackup2003 | 512 | 2147483648 | 2c104e68-6bf6-11ef-a5b9-84160cded950 | 1 | 0 | full_backup | 2024-09-14 02:17:24 |
| 2024-09-07 20:22:52 | volume-bff48003-a672-47f6-997a-462422a1a719 | 2024-09-07T20:22:43_cloudbackup2003 | 512 | 2147483648 | f548a602-6d56-11ef-a5b9-84160cded950 | 1 | 0 | differential_backup | 2024-09-15 20:22:43 |
| 2024-09-08 20:11:44 | volume-bff48003-a672-47f6-997a-462422a1a719 | 2024-09-08T20:11:35_cloudbackup2003 | 512 | 2147483648 | 91df1a6a-6e1e-11ef-a5b9-84160cded950 | 1 | 0 | differential_backup | 2024-09-16 20:11:35 |
+---------------------+---------------------------------------------+-------------------------------------+------+------------+--------------------------------------+-------+-----------+---------------------+---------------------+
INFO: [backy2.logging] Backy complete.
root@cloudbackup2003:~# backy2 restore f548a602-6d56-11ef-a5b9-84160cded950 rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
INFO: [backy2.logging] $ /usr/bin/backy2 restore f548a602-6d56-11ef-a5b9-84160cded950 rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [ ] Write Queue [ ] (0.0% 0.0MB/sØ ETA 2m56s)
INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [==========] Write Queue [==========] (23.0% 1121.1MB/sØ ETA 3s)
INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [==========] Write Queue [==========] (28.2% 1092.6MB/sØ ETA 5s)
INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [==========] Write Queue [==========] (34.1% 1087.6MB/sØ ETA 6s)
<etc>
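Once the restore has finished, reattach (and remount) the volume if it was detached earlier; a hedged sketch with placeholder IDs:
root@cloudcontrol1007:~# OS_PROJECT_ID=<project> openstack server add volume <server_id> <volume_id>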
Restoring a volume that has been previously deleted is a bit messier. There's probably a better way to do this, but the current (tested) procedure is to create a new empty cinder volume of the proper size, and then restore into that volume according to the above directions.
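A hedged sketch of that procedure (placeholder names and sizes; the rbd image name follows the volume-<id> pattern shown above):
root@cloudcontrol1007:~# OS_PROJECT_ID=<project> openstack volume create --size <size_in_GiB> <volume_name>
root@cloudcontrol1007:~# OS_PROJECT_ID=<project> openstack volume show <volume_name> -f value -c id
root@cloudbackup2003:~# backy2 restore <backup_uid> rbd://eqiad1-cinder/volume-<new_volume_id>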
Restoring after 'openstack server delete'
We have rescued at least one VM from the void after an accidental deletion. The process involves creating a new 'host' VM with the same name (so that DNS, Neutron, etc. are hooked up properly) and then overlaying the disk image of the new host with the restored backup.
This may be possible, with the following caveats:
- Backups are only preserved for 7 days, so if the deletion is noticed weeks or months later it is probably too late.
- The restored VM will lose much of its openstack state: it will have a new IP address, forget its security groups, and most likely need its puppet config replaced in Horizon.
- If the VM predated the move from .eqiad.wmflabs to .eqiad1.wikimedia.cloud, the new VM will only be present under the new domain, eqiad1.wikimedia.cloud.
Here are the steps for rescue:
- Locate the VM in the nova database
- Locate the flavor in the nova api database
- Create the new host VM
- Proceed with the #Restoring steps from above
- Confirm puppet runs on the restored VM
- Add security groups, floating IPs, etc. as needed in Horizon
# mysql -u root nova_eqiad1
[nova_eqiad1]> SELECT hostname, id, image_ref, instance_type_id FROM instances WHERE hostname LIKE "<hostname>";
# mysql -u root nova_api_eqiad1
[nova_api_eqiad1]> SELECT name, id FROM flavors WHERE id='<instance_type_id from above>';
# OS_PROJECT_ID=<project> openstack server create --nic net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 --flavor <flavor_id_from_above> --image <image_ref_from_above> <hostname>
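The 'overlay' itself is the same backy2 restore as above, but it targets the new VM's root disk image instead of a cinder volume. A hedged sketch, assuming nova root disks live in the eqiad1-compute pool under the name <instance uuid>_disk and that the backup is named after the deleted instance's disk image (verify both with rbd list and backy2 ls before removing anything), with the new VM stopped first:
root@cloudcontrol1007:~# OS_PROJECT_ID=<project> openstack server stop <new_instance_id>
root@cloudcontrol1007:~# rbd --pool eqiad1-compute list | grep <new_instance_id>
root@cloudcontrol1007:~# rbd rm eqiad1-compute/<new_instance_id>_disk
root@cloudbackup2003:~# backy2 ls <deleted_instance_id>_disk
root@cloudbackup2003:~# backy2 restore <backup_uid> rbd://eqiad1-compute/<new_instance_id>_disk
root@cloudcontrol1007:~# OS_PROJECT_ID=<project> openstack server start <new_instance_id>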
Restoring a lost Glance image
Glance images are backed up on cloudcontrol nodes: each image is backed up on every node. Restoring is similar to the process for volumes, but Glance accesses the image via a snapshot rather than the base RBD image, so there's an extra step. In this example we are restoring an image with id '06cf27ba-bed2-48c7-af2b-2abdfa65463c'.
- Find the backup you want to restore.
- Note the uid of the desired backup and restore it
- Create a snapshot named 'snap' for Glance to access
root@cloudcontrol1005:~# backy2 ls 06cf27ba-bed2-48c7-af2b-2abdfa65463c
INFO: [backy2.logging] $ /usr/bin/backy2 ls 06cf27ba-bed2-48c7-af2b-2abdfa65463c
+---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
| date | name | snapshot_name | size | size_bytes | uid | valid | protected | tags | expire |
+---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
| 2020-10-20 16:00:03 | 06cf27ba-bed2-48c7-af2b-2abdfa65463c | 2020-10-20T16:00:02 | 4864 | 20401094656 | 508686ba-12ed-11eb-a7f5-4cd98fc4a649 | 1 | 0 | b_daily,b_monthly,b_weekly | 2020-10-27 00:00:00 |
+---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
INFO: [backy2.logging] Backy complete.
root@cloudcontrol1005:~# backy2 restore 508686ba-12ed-11eb-a7f5-4cd98fc4a649 rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c
INFO: [backy2.logging] $ /usr/bin/backy2 restore 508686ba-12ed-11eb-a7f5-4cd98fc4a649 rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c
INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c Read Queue [ ] Write Queue [ ] (0.0% 0.0MB/sØ ETA 2m57s)
INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c Read Queue [==========] Write Queue [==========] (9.4% 244.1MB/sØ ETA 11s)
<etc>
root@cloudcontrol1005:~# rbd snap create eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c@snap
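Glance's RBD backend normally keeps image snapshots protected; if Glance still complains after the restore, protecting the restored snapshot may help. This step is an assumption, not part of the original procedure:
root@cloudcontrol1005:~# rbd snap protect eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c@snap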