GitLab/Backup and Restore
This section describes the backup configuration and the restore procedure for the GitLab instance.
Backups
To back up application data, GitLab's built-in backup functionality is used. Application data backups are created by calling the /usr/bin/gitlab-backup create command. Configuration backups are created by calling /usr/bin/gitlab-ctl backup-etc. Both commands are executed once a day by cron jobs created with Ansible and create full backups. To configure the backups, refer to the backup-related variables in Ansible.
So GitLab will create two new .tar archives every day:
- full data backup in {{gitlab_backup_path}}
- full config backup in /etc/gitlab/config_backup
Partial backups are currently disabled. During the initialization phase, daily full backups are used. In the future we may implement partial and incremental backups.
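The daily job described above could look roughly like the following sketch. The function name daily_backup and the RUN_PREFIX dry-run switch are assumptions made for illustration; the real cron jobs are generated by Ansible and may differ.

```shell
# Hypothetical wrapper around the two backup commands named above.
# RUN_PREFIX is an illustrative dry-run switch (not part of GitLab):
# set RUN_PREFIX=echo to print the commands instead of executing them.
daily_backup() {
    run="${RUN_PREFIX:-}"
    # Full application data backup (a .tar in the configured backup path).
    $run /usr/bin/gitlab-backup create || return 1
    # Full configuration backup (a .tar in /etc/gitlab/config_backup).
    $run /usr/bin/gitlab-ctl backup-etc || return 1
}
```

Running it as RUN_PREFIX=echo daily_backup prints the two commands without touching the instance.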
Backup retention
Data backups and config backups will be deleted after three days on the production instance (see T274463#7147179). Release Engineering wanted to have three days of local retention for fast troubleshooting and restores. Deletion of the data backups is handled by GitLab (using the gitlab_backup_keep_time variable). Deletion of the config backups is implemented in the backup cronjob (using the gitlab_backup_config_keep_num variable).
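As an illustration of count-based retention like the one controlled by gitlab_backup_config_keep_num, the sketch below keeps the newest N archives in a directory and removes the rest. The function name and the use of file modification time for ordering are assumptions; the actual cronjob may be implemented differently.

```shell
# Hypothetical helper: keep the "keep" newest *.tar files in "dir"
# and delete all older ones. Ordering is by modification time.
# Assumes archive names contain no newlines.
prune_old_backups() {
    dir="$1"
    keep="$2"
    # ls -1t lists newest first; tail skips the first $keep entries.
    ls -1t "$dir"/*.tar 2>/dev/null | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
        rm -f -- "$f"
    done
}
```

For example, prune_old_backups /etc/gitlab/config_backup 3 would leave the three most recent config archives in place. Subdirectories such as latest are not matched by the *.tar pattern and are left alone.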
Storing backups in Bacula
For enhanced reliability backups are also stored in Bacula. Bacula is the standard for secure, encrypted backup storage in the WMF.
For the initialization phase we decided to back up only the most recent .tar file with the data backup and the most recent .tar file with the configuration backup. These .tar files are shipped to Bacula once a day as a full backup (see backup strategy daily). This backup strategy is not the default used by most services. The following concerns and advantages came up in our discussion when comparing daily full backups with weekly full backups plus daily incremental backups (see T274463 and comments in /puppet/+/697850):
- Incremental backups of GitLab's self-contained full backups would introduce an artificial technical dependency between revisions without having an actual dependency. To restore a backup Bacula would have to merge and diff all recent incremental backups and combine them with the last full backup. However, the latest backup should be enough to restore GitLab to the previous state.
- The default backup policy would conflict with the requirement of Release Engineering to have three days of local backup retention on the GitLab host. This conflict would cause up to three times the disk usage in Bacula compared to a non-default backup policy.
- Incremental-only backups would solve the problem of additional disk usage but can't be used long term due to technical limitations of Bacula, according to Data Persistence. A restore from many incremental revisions would take a long time and considerable computing resources. Furthermore, we would introduce a dependency between revisions (see above).
For these reasons we decided against the default strategy and use Daily Full Backups instead. This decision required two changes:
- Add a new Daily Full policy to Bacula (see /puppet/+/700183)
- Create dedicated folder structure for GitLab latest backup (see /gitlab-ansible/+/700084 and below)
"Latest" backup
To implement the strategy of daily full backups, a dedicated folder structure is needed for Bacula. We have to make sure that Bacula does not save all three backups available on the GitLab host. Bacula must back up only the directory with the most recent files. For this purpose we created an additional ./latest directory inside each of the backup directories (using Ansible). Since our goal is to eventually replace the Ansible code with Puppet, we also ensured that the "latest" backup directories exist using Puppet. We did this in two places: the profile class currently used in production (gerrit:700622) and the backup class from the gitlab module, currently used only in cloud (gerrit:700595). Ideally, both production and cloud machines will be set up automatically by the same Puppet role, both using the module. The backup scripts on the GitLab machine will update the latest.tar file.
/srv/gitlab-backup/
├── 1624752267_2021_06_27_13.11.5_gitlab_backup.tar
├── 1624838667_2021_06_28_13.11.5_gitlab_backup.tar
├── 1624925067_2021_06_29_13.11.5_gitlab_backup.tar
└── latest
    └── latest.tar
Bacula is then configured to use only the /latest folder and save the most recent backup. Here is the fileset used in Bacula:
bacula::director::fileset { 'gitlab':
    includes => [ '/srv/gitlab-backup/latest', '/etc/gitlab/config_backup/latest' ]
}
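The step that refreshes latest.tar is not shown on this page. A minimal sketch of how a backup script could do it is given below; the function name, the copy-then-rename approach, and the temporary file name are assumptions, not the deployed implementation.

```shell
# Hypothetical helper: copy the newest *.tar in "dir" to
# "dir/latest/latest.tar". Ordering is by modification time.
update_latest() {
    dir="$1"
    newest="$(ls -1t "$dir"/*.tar 2>/dev/null | head -n 1)"
    # Nothing to do if the directory contains no archive yet.
    [ -n "$newest" ] || return 1
    mkdir -p "$dir/latest"
    # Copy under a temporary name outside latest/, then rename, so that
    # a reader of latest/ never sees a half-written latest.tar.
    tmp="$dir/.latest.tar.tmp"
    cp -p -- "$newest" "$tmp" &&
        mv -f -- "$tmp" "$dir/latest/latest.tar"
}
```

Calling update_latest /srv/gitlab-backup after each data backup would keep the directory picked up by the fileset above in sync with the newest archive, while the dated archives stay available for local restores.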
Restore
The restore procedure depends on the host and on the age of the backup to be restored. Backups from the last three days are present on the production GitLab instance in /srv/gitlab-backup/ and /etc/gitlab/config_backup/. Older backups have to be fetched from Bacula first.
Fetch backups from Bacula
Restoring a backup from Bacula can be done using the Bacula CLI. Note: only production GitLab is configured to use Bacula.
These steps follow the guide to restore a backup of the same client.
- SSH to the backup host (currently backup1001.eqiad.wmnet)
- Run the Bacula command line tool: sudo bconsole
backup1001:~$ sudo bconsole
Connecting to Director backup1001.eqiad.wmnet:9101
1000 OK: 103 backup1001.eqiad.wmnet Version: 9.4.2 (04 February 2019)
Enter a period to cancel a command.
- Choose the restore option
- Choose option 5 (5: Select the most recent backup for a client)
- Select the server (currently 96: gitlab1001.wikimedia.org-fd)
- Choose the FileSet to be restored
- Use the new prompt to browse the bvfs (Bacula virtual filesystem) if file metadata has not been expired from the database. Standard ls and cd commands apply. Mark the files/dirs you want restored. If you specified a date old enough, you will not be able to browse and you will have to restore the entire fileset
- Use the mark command to mark files you want to be restored. Wildcards work; there is also unmark
You are now entering file selection mode where you add (mark) and
remove (unmark) files to be restored. No files are initially added, unless
you used the "all" keyword on the command line.
Enter "done" to leave this mode.
cwd is: /
$ ls
etc/
srv/
$ mark srv/
2 files marked.
- Enter done
- Modify the job if needed (for example, change the destination directory)
- Wait (you can use the messages command to see the status of the restore job)
- Check the backup on the GitLab host:
gitlab1001:~$ ls -l /var/tmp/bacula-restores/srv/gitlab-backup/latest/
total 17512
-rw------- 1 root root 17930240 Aug 11 00:04 latest.tar
Proceed with restore of the backup to GitLab.
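Before proceeding, it can be worth confirming that the fetched archive is a readable tar file. The helper below is a small sketch; its name is an assumption, and it only checks that tar can list the archive, not that the contents are complete.

```shell
# Hypothetical helper: succeed if tar can list the archive's members.
# tar -tf reads the archive without extracting anything; a non-zero
# exit status means tar could not parse the file.
archive_is_readable() {
    tar -tf "$1" >/dev/null 2>&1
}
```

With the path from the listing above, this could be run as archive_is_readable /var/tmp/bacula-restores/srv/gitlab-backup/latest/latest.tar && echo ok (as a user allowed to read the file).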
Restore backup to GitLab
To perform the restore, the config backup and the data backup from the same day should both be present on the GitLab machine (either as local backups or restored from Bacula to a temporary folder). Make sure to move the backups to the default backup paths /srv/gitlab-backup/ and /etc/gitlab/config_backup/.
- Make yourself familiar with Restore Prerequisites and the official omnibus restore guide.
- Select the backup archives (data and configuration) to restore and copy them to the target host to /srv/gitlab-backup/ and /etc/gitlab/config_backup/
- Make sure the backup archives are owned by git:git and have rw------- (0600) permissions.
sudo chown git:git /srv/gitlab-backup/1628121868_2021_08_05_13.12.9_gitlab_backup.tar
sudo chmod 0600 /srv/gitlab-backup/1628121868_2021_08_05_13.12.9_gitlab_backup.tar
- Confirm that there is enough free space on the GitLab installation mount point on the target host
jelto@gitlab2001:~$ df -h
- Make sure the GitLab package (gitlab-ce) is properly installed on the target host and that the installed version is the same one that was used to create the data and configuration archives; use the version code from the name of the data archive to verify this
jelto@gitlab2001:~$ dpkg -l | grep gitlab
ii gitlab-ce 13.12.9-ce.0 amd64 GitLab Community Edition (including NGINX, Postgres, Redis)
jelto@gitlab2001:~$ sudo ls /srv/gitlab-backup/ | grep gitlab_backup | cut -d "_" -f 5
13.12.9
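The permission and version checks from the steps above can also be scripted. The helpers below are a sketch under the assumption that archive names follow the pattern shown on this page (version in the fifth underscore-separated field) and that the package version looks like 13.12.9-ce.0; all function names are made up for illustration.

```shell
# Print the nine permission characters of a file, e.g. rw-------
perms_string() {
    ls -ld -- "$1" | cut -c2-10
}

# Extract the GitLab version from a data backup file name, e.g.
# 1628121868_2021_08_05_13.12.9_gitlab_backup.tar -> 13.12.9
backup_version() {
    basename "$1" | cut -d "_" -f 5
}

# Succeed if the archive version ($1 is the archive name) equals the
# installed package version ($2, e.g. 13.12.9-ce.0).
versions_match() {
    installed="${2%%-*}"    # strip the package suffix: 13.12.9-ce.0 -> 13.12.9
    [ "$(backup_version "$1")" = "$installed" ]
}
```

On the target host, the installed version could be obtained with dpkg-query -W -f='${Version}' gitlab-ce and passed as the second argument to versions_match.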
- Restore the GitLab configuration file into /etc/gitlab/gitlab.rb and the GitLab secrets file into /etc/gitlab/gitlab-secrets.json, both from the configuration backup archive
sudo tar -xvf /etc/gitlab/config_backup/latest/latest.tar --strip-components=2 -C /etc/gitlab/
- When restoring a replica, overwrite the /etc/gitlab/gitlab.rb file with the local one of the replica
- Run sudo gitlab-ctl reconfigure to make sure the GitLab installation is set up and the PostgreSQL database is initialized; make sure the GitLab configuration was done successfully
- Make sure GitLab is running with sudo gitlab-ctl status; if not, start it with sudo gitlab-ctl start
- Make sure the GitLab backup path, configured in the gitlab_rails['backup_path'] setting in /etc/gitlab/gitlab.rb, exists, is owned by git:root and has rwx------ (0700) permissions
- Before restoring, disallow users' access to GitLab. Can be skipped when restoring a replica.
- If you have GitLab Runners connected to your running GitLab Server, pause all runners and wait until all jobs are finished before starting the restore. Can be skipped when restoring a replica.
- Stop GitLab's dedicated ssh server: sudo systemctl stop ssh-gitlab; check with sudo systemctl status ssh-gitlab
- Stop database-connected GitLab processes: sudo gitlab-ctl stop puma and sudo gitlab-ctl stop sidekiq; check with sudo gitlab-ctl status. DO NOT stop other GitLab processes, they are required for restoring
- Restore GitLab data by running sudo gitlab-backup restore BACKUP=timestamp_of_backup; timestamp_of_backup is the datecode from the data backup name, example: 1628121868_2021_08_05_13.12.9
- Reconfigure GitLab: sudo gitlab-ctl reconfigure
- Restart GitLab services: sudo gitlab-ctl restart; check with sudo gitlab-ctl status
- Run the GitLab check rake task: sudo gitlab-rake gitlab:check SANITIZE=true; make sure all checks are fine
- Check that GitLab can decrypt secrets: sudo gitlab-rake gitlab:doctor:secrets; make sure the check is fine; if not, check that /etc/gitlab/gitlab-secrets.json was restored correctly
- Restart GitLab's dedicated ssh server: sudo systemctl restart ssh-gitlab
- Run basic smoke tests (make sure that the web UI works, authentication works, ssh cloning works)
- Re-enable paused runners (if required)
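For orientation, the data-restore portion of the list above (from stopping the ssh server to restarting it) can be written down as one sequence. This is a sketch only and not the restore script mentioned on this page; the function names and the RUN_PREFIX dry-run switch are assumptions. The restore command may ask for confirmation interactively, and the earlier preparation steps as well as the smoke tests still have to be done by hand.

```shell
# Derive the BACKUP= identifier from a data archive name, e.g.
# 1628121868_2021_08_05_13.12.9_gitlab_backup.tar
#   -> 1628121868_2021_08_05_13.12.9
backup_id() {
    name="$(basename "$1")"
    printf '%s\n' "${name%_gitlab_backup.tar}"
}

# Hypothetical wrapper that runs the documented commands in order and
# stops at the first failure. Set RUN_PREFIX=echo for a dry run that
# only prints the commands.
restore_data() {
    run="${RUN_PREFIX:-}"
    id="$(backup_id "$1")"
    $run sudo systemctl stop ssh-gitlab &&
    $run sudo gitlab-ctl stop puma &&
    $run sudo gitlab-ctl stop sidekiq &&
    $run sudo gitlab-backup restore "BACKUP=$id" &&
    $run sudo gitlab-ctl reconfigure &&
    $run sudo gitlab-ctl restart &&
    $run sudo gitlab-rake gitlab:check SANITIZE=true &&
    $run sudo gitlab-rake gitlab:doctor:secrets &&
    $run sudo systemctl restart ssh-gitlab
}
```

A dry run such as RUN_PREFIX=echo restore_data /srv/gitlab-backup/1628121868_2021_08_05_13.12.9_gitlab_backup.tar prints the nine commands in order, which is a convenient way to review them before running anything.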
Automated restore backup to GitLab
The restore consists of many steps. There is a script to automate the restore process. Currently this script is intended for restoring the replica only. In the future it should also be able to restore the non-replica instance. Until this is implemented, please follow the manual restore procedure above.