GitLab/Backup and Restore
This section describes the backup configuration and the restore procedure for the GitLab instance.
To back up application data, GitLab's built-in backup functionality is used. Application data backups are created by running the
/usr/bin/gitlab-backup create command. Configuration backups are created by running
/usr/bin/gitlab-ctl backup-etc. Both commands are executed once a day by cronjobs created with Ansible and produce full backups. To configure the backups, please refer to the backup-related variables in Ansible.
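The resulting cron entries look roughly like the following sketch. This is a hypothetical crontab fragment: the real entries are generated by Ansible, and the schedule shown here is an assumption, not the configured one.

```shell
# Hypothetical system crontab fragment; times are illustrative only,
# the actual schedule comes from the Ansible backup variables.
0 2 * * * root /usr/bin/gitlab-backup create    # full application data backup
0 3 * * * root /usr/bin/gitlab-ctl backup-etc   # full configuration backup
```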
So GitLab will create two new .tar archives every day:
- a full data backup in /srv/gitlab-backup
- a full config backup in /etc/gitlab/config_backup/
Partial backups are currently disabled. For the initialization phase, daily full backups are used; in the future we may implement partial and incremental backups.
Data backups and config backups are deleted after three days on the production instance (see T274463#7147179); Release Engineering wanted three days of local retention for fast troubleshooting and restores. Deletion of the data backups is handled by GitLab (using the
gitlab_backup_keep_time variable). Deletion of the config backups is implemented in the backup cronjob.
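The config-backup retention could be sketched as follows. This is an illustration only: the function name is hypothetical and the real cronjob is managed by Ansible, so the exact implementation may differ.

```shell
# Hedged sketch of three-day config-backup retention; the helper name
# and file pattern are assumptions, not the real cronjob code.
prune_config_backups() {
  # $1: directory holding gitlab_config_*.tar archives
  # Delete archives whose modification time is more than three days old.
  find "$1" -name 'gitlab_config_*.tar' -mtime +3 -delete
}

# On the GitLab host this would be called roughly as:
# prune_config_backups /etc/gitlab/config_backup
```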
Due to disk space issues, two additional disks were added to the GitLab hosts in T330172. These disks are used for backup creation and storage only (
/srv/gitlab-backup). This partition is set up using the
sudo /opt/provision-backup-fs.sh script. See also GitLab#Bootstrap_a_new_GitLab_instance for more details.
Storing backups in Bacula
For enhanced reliability, backups are also stored in Bacula, the WMF standard for secure, encrypted backup storage.
For the initialization phase we decided to only back up the most recent .tar file of the data backup and the most recent .tar file of the configuration backup. These .tar files are shipped to Bacula once a day as a full backup (see backup strategy daily). This is not the default backup strategy used by most services. The following concerns and advantages came up when comparing daily full backups with the default of weekly full backups plus daily incremental backups (see T274463 and the comments in /puppet/+/697850):
- Incremental backups of GitLab's self-contained full backups would introduce an artificial technical dependency between revisions without having an actual dependency. To restore a backup Bacula would have to merge and diff all recent incremental backups and combine them with the last full backup. However, the latest backup should be enough to restore GitLab to the previous state.
- The default backup policy would conflict with the requirement of Release Engineering to have three days of local backup retention on the GitLab host. This conflict would cause up to three times of additional disk usage in Bacula in comparison to a non-default backup policy.
- Incremental-only backups would solve the problem of additional disk usage but can't be used long term due to technical limitations of Bacula, according to Data Persistence. Restoring from a long chain of incremental revisions would take a long time and significant computing resources. Furthermore, it would introduce a dependency between revisions (see above).
Because of the reasons above we decided against the default strategy and use daily full backups instead. Implementing this decision required the following:
- Add a new Daily Full policy to Bacula (see /puppet/+/700183)
The restore procedure depends on the host and the age of the backup to be restored. Backups for the last three days are present on the production GitLab instance (data backups in /srv/gitlab-backup, config backups in
/etc/gitlab/config_backup/). If older backups have to be restored, they have to be fetched from Bacula first.
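Whether a Bacula fetch is needed can be decided with a quick check of the local retention. A minimal sketch, where the helper name is hypothetical and the paths are as documented on this page:

```shell
# Hypothetical helper: check whether a data backup from a given day is
# still available locally (three-day retention) before going to Bacula.
have_local_backup() {
  # $1: backup directory, e.g. /srv/gitlab-backup
  # $2: date fragment from the archive name, e.g. 2023_04_12
  ls "$1"/*"$2"*_gitlab_backup.tar >/dev/null 2>&1
}

# have_local_backup /srv/gitlab-backup 2023_04_12 || echo "fetch from Bacula"
```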
Fetch backups from Bacula
Restoring a backup from Bacula can be done using the Bacula CLI and the guide to restore a backup on the same client. Note: only the production GitLab instance is configured to use Bacula.
The following steps follow the guide to restore a backup.
- SSH to the backup host (currently backup1001.eqiad.wmnet)
- Run bacula command line tool:
backup1001:~$ sudo bconsole
Connecting to Director backup1001.eqiad.wmnet:9101
1000 OK: 103 backup1001.eqiad.wmnet Version: 9.4.2 (04 February 2019)
Enter a period to cancel a command.
- Choose option 5 (5: Select the most recent backup for a client)
- Select the server (the production GitLab host)
- Choose the FileSet to be restored
- Use the new prompt to browse the bvfs (Bacula virtual filesystem) if the file metadata has not yet expired from the database. Standard ls and cd commands apply. Mark the files/dirs you want restored. If you specified a date old enough, you will not be able to browse and will have to restore the entire FileSet.
- Use the
mark command to mark the files you want to be restored; wildcards work, and there is also an
unmark command.
You are now entering file selection mode where you add (mark) and remove (unmark) files to be restored. No files are initially added, unless you used the "all" keyword on the command line.
Enter "done" to leave this mode.
cwd is: /
$ ls
etc/
srv/
$ mark srv/
2 files marked.
- Modify the job if needed (for example, change the destination directory)
- Wait :-) (you can use the messages command to see the status of the restore job)
- Check the backup on the GitLab host:
gitlab1001:~$ ls -l /var/tmp/bacula-restores/srv/gitlab-backup/
total 17512
-rw------- 1 root root 17930240 Apr 12 15:41 1681314083_2023_04_12_15.8.5_gitlab_backup.tar
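Bacula restores into /var/tmp/bacula-restores, so the archives need to be moved back to the GitLab backup path. A minimal sketch, where the helper name is an assumption (on the host this is essentially a sudo mv):

```shell
# Hypothetical helper: move Bacula-restored backup archives back into
# the GitLab backup directory.
move_restored_backups() {
  # $1: Bacula restore directory, $2: GitLab backup path
  mv "$1"/*.tar "$2"/
}

# On the production host, roughly:
# sudo mv /var/tmp/bacula-restores/srv/gitlab-backup/*.tar /srv/gitlab-backup/
```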
Proceed with restore of the backup to GitLab.
Restore backup to GitLab
To proceed with the restore procedure, the config backup and the data backup, both from the same day, must be present on the GitLab machine (either by using the local backups or by restoring the Bacula backup to a temporary folder). Make sure to move the data backup to the default backup path
/srv/gitlab-backup.
Restore is handled by the gitlab-restore.sh script on the GitLab hosts. It includes all steps of the manual restore process below. To restore from the latest available backup, run:
systemctl start backup-restore
The script can also be triggered manually. This is useful if an older backup version (not the latest) should be restored:
/srv/gitlab-backup/gitlab-restore.sh -f <backup_name>
If the gitlab-restore.sh script is used on a production host (non-replica/wmcloud), the restore has to be forced using an additional flag.
- Make yourself familiar with Restore Prerequisites and the official omnibus restore guide.
- Select the backup archives (data and configuration) to restore and copy them to the target host (the data backup to /srv/gitlab-backup, the config backup to /etc/gitlab/config_backup/)
- Make sure the backup archives are owned by the git user:
sudo chown git.git /srv/gitlab-backup/1628121868_2021_08_05_13.12.9_gitlab_backup.tar
- Confirm that there is enough free space on the GitLab installation mountpoint on the target host:
jelto@gitlab2001:~$ df -h
- Make sure the GitLab package (
gitlab-ce) is properly installed on the target host and the installed version is the same as the one used to create the data and configuration archives; use the version code from the name of the data archive to verify this:
jelto@gitlab2001:~$ dpkg -l | grep gitlab
ii  gitlab-ce 13.12.9-ce.0 amd64 GitLab Community Edition (including NGINX, Postgres, Redis)
jelto@gitlab2001:~$ sudo ls /srv/gitlab-backup/ | grep gitlab_backup | cut -d "_" -f 5
13.12.9
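The version comparison above can be scripted. A hedged sketch: the installed version is hard-coded here for illustration, whereas on a real host it would come from dpkg as shown above.

```shell
# Sketch: compare the gitlab-ce version encoded in the archive name
# (field 5 when split on "_") with the installed package version.
archive="1628121868_2021_08_05_13.12.9_gitlab_backup.tar"
archive_version=$(echo "$archive" | cut -d '_' -f 5)
installed_version="13.12.9"   # illustrative; on the host use dpkg output

if [ "$archive_version" = "$installed_version" ]; then
  echo "versions match"
else
  echo "version mismatch: archive=$archive_version installed=$installed_version" >&2
fi
```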
- Restore the GitLab configuration file into
/etc/gitlab/gitlab.rb and the GitLab secrets file into
/etc/gitlab/gitlab-secrets.json from the configuration backup archive:
sudo tar -xvf /etc/gitlab/config_backup/gitlab_config_1680255776_2023_03_31.tar --strip-components=2 -C /etc/gitlab/
- When restoring a replica, overwrite the
/etc/gitlab/gitlab.rb file with the local one of the replica
- Run
sudo gitlab-ctl reconfigure to make sure the GitLab installation is set up and the PostgreSQL database is initialized; make sure the reconfigure completes successfully
- Make sure GitLab is running with
sudo gitlab-ctl status; if not, start it with
sudo gitlab-ctl start
- Make sure the GitLab backup path, configured in
/etc/gitlab/gitlab.rb, exists and is owned by the git user
- Before restoring, disallow user access to GitLab. This can be skipped when restoring a replica.
- If you have GitLab Runners connected to your running GitLab Server, pause all runners and wait until all jobs are finished before starting the restore. Can be skipped when restoring a replica.
- Stop GitLab's dedicated ssh server:
sudo systemctl stop ssh-gitlab; check with
sudo systemctl status ssh-gitlab
- Stop database-connected GitLab processes:
sudo gitlab-ctl stop puma and
sudo gitlab-ctl stop sidekiq; check with
sudo gitlab-ctl status. DO NOT stop other GitLab processes; they are required for restoring
- Restore GitLab data by running
sudo gitlab-backup restore BACKUP=timestamp_of_backup;
timestamp_of_backup is the datecode from the data backup name, for example 1681314083_2023_04_12_15.8.5 for the archive 1681314083_2023_04_12_15.8.5_gitlab_backup.tar
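The datecode can also be derived from the archive name programmatically. A minimal sketch, using the archive name shown earlier on this page; the helper name is hypothetical:

```shell
# Hypothetical helper: derive the BACKUP= timestamp from a data backup
# archive name by stripping the "_gitlab_backup.tar" suffix.
backup_timestamp() {
  basename "$1" | sed 's/_gitlab_backup\.tar$//'
}

ts=$(backup_timestamp /srv/gitlab-backup/1681314083_2023_04_12_15.8.5_gitlab_backup.tar)
echo "$ts"   # 1681314083_2023_04_12_15.8.5
# Then: sudo gitlab-backup restore BACKUP=$ts
```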
- Reconfigure GitLab:
sudo gitlab-ctl reconfigure
- Restart GitLab services:
sudo gitlab-ctl restart; check with
sudo gitlab-ctl status
- Run GitLab check rake task:
sudo gitlab-rake gitlab:check SANITIZE=true; make sure all checks are fine
- Check that GitLab can decrypt secrets:
sudo gitlab-rake gitlab:doctor:secrets; make sure the check passes; if not, check that
/etc/gitlab/gitlab-secrets.json was restored correctly
- Restart GitLab's dedicated ssh server:
sudo systemctl restart ssh-gitlab
- Run basic smoke tests (make sure that the web UI works, authentication works, and ssh cloning works)
- Re-enable paused runners (if required)