Data Platform/Systems/CloudnativePG/Clusters
The CloudnativePG operator makes it quite easy to deploy PostgreSQL clusters in Kubernetes.
We have created a custom cloudnative-pg-cluster chart, allowing users to create a cluster from a very simple YAML values file, such as this one:
fullnameOverride: postgres-airflow-analytics
By default, each cluster uploads base backups and WALs to a dedicated S3 bucket, and performs a daily backup at midnight UTC.
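If you want to confirm the backup schedule of a given cluster, you can inspect its ScheduledBackup resource. This is only a sketch: the namespace and hostname are illustrative.
brouberol@deploy1003:~$ kubectl get scheduledbackups -n postgresql-test  # namespace is an example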
Observability
Dashboard
All cloudnative-pg clusters are instrumented using Prometheus and can be observed via this Grafana dashboard.
Monitoring
Logs
Operations
Finding which instance is the current PG primary
brouberol@deploy2002:~$ kubectl get cluster postgresql-test -ojson | jq -r .status.currentPrimary
postgresql-test-2
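Alternatively, you can list the cluster pods along with their role, using the labels set by the operator. This is a sketch: the cluster name and host are illustrative, and the exact role label (cnpg.io/instanceRole) depends on the operator version.
brouberol@deploy2002:~$ kubectl get pods -l cnpg.io/cluster=postgresql-test -L cnpg.io/instanceRole  # cluster name is an example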
Performing a manual backup
[1] On the deployment server, run the following commands.
brouberol@deploy1003:~$ cat <<EOF >manual-backup.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: postgresql-test-on-demand-backup-$(date "+%Y%m%d%H%M%S")
spec:
  method: barmanObjectStore
  cluster:
    name: postgresql-test
EOF
brouberol@deploy1003:/home/brouberol# kubectl create -f manual-backup.yaml -n postgresql-test
backup.postgresql.cnpg.io/postgresql-test-on-demand-backup-20240917141528 created
You can then check the state of the backup with kubectl.
brouberol@deploy1003:/home/brouberol# kubectl get backups -n postgresql-test --sort-by='{metadata.creationTimestamp}' --no-headers | tac
postgresql-test-on-demand-backup-20240917141528 72s postgresql-test barmanObjectStore completed
postgresql-test-daily-backup-20240917000000 14h postgresql-test barmanObjectStore completed
postgresql-test-daily-backup-20240916073251 30h postgresql-test barmanObjectStore completed
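If a backup does not reach the completed phase, more detail (including any error message) is available by describing the resource. A sketch, reusing the backup name from the example above:
brouberol@deploy1003:~$ kubectl describe backup -n postgresql-test postgresql-test-on-demand-backup-20240917141528  # backup name is the one created above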
Restoring a backup by replaying all WAL files
[2] The easiest way to restore the latest archived state of a pre-existing database is to deploy it with the following values:
mode: recovery
recovery:
  method: object_store
  clusterName: postgresql-test
backups:
  enabled: false
When deploying the cluster, this will start a full-recovery pod that replays all WAL files, after which the PG pods themselves will be deployed. Once everything is deployed successfully, redeploy the cluster with the values above either removed or commented out. This ensures that the cluster starts in normal mode and that backups are re-enabled (a backup will be immediately triggered for good measure).
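You can follow the recovery while it runs. This is only a sketch: the namespace is illustrative, and the recovery job name assumes the <cluster>-<instance>-full-recovery naming convention used by the operator.
brouberol@deploy1003:~$ kubectl get pods -n postgresql-test -w  # watch the full-recovery pod, then the PG pods, come up
brouberol@deploy1003:~$ kubectl logs -n postgresql-test job/postgresql-test-1-full-recovery -f  # job name is an assumption based on the operator's convention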
Performing a Point In Time Recovery (PITR)
[2] A PITR allows you to restore the cluster state as it was at a certain point in time.
You can list the available base backups, sorted by descending date, by running:
root@deploy1003:~# kubectl get backups -n postgresql-test --sort-by='{metadata.creationTimestamp}' --no-headers | tac
postgresql-test-daily-backup-20240917000000 13h postgresql-test barmanObjectStore completed
postgresql-test-daily-backup-20240916073251 30h postgresql-test barmanObjectStore completed
Say we'd like to restore postgresql-test-daily-backup-20240917000000. To do this, first extract the creation timestamp of the backup resource:
root@deploy1003:~# kubectl get backups -n postgresql-test -o jsonpath='{.metadata.creationTimestamp}{"\n"}' postgresql-test-daily-backup-20240917000000
2024-09-17T00:00:00Z
You then need to convert this timestamp into the format understood by the https://pgbarman.org/ tools, which are used to restore the backup.
$ python3 -c'import datetime as dt;print(dt.datetime.fromisoformat("2024-09-17T00:00:00Z").strftime("%Y-%m-%d %H:%M:%S.%f%z"))'
2024-09-17 00:00:00.000000+0000
You can then deploy the cluster with the following values:
mode: recovery
recovery:
  method: object_store
  clusterName: postgresql-test
  recoveryTarget:
    targetTime: "2024-09-17 00:00:00.000000+0000"
backups:
  enabled: false
Once the cluster is deployed and running, redeploy it with the values above commented out or deleted. This ensures that the cluster starts in normal mode and that backups are re-enabled (a backup will be immediately triggered for good measure).
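After the redeploy, you can check that the cluster reports a healthy status. A sketch; the cluster name and namespace are illustrative:
brouberol@deploy1003:~$ kubectl get cluster postgresql-test -n postgresql-test  # the STATUS column should report a healthy cluster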
Increasing the storage size of a cluster
[3] As we're storing the data on Ceph, we can resize the volume in which PG stores its data on the fly. To do this, simply adjust the cluster.storage.size cluster configuration value.
Assuming the cluster was deployed with the following configuration
cluster:
  storage:
    size: 10Gi
then change it to
cluster:
  storage:
    size: 15Gi
and redeploy the cluster. The volumes should be automatically resized within minutes.
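You can confirm the resize by watching the PVC capacities. A sketch; the namespace is illustrative:
brouberol@deploy1003:~$ kubectl get pvc -n postgresql-test  # the CAPACITY column should eventually report the new size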
Increasing the WAL storage size of a cluster
Follow the same instructions as in Data Platform/Systems/CloudnativePG/Clusters#Increasing the storage size of a cluster, but adjust the cluster.walStorage.size parameter instead.
Increasing the max number of connections on PostgreSQL
PostgreSQL server configuration tunables can be set and tweaked via the cluster.postgresql.parameters YAML dictionary (except these ones). For example, if you'd like to increase the max_connections setting to 300 in a cluster using all default configuration values, change the cluster configuration to
cluster:
  postgresql:
    parameters:
      max_connections: 300
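Once the change has been rolled out, you can verify that the new value is in effect by running psql on the primary pod. A sketch; the pod name and namespace are illustrative:
brouberol@deploy1003:~$ kubectl exec -n postgresql-test postgresql-test-2 -c postgres -- psql -c 'SHOW max_connections;'  # pod/namespace names are examples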
Increasing the max number of client connections on PGBouncer
[4] A subset of PGBouncer configuration options can be tweaked from the release YAML values themselves, via the pooler.parameters dictionary entries. For example, if PGBouncer is getting near its client connection limit, you can increase it like so:
pooler:
  parameters:
    max_client_conn: 200
All PGBouncer configuration options are available here: https://www.pgbouncer.org/config.html
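To check that the parameter was propagated to the Pooler resource managed by the operator, you can inspect it directly. A sketch; the namespace is illustrative:
brouberol@deploy1003:~$ kubectl get pooler -n postgresql-test -o yaml | grep max_client_conn  # namespace is an example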
Import data from an external database
The process for importing data from a database outside of Kubernetes, described in the CloudnativePG documentation, can be performed by deploying the following values:
cluster:
  initdb:
    import:
      host: an-db1001.eqiad.wmnet
      user: airflow-test-k8s
      dbname: airflow-test-k8s
      password: xxxx # commit this in the private puppet repo
external_services:
  postgresql: [analytics]
Once the cluster has been deployed, exec into the PG primary pod and run a psql shell. Check that the data has indeed been imported. Once you are satisfied it has, delete the password from the private puppet repository, run puppet on the deployment server, then redeploy the cluster with the above values either removed or commented out.
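For example, a quick spot check from the primary pod. This is only a sketch: the pod, namespace, and database names are illustrative.
brouberol@deploy1003:~$ kubectl exec -n postgresql-airflow-test-k8s postgresql-airflow-test-k8s-1 -c postgres -- psql -d airflow-test-k8s -c '\dt'  # names are examples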
Checking the status of the out-of-band backups of backups
We perform a daily sync of all CloudnativePG clusters' S3 buckets to db1208.eqiad.wmnet, so that they get included in the Bacula out-of-band backups. This daily sync is performed on db1208.eqiad.wmnet, via the backup_s3_postgresql systemd timer.
If that daily sync is failing, check the error logs via:
$ ssh db1208.eqiad.wmnet
brouberol@db1208:~$ systemctl status backup_s3_postgresql.service
○ backup_s3_postgresql.service - Create local copies of S3 buckets containing PostgreSQL backups to the local filesystem
Loaded: loaded (/lib/systemd/system/backup_s3_postgresql.service; static)
Active: inactive (dead) since Mon 2024-10-14 09:21:23 UTC; 4h 12min ago
TriggeredBy: ● backup_s3_postgresql.timer
Docs: https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
Process: 2296733 ExecStart=/usr/bin/rclone -v --config /srv/postgresql_backups/rclone.conf --exclude /rclone.conf sync everything: /srv/postgresql_backups (code=exited, status=0/SUCCESS)
Main PID: 2296733 (code=exited, status=0/SUCCESS)
CPU: 30.644s
brouberol@db1208:~$ sudo journalctl -u backup_s3_postgresql.service -n 20
Oct 14 09:21:20 db1208 rclone[2296733]: INFO : postgresql-airflow-test-k8s.dse-k8s-eqiad/postgresql-airflow-test-k8s/base/20241010T000000/backup.info: Copied (new)
...
Oct 14 09:21:23 db1208 rclone[2296733]: INFO : postgresql-airflow-test-k8s.dse-k8s-eqiad/postgresql-airflow-test-k8s/base/20241014T000000/data.tar.gz: Copied (new)
Oct 14 09:21:23 db1208 rclone[2296733]: INFO :
Oct 14 09:21:23 db1208 rclone[2296733]: Transferred: 1.497 GiB / 1.497 GiB, 100%, 93.372 MiB/s, ETA 0s
Oct 14 09:21:23 db1208 rclone[2296733]: Transferred: 5035 / 5035, 100%
Oct 14 09:21:23 db1208 rclone[2296733]: Elapsed time: 16.9s
Oct 14 09:21:23 db1208 systemd[1]: backup_s3_postgresql.service: Deactivated successfully.
Oct 14 09:21:23 db1208 systemd[1]: Finished backup_s3_postgresql.service - Create local copies of S3 buckets containing PostgreSQL backups to the local filesystem.
Oct 14 09:21:23 db1208 systemd[1]: backup_s3_postgresql.service: Consumed 30.644s CPU time.
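If the failure was transient, you can re-run the sync manually once the underlying issue is fixed. A sketch; this assumes you have sudo rights on db1208:
brouberol@db1208:~$ sudo systemctl start backup_s3_postgresql.service  # re-runs the same sync as the timer
brouberol@db1208:~$ sudo journalctl -u backup_s3_postgresql.service -f  # follow its logs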
Investigate why WALs are not being archived
The primary instance should archive its WAL files every 5 minutes. First off, find which instance is currently acting as the PG primary. Then, tail the logs from that instance, either using kubectl or the logstash dashboard.
brouberol@deploy2002:~$ kubectl logs postgresql-airflow-test-k8s-2 | grep archive
These logs should shed some light on the current issue.
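You can also check the ContinuousArchiving condition reported in the cluster status. A sketch; the cluster name is illustrative:
brouberol@deploy2002:~$ kubectl get cluster postgresql-airflow-test-k8s -ojson | jq '.status.conditions'  # cluster name is an example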
Here's an example of such an investigation.