Data Platform/Systems/Airflow/Kubernetes/Operations
Creating a new instance
Throughout this guide, we will assume that we are creating an instance named airflow-example, deployed with a dedicated PG cluster named postgresql-airflow-example, in the dse-k8s-eqiad Kubernetes environment, and accessible via https://airflow-example.wikimedia.org
- The first thing you need to do is create Kubernetes read and deploy user credentials
- Add a namespace entry (using the same name as the airflow instance) into deployment_charts/helmfile.d/admin_ng/values/dse-k8s.yaml

  namespaces:
    # ...
    airflow-example:
      deployClusterRole: deploy-airflow
      customPriorityClasses: [pod-critical]
      tlsExtraSANs:
        - airflow-example.wikimedia.org
- Add the namespace under the tenantNamespaces list in deployment_charts/helmfile.d/admin_ng/values/dse-k8s-eqiad/cephfs-csi-rbd-values.yaml
- Add the namespace to the watchedNamespaces list defined in deployment_charts/helmfile.d/admin_ng/values/dse-k8s-eqiad/cloudnative-pg.yaml
- Then, create the public and internal DNS records for this instance
- Define the PG cluster and airflow instance helmfile.yaml files and associated values (take example from deployment_charts/helmfile.d/dse-k8s-services/postgresql-airflow-example and deployment_charts/helmfile.d/dse-k8s-services/airflow-example)
- Generate the S3 keypairs for both PG and Airflow
brouberol@cephosd1001:~$ sudo radosgw-admin user create --uid=postgresql-airflow-example --display-name="postgresql-airflow-example"
brouberol@cephosd1001:~$ sudo radosgw-admin user create --uid=airflow-example --display-name="airflow-example"
# note: copy the `access_key` and `secret_key` from the JSON output, you will need them in the next step
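Rather than copying the keys by hand, they can be pulled out of the JSON output with jq. This assumes the usual radosgw-admin output shape, in which the credentials live under the `keys` array:

```shell
# Extract the access and secret keys from radosgw-admin's JSON output
sudo radosgw-admin user info --uid=airflow-example \
  | jq -r '.keys[0].access_key, .keys[0].secret_key'
```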
- Create the S3 buckets for both PG and Airflow
brouberol@stat1008:~$ read access_key
<PG S3 access key>
brouberol@stat1008:~$ read secret_key
<PG S3 secret key>
brouberol@stat1008:~$ s3cmd --access_key=$access_key --secret_key=$secret_key --host=rgw.eqiad.dpe.anycast.wmnet --region=dpe --host-bucket=no mb s3://postgresql-airflow-example.dse-k8s-eqiad
brouberol@stat1008:~$ read access_key
<Airflow S3 access key>
brouberol@stat1008:~$ read secret_key
<Airflow S3 secret key>
brouberol@stat1008:~$ s3cmd --access_key=$access_key --secret_key=$secret_key --host=rgw.eqiad.dpe.anycast.wmnet --region=dpe --host-bucket=no mb s3://logs.airflow-example.dse-k8s-eqiad
- Register the service in our IDP server (into idp.yaml). After the patch is merged and puppet has run on the idp servers, copy the OIDC secret key generated for the airflow service.

root@idp1004:~# cat /etc/cas/services/airflow_example-*.json | jq -r .clientSecret
<OIDC secret key>
- Issue a Kerberos keytab following the guide provided on Data Platform/Systems/Kerberos/Administration. The hostname must match the internal service discovery domain of the airflow instance.
# Change `analytics-example` to the UNIX user the airflow tasks will impersonate by default in Hadoop
brouberol@krb1002:~$ sudo kadmin.local addprinc -randkey analytics-example/airflow-example.discovery.wmnet@WIKIMEDIA
brouberol@krb1002:~$ sudo kadmin.local addprinc -randkey airflow/airflow-example.discovery.wmnet@WIKIMEDIA
brouberol@krb1002:~$ sudo kadmin.local addprinc -randkey HTTP/airflow-example.discovery.wmnet@WIKIMEDIA
brouberol@krb1002:~$ sudo kadmin.local ktadd -norandkey -k analytics-example.keytab \
    analytics-example/airflow-example.discovery.wmnet@WIKIMEDIA \
    airflow/airflow-example.discovery.wmnet@WIKIMEDIA \
    HTTP/airflow-example.discovery.wmnet@WIKIMEDIA
Create the base64 representation of the keytab
root@krb1002:~# base64 analytics-example.keytab
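Before pasting the base64 blob into the private repository, it can be worth checking that it round-trips cleanly back to the original keytab. This is just a sanity check, not part of the official procedure:

```shell
# Encode the keytab, then verify that decoding reproduces the original file byte-for-byte
base64 analytics-example.keytab > analytics-example.keytab.b64
base64 -d analytics-example.keytab.b64 | cmp - analytics-example.keytab && echo OK
```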
- Generate the secrets for both the PG cluster and the Airflow instance and add them to the private puppet repository, in
/srv/git/private/hieradata/role/common/deployment_server/kubernetes.yaml

dse-k8s:
  # ...
  postgresql-airflow-example:
    dse-k8s-eqiad:
      s3:
        accessKey: <PG S3 access key>
        secretKey: <PG S3 secret key>
  airflow-example:
    dse-k8s-eqiad:
      config:
        private:
          airflow__core__fernet_key: <random 64 characters>
          airflow__webserver__secret_key: <random 64 characters>
          airflow:
            aws_access_key_id: <Airflow S3 access key>
            aws_secret_access_key: <Airflow S3 secret key>
          oidc:
            client_secret: <OIDC secret key>
          kerberos:
            keytab: |
              # The base64 representation obtained in the previous step
              ABCDEFHIJKLMNOP
              ABCDEFHIJKLMNOP=
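Note that Airflow's Fernet key is not an arbitrary random string: it must be the URL-safe base64 encoding of 32 random bytes (44 characters), while the webserver secret key can be any sufficiently random string. One way to generate both locally (a sketch; any equivalent random source works):

```shell
# Fernet key: url-safe base64 of 32 random bytes (44 characters)
python3 -c 'import base64, os; print(base64.urlsafe_b64encode(os.urandom(32)).decode())'
# Webserver secret key: 64 random hex characters
python3 -c 'import secrets; print(secrets.token_hex(32))'
```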
- Register the PG bucket name and keys into /srv/git/private/hieradata/role/common/mariadb/misc/analytics/backup.yaml on the puppetserver host. This will make sure that the PG base backups and WALs are regularly backed up outside of our Ceph cluster.

profile::ceph::backup::s3_local::sources:
  # ...
  postgresql-airflow-example.dse-k8s-eqiad: # must match the PG bucket name
    access_key: <PG S3 access key>
    secret_key: <PG S3 secret key>
- Deploy the service (which should deploy both the PG cluster and the airflow instance)
- Once the instance is running, enable the ATS redirection from the wikimedia.org subdomain to the kube ingress. After puppet has run on all the cache servers (wait a good 30 minutes), https://airflow-example.wikimedia.org should display the airflow web UI, and you should be able to connect via CAS.
Hadoop setup
Once the instance is up and running, we need to make sure Hadoop is correctly configured to support it.
Note: These instructions assume that the airflow instance was deployed with a new Kerberos principal primary (named analytics-example in this section). If you are reusing an existing principal, such as analytics, you can stop here.
Create UNIX user/group
- Define an analytics-example user and group in data.yaml, and add the user to the analytics-privatedata-users group.
- In the same file, define an analytics-example-users group containing all human users who need to impersonate the primary.
Create the HDFS folders
- Create a home directory in HDFS for that user
brouberol@an-master1003:~$ sudo -u hdfs /usr/bin/hdfs dfs -mkdir /user/analytics-example
brouberol@an-master1003:~$ sudo -u hdfs /usr/bin/hdfs dfs -chown analytics-example /user/analytics-example
brouberol@an-master1003:~$ sudo -u hdfs /usr/bin/hdfs dfs -chgrp analytics-example-users /user/analytics-example
- Create a temporary directory for the airflow instance
brouberol@an-master1003:~$ sudo -u hdfs /usr/bin/hdfs dfs -mkdir /tmp/example
brouberol@an-master1003:~$ sudo -u hdfs /usr/bin/hdfs dfs -chown analytics-example /tmp/example
brouberol@an-master1003:~$ sudo -u hdfs /usr/bin/hdfs dfs -chgrp analytics-privatedata-users /tmp/example
- Create an artifact directory for the airflow instance
brouberol@an-master1003:~$ sudo -u hdfs /usr/bin/hdfs dfs -mkdir /wmf/cache/artifacts/airflow/example
brouberol@an-master1003:~$ sudo -u hdfs /usr/bin/hdfs dfs -chown blunderbuss /wmf/cache/artifacts/airflow/example
brouberol@an-master1003:~$ sudo -u hdfs /usr/bin/hdfs dfs -chgrp blunderbuss /wmf/cache/artifacts/airflow/example
Configuring out-of-band backups
The PostgreSQL database cluster for this instance will already be configured with its own backup system that writes database backups and WAL archives to the S3 interface of the Ceph cluster.
However, we decided to implement out-of-band backups of each of the S3 buckets containing these database backups, so we added a new backup pipeline to our database backup replica system, which is db1208.
In this case the file you need to modify when you add a new instance is in the private repo and is named: hieradata/role/common/mariadb/misc/analytics/backup.yaml
Add your new bucket and its access credentials to the profile::ceph::backup::s3_local::sources hash structure, as shown.
profile::ceph::backup::s3_local::sources:
  postgresql-airflow-example.dse-k8s-eqiad: # must match the PG bucket name
    access_key: <PG S3 access key>
    secret_key: <PG S3 secret key>
When merged, this will update the file /srv/postgresql_backups/rclone.conf on db1208, adding the backups of this database cluster to the daily sync process and therefore to Bacula.
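For reference, the rclone source stanza that ends up in that file typically looks like the following. This is a hedged sketch using standard rclone S3 backend fields; the actual file is templated by puppet, and the values here are placeholders:

```ini
[postgresql-airflow-example.dse-k8s-eqiad]
type = s3
provider = Ceph
access_key_id = <PG S3 access key>
secret_access_key = <PG S3 secret key>
endpoint = https://rgw.eqiad.dpe.anycast.wmnet
```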
Kubernetes migration operations
Data Platform/Systems/Airflow/Kubernetes/Operations/K8s-Migration