Data Platform/Systems/Airflow/Instances
WMF's Airflow system is composed of several Airflow instances. Each instance schedules and orchestrates jobs belonging to a particular grouping. For example, the analytics instance schedules jobs that generate and process analytics data sets, while the research instance orchestrates jobs that process research-related data sets. Usually, each Airflow instance is managed by a single WMF team: for example, the analytics instance is managed by the Data Engineering team, and most of its jobs have been developed by them. However, an Airflow instance can also be shared by several teams, and one team can take part in the development of jobs in multiple Airflow instances.
Multi-instance vs. single instance
During the development of WMF's Airflow system, we've had discussions about using a single-instance approach versus a multi-instance approach. There are advantages and disadvantages to both. This thread contains most of the arguments we discussed, which include the following:
| | Single instance | Multi-instance |
|---|---|---|
| Pros | Single configuration, no custom stacks for teams, and thus easy upgrades and maintenance. | No single point of failure: if a team deploys code that breaks Airflow services, the other instances continue working. Teams have more independence when deploying. |
| Cons | Airflow does not support Kerberos multitenancy (yet), so a single instance would require all WMF jobs to access Hadoop with the same Kerberos credentials, not allowing for access control or specific permissions. | When doing maintenance, the Data Engineering team has to wrangle multiple Airflow instances to stop jobs. |
We (Data Engineering) decided to kick off the project with a multi-instance approach, mainly because of the Kerberos issue, but we don't rule out switching to a single instance in the future. All WMF Airflow instances are set up by the same Puppet configuration, so even though we provide multiple instances, they all share the same stack (see: https://github.com/wikimedia/puppet/tree/production/modules/airflow).
Access via SSH tunnel
Although all instances provide a public web endpoint like https://airflow-{instance-name}.wikimedia.org/, an instance owner can also connect directly to their instance via SSH tunnel (noted in the Web UI Access rows below); these direct tunnels will not work for non-owners. However, anyone in the analytics-privatedata-users group can access any instance by routing their SSH connection through one of the analytics clients. This is useful for tracking the status of jobs on different instances. For example, to connect to the analytics instance through stat1011, use the following command:
ssh -N -L 8600:an-launcher1002.eqiad.wmnet:8600 stat1011.eqiad.wmnet
You can now access the web UI at http://localhost:8600.
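The same pattern works for any instance: forward a free local port through an analytics client to the instance host's web UI port. A minimal sketch, assuming membership in analytics-privatedata-users; the host name and port 8600 come from the research instance table below, and local port 8601 is an arbitrary free port:

```
# Forward local port 8601 to the research instance's web UI
# (port 8600 on an-airflow1002), routing through stat1011.
ssh -N -L 8601:an-airflow1002.eqiad.wmnet:8600 stat1011.eqiad.wmnet
# While the tunnel is up, browse to http://localhost:8601
```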
List of instances
test-k8s
This is a temporary instance that is being used by the Data Platform SRE team to facilitate the migration of Airflow to Kubernetes.
Host | n/a - running on the dse-k8s cluster |
Service user | Default: analytics - but this can be overridden at the DAG and task levels |
Web UI Access | https://airflow-test-k8s.wikimedia.org |
Dags | airflow-dags/test_k8s/dags |
Dags deployment | Still TBD. See task T368033 |
analytics
Airflow instance owned by the Data / Analytics engineering team. Contains all production jobs historically developed by the team.
Host | an-launcher1002.eqiad.wmnet |
Service user | analytics |
Web UI Port | 8600 |
Web UI Access | https://airflow-analytics.wikimedia.org/ or via SSH tunnel (see above) |
Dags | airflow-dags/analytics/dags |
Dags deployment |
ssh deployment.eqiad.wmnet
cd /srv/deployment/airflow-dags/analytics
git fetch && git rebase
scap deploy |
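The four deployment steps can also be collapsed into a single invocation. This is a sketch, not the documented procedure; it assumes your account has SSH access to deployment.eqiad.wmnet and uses -t so that scap can prompt on a TTY:

```
# Run the whole analytics DAG deployment in one SSH invocation.
# -t allocates a TTY so scap's interactive prompts still work.
ssh -t deployment.eqiad.wmnet \
  'cd /srv/deployment/airflow-dags/analytics && git fetch && git rebase && scap deploy'
```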
analytics_test
Airflow test instance owned by the Data / Analytics engineering team. Contains some jobs analogous to those in the analytics instance, just to create some data flows in the Data Engineering team's test cluster.
Host | an-test-client1002.eqiad.wmnet |
Service user | analytics |
Web UI Port | 8600 |
Web UI Access | https://airflow-analytics-test.wikimedia.org/ or via SSH tunnel (see above) |
Dags | airflow-dags/analytics_test/dags |
Dags deployment |
ssh deployment.eqiad.wmnet
cd /srv/deployment/airflow-dags/analytics_test
git fetch && git rebase
scap deploy |
search
Airflow instance owned by the Search team.
Host | an-airflow1005.eqiad.wmnet |
Service user | analytics-search |
Web UI Port | 8600 |
Web UI Access | https://airflow-search.wikimedia.org/ or via SSH tunnel (see above) |
Dags | airflow-dags/search/dags |
Dags deployment |
ssh deployment.eqiad.wmnet
cd /srv/deployment/airflow-dags/search
git fetch && git rebase
scap deploy |
research
Airflow instance owned by the Research team.
Host | an-airflow1002.eqiad.wmnet |
Service user | analytics-research |
Web UI Port | 8600 |
Web UI Access | https://airflow-research.wikimedia.org/ or via SSH tunnel (see above) |
Dags | airflow-dags/research/dags |
Dags deployment |
ssh deployment.eqiad.wmnet
cd /srv/deployment/airflow-dags/research
git fetch && git rebase
scap deploy |
platform_eng
Airflow instance owned by the Platform Engineering team.
Host | an-airflow1004.eqiad.wmnet |
Service user | analytics-platform-eng |
Web UI Port | 8600 |
Web UI Access | https://airflow-platform-eng.wikimedia.org/ or via SSH tunnel (see above) |
Dags | airflow-dags/platform_eng/dags |
Dags deployment |
ssh deployment.eqiad.wmnet
cd /srv/deployment/airflow-dags/platform_eng
git fetch && git rebase
scap deploy |
analytics_product
Airflow instance owned by the Product Analytics engineering team. Contains all production jobs historically developed by the team.
Host | an-airflow1006.eqiad.wmnet |
Service user | product-analytics |
Web UI Port | 8600 |
Web UI Access | https://airflow-analytics-product.wikimedia.org/ or via SSH tunnel (see above) |
Dags | airflow-dags/analytics_product/dags |
Dags deployment |
ssh deployment.eqiad.wmnet
cd /srv/deployment/airflow-dags/analytics_product
git fetch && git rebase
scap deploy |
wmde
Airflow instance owned by the WMDE engineering team. Contains all production jobs historically developed by the team.
Host | an-airflow1007.eqiad.wmnet |
Service user | analytics-wmde |
Web UI Port | 8600 |
Web UI Access | https://airflow-wmde.wikimedia.org/ or via SSH tunnel (see above) |
Dags | airflow-dags/wmde/dags |
Dags deployment |
ssh deployment.eqiad.wmnet
cd /srv/deployment/airflow-dags/wmde
git fetch && git rebase
scap deploy |
ml
Airflow instance owned by the ML team. Contains all production jobs developed by the team.
Host | Kubernetes |
Service user | N/A |
Web UI Port | N/A |
Web UI Access | https://airflow-ml.wikimedia.org/ |
Dags | airflow-dags/ml/dags |
Dags deployment | Data Platform/Systems/Airflow/Kubernetes#DAGs deployment |
Custom test instance
More at Analytics/Systems/Airflow/Airflow_testing_instance_tutorial