Data Platform/Systems/Airflow/Upgrading
We have the following procedures in place to upgrade Airflow versions.
Configure feature branch
Check out a working copy of the https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags repository
Create a feature branch, for example: git checkout -b airflow_version_2_6
Determine the new version number, which is based on:
- The airflow major, minor, and patch versions
- The python environment major and minor versions
- The date of creation
For example: 2.6.0-py3.10-20230510
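The version string can be assembled mechanically. A minimal sketch (the variable names are illustrative, not part of the repository tooling):

```shell
# Illustrative only: build the package version string from its three parts.
AIRFLOW_VERSION="2.6.0"                                   # target Airflow release
PY_VERSION="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
BUILD_DATE="$(date +%Y%m%d)"
VERSION="${AIRFLOW_VERSION}-py${PY_VERSION}-${BUILD_DATE}"
echo "${VERSION}"
```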
Update conda environment
Modify the conda-environment.yml file with the new version of Airflow to be used.
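The edit is typically a one-line version bump. The fragment below is a sketch only; the dependency identifier is an assumption, so use whatever name the file already contains:

```yaml
dependencies:
  # ...
  - apache-airflow==2.6.0   # bump to the target Airflow version (name illustrative)
```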
Execute ./generate_conda_environment_lock_yml.sh, which will regenerate the file conda-environment.lock.yml with the specific package versions in the environment.
The regenerated file may require modification before proceeding. One known issue is that the script strips the gitlab URLs from the workflow_utils and conda-pack entries; these removals should be reverted.
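Restoring the URLs can be done by hand from git diff. The stand-alone sketch below (the file contents and URL are hypothetical) shows the kind of line that must be put back:

```shell
# Hypothetical example: the generator replaced a git URL with a bare version pin.
lock="$(mktemp)"
printf ' - workflow_utils=0.2.1\n' > "$lock"   # stand-in for the stripped lock-file line
# Restore the git URL form (copy the real URL and ref from conda-environment.yml):
sed -i 's#- workflow_utils=.*#- git+https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils.git#' "$lock"
grep 'gitlab.wikimedia.org' "$lock"
```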
Execute ./check_conda_environment_lock_yml.sh and verify that a conda environment can in fact be created from the specification. Check for the output:
An environment could be created form conda-environment.lock.yml
Execute unit tests
Execute the following:
export PYTHONPATH=.:./wmf_airflow_common/plugins
tox
tox -e lint
Ensure that all tests pass.
Update debian packaging parameters
Update the .gitlab-ci.yml
and Dockerfile
files with the new package version.
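A simple way to make sure no occurrence of the old version is missed is a search-and-replace. The sketch below (old/new strings and file contents are illustrative) demonstrates the idea on a temporary stand-in file rather than the real .gitlab-ci.yml:

```shell
# Illustrative version bump: replace the old package version wherever it appears.
OLD='2.5.0-py3.10-20230101'                    # previous package version (example)
NEW='2.6.0-py3.10-20230510'                    # new package version (example)
f="$(mktemp)"
printf 'PACKAGE_VERSION: %s\n' "$OLD" > "$f"   # stand-in for .gitlab-ci.yml
sed -i "s/${OLD}/${NEW}/g" "$f"
grep "$NEW" "$f"
```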
Add an entry to the debian/changelog
- This can be made a little simpler with the command dch -v 2.6.0-py3.10-20230510 -D buster-wikimedia --force-distribution
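The resulting changelog entry looks roughly like the following. The package name is inferred from the .deb filename, and the name, email, and timestamp are placeholders that dch fills in from your environment:

```
airflow (2.6.0-py3.10-20230510) buster-wikimedia; urgency=medium

  * Upgrade Airflow to 2.6.0

 -- Your Name <you@example.org>  Wed, 10 May 2023 12:00:00 +0000
```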
Stage your changes for commit: git add .gitlab-ci.yml Dockerfile debian/changelog
Commit your changes with git commit
Build the package with GitLab CI
Push your branch back to GitLab with git push --set-upstream origin airflow_version_2_6
Browse to the feature branch in GitLab and locate the CI pipeline for the branch, as shown in the following image.
![Location of the link to the airflow feature branch pipeline](http://upload.wikimedia.org/wikipedia/labs/thumb/3/3f/Airflow_branch_pipeline.png/220px-Airflow_branch_pipeline.png)
Click the triangle (play button) on the publish_airflow_package_with_docker stage to trigger it.
![Location of the publish airflow package with docker stage](http://upload.wikimedia.org/wikipedia/labs/thumb/4/41/Publish_airflow_package_with_docker.png/220px-Publish_airflow_package_with_docker.png)
When the stage has built successfully, navigate to the Package Registry section of the repository.
![Location of the Airflow package built by GitLab-CI](http://upload.wikimedia.org/wikipedia/labs/thumb/5/5d/Airflow_package.png/220px-Airflow_package.png)
Locate the package that has been built.
Copy the download link and download this file to your test host with a command like the following:
curl -o airflow-2.6.0-py3.10-20230510_amd64.deb https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/package_files/1257/download
[SRE Only] Install the package with sudo apt install ./airflow-2.6.0-py3.10-20230510_amd64.deb (the leading ./ tells apt to install the local file rather than search the repositories)
Run the database check command: sudo -u analytics airflow-analytics_test db check
Run the database upgrade command: sudo -u analytics airflow-analytics_test db upgrade