Data Engineering/Systems/Airflow/Upgrading
We have the following procedures in place to upgrade Airflow versions.
Configure feature branch
Check out a working copy of the https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags repository
Create a feature branch, for example: git checkout -b airflow_version_2_6
Determine the new version number, which is composed of:
- The Airflow major, minor, and patch versions
- The Python environment major and minor versions
- The date of creation, in YYYYMMDD format
For example: 2.6.0-py3.10-20230510
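The version string can be assembled mechanically from its three parts. The following is a minimal sketch; the function name make_airflow_version is hypothetical and not part of the repository, and it assumes the date part defaults to today's date.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build <airflow version>-py<python major.minor>-<YYYYMMDD>.
make_airflow_version() {
  local airflow_version="$1"                 # e.g. 2.6.0 (major.minor.patch)
  local python_version="$2"                  # e.g. 3.10 (major.minor only)
  local build_date="${3:-$(date +%Y%m%d)}"   # defaults to today's date
  printf '%s-py%s-%s\n' "$airflow_version" "$python_version" "$build_date"
}
```

For example, make_airflow_version 2.6.0 3.10 20230510 prints 2.6.0-py3.10-20230510, and the same value can be reused for the branch, the debian/changelog entry and the package file name.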
Update conda environment
Modify the conda-environment.yml file with the new version of Airflow that is to be used.
Execute ./generate_conda_environment_lock_yml.sh, which will update the file conda-environment.lock.yml with the specific package versions in the environment.
This file may require modification before proceeding. One known error is that the script removes the GitLab URLs from the workflow_utils and conda-pack packages; these removals should be reverted so that the URLs are kept.
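A quick check for the known error can be run after generating the lock file. This is a sketch under assumptions: the function name check_gitlab_urls is hypothetical, and because the exact line format of the lock file is not described here, it only verifies that some line mentioning each package also contains gitlab.wikimedia.org.

```shell
#!/usr/bin/env bash
# Hypothetical check: each of workflow_utils and conda-pack must still be
# referenced by a line that contains a gitlab.wikimedia.org URL.
check_gitlab_urls() {
  local lock_file="${1:-conda-environment.lock.yml}"
  local pkg lines status=0
  for pkg in workflow_utils conda-pack; do
    # Collect every line that mentions the package at all.
    lines="$(grep -F -- "$pkg" "$lock_file" || true)"
    if [ -z "$lines" ]; then
      echo "MISSING: no line mentions $pkg in $lock_file" >&2
      status=1
    elif ! grep -qF 'gitlab.wikimedia.org' <<<"$lines"; then
      echo "URL LOST: $pkg no longer points at gitlab.wikimedia.org" >&2
      status=1
    fi
  done
  return "$status"
}
```

A non-zero return from check_gitlab_urls means the lock file should be edited before continuing.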
Execute ./check_conda_environment_lock_yml.sh and verify that a conda environment can in fact be created from the specification. Check for the output:
An environment could be created from conda-environment.lock.yml
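To make this check scriptable, the output of the check script can be piped through a small filter that fails unless the success message appears. This is a minimal sketch; the function name lock_check_succeeded is hypothetical, and it matches only the start of the message so that it does not depend on the exact wording at the end of the line.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read the check script's output on stdin, echo it back
# for the log, and return success only if the success message is present.
lock_check_succeeded() {
  local output
  output="$(cat)"
  printf '%s\n' "$output"
  grep -q 'An environment could be created' <<<"$output"
}
```

Usage: ./check_conda_environment_lock_yml.sh 2>&1 | lock_check_succeeded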
Execute unit tests
Execute the following:
export PYTHONPATH=.:./wmf_airflow_common/plugins
tox
tox -e lint
Ensure that all tests pass
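The three commands above can be wrapped so that the run stops at the first failing tox invocation. This is a sketch, assuming it is run from the root of the airflow-dags working copy; the function name run_airflow_dags_tests and the TOX override (useful for a dry run, e.g. TOX=echo) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: export PYTHONPATH (this persists in the calling shell),
# then run the default tox environments followed by the lint environment.
run_airflow_dags_tests() {
  local tox_cmd="${TOX:-tox}"
  export PYTHONPATH=.:./wmf_airflow_common/plugins
  "$tox_cmd" || { echo "tox failed" >&2; return 1; }
  "$tox_cmd" -e lint || { echo "tox -e lint failed" >&2; return 1; }
  echo "All tests passed."
}
```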
Update debian packaging parameters
Update the .gitlab-ci.yml and Dockerfile files with the new package version.
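Before committing, it can help to confirm that every packaging file mentions the new version string. This is a minimal sketch; the function name check_version_in_files is hypothetical, and the default file names .gitlab-ci.yml and Dockerfile are assumptions that should be adjusted if the repository uses different names.

```shell
#!/usr/bin/env bash
# Hypothetical check: report every file that does not contain the version.
check_version_in_files() {
  local version="$1"; shift
  local files=("$@")
  [ "${#files[@]}" -gt 0 ] || files=(.gitlab-ci.yml Dockerfile)
  local f status=0
  for f in "${files[@]}"; do
    # -F: treat the dots in the version as literal characters
    if ! grep -qF -- "$version" "$f"; then
      echo "$f does not mention $version" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: check_version_in_files 2.6.0-py3.10-20230510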
Add an entry to the debian/changelog file
- This can be made a little simpler with the command dch -v 2.6.0-py3.10-20230510 -D buster-wikimedia --force-distribution
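After running dch, the topmost changelog entry should carry the new version. The first line of each Debian changelog entry has the form package (version) distribution; urgency=level, so the version can be read from it as sketched below. The function names are hypothetical; on a host with dpkg-dev installed, dpkg-parsechangelog --show-field Version gives the same information.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version in parentheses on line 1 of the changelog.
changelog_top_version() {
  local changelog="${1:-debian/changelog}"
  sed -n '1s/^[^ ]* (\([^)]*\)).*$/\1/p' "$changelog"
}

# Hypothetical check: fail unless the newest entry has the expected version.
check_changelog_version() {
  local expected="$1" changelog="${2:-debian/changelog}"
  local actual
  actual="$(changelog_top_version "$changelog")"
  if [ "$actual" != "$expected" ]; then
    echo "$changelog has '$actual', expected '$expected'" >&2
    return 1
  fi
}
```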
Stage your changes for commit: git add .gitlab-ci.yml Dockerfile debian/changelog conda-environment.yml conda-environment.lock.yml
Commit your changes with git commit
Build the package with GitLab CI
Push your branch back to GitLab with git push --set-upstream origin airflow_version_2_6
Browse to the feature branch in GitLab and locate the CI pipeline for the branch.
Click the triangle (play) button on the publish_airflow_package_with_docker stage to start it.
When the stage has built successfully, navigate to the Package Registry section of the repository.
Locate the package that has been built.
Copy the download link and download this file to your test host with a command like the following:
curl -o airflow-2.6.0-py3.10-20230510_amd64.deb https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/package_files/1257/download
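The download can be parameterised by version and package file ID. This is a sketch; the helper names and the AIRFLOW_DAGS_REPO_URL variable are hypothetical, and the package file ID (1257 in the example above) is specific to one build, so it must be taken from the Package Registry each time.

```shell
#!/usr/bin/env bash
AIRFLOW_DAGS_REPO_URL="https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags"

# Hypothetical helper: download URL for a given package file ID.
package_file_url() {
  printf '%s/-/package_files/%s/download\n' "$AIRFLOW_DAGS_REPO_URL" "$1"
}

# Hypothetical helper: local file name for a given package version.
deb_filename() {
  printf 'airflow-%s_amd64.deb\n' "$1"
}

download_airflow_deb() {
  local version="$1" file_id="$2"
  # -f: fail on HTTP errors instead of saving an error page as the .deb
  # -L: follow redirects
  curl -fL -o "$(deb_filename "$version")" "$(package_file_url "$file_id")"
}
```

For example: download_airflow_deb 2.6.0-py3.10-20230510 1257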
[SRE Only] Install the package with sudo apt install ./airflow-2.6.0-py3.10-20230510_amd64.deb (the ./ prefix tells apt to install the local file rather than look up a package by name)
Run the database check command: sudo -u analytics airflow-analytics_test db check
Run the database upgrade command: sudo -u analytics airflow-analytics_test db upgrade
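The install and database steps can be chained so that db upgrade only runs after db check succeeds. This is a sketch under assumptions: the function names are hypothetical, airflow-analytics_test is used as the default command, and RUN_AS is a hypothetical override for the sudo prefix (set it to an empty string to run the command directly).

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a local .deb. A bare file name gets a ./ prefix
# so that apt treats it as a local file rather than a package name.
install_airflow_deb() {
  local deb="$1"
  case "$deb" in
    */*) ;;
    *) deb="./$deb" ;;
  esac
  sudo apt install "$deb"
}

# Hypothetical helper: run "db check", then "db upgrade" only if the check passed.
upgrade_airflow_db() {
  local airflow_cmd="${1:-airflow-analytics_test}"
  # Left unquoted below on purpose so "sudo -u analytics" splits into words.
  local run_as="${RUN_AS-sudo -u analytics}"
  if ! $run_as "$airflow_cmd" db check; then
    echo "db check failed; not running db upgrade" >&2
    return 1
  fi
  $run_as "$airflow_cmd" db upgrade
}
```

For example: install_airflow_deb airflow-2.6.0-py3.10-20230510_amd64.deb && upgrade_airflow_db airflow-analytics_test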