Data Engineering/Systems/DataHub/Upgrading

From Wikitech

The upstream DataHub repository is: https://github.com/linkedin/datahub/

At the moment we maintain a fork of DataHub here: https://gerrit.wikimedia.org/r/admin/repos/analytics/datahub

The reasons why we do this are:

  • DataHub does not publish binary artifacts other than its Docker images
  • We need to keep PipelineLib configuration files and Blubber build pipelines alongside the codebase

Currently our changes are made in the wmf branch, and we frequently squash the changes on that branch down to a single commit.

When a new release is required, we perform the following operations:

  • Update the code in a feature branch
  • Merge to the wmf branch to publish the new containers
  • Create a feature branch in the deployment-charts repository and update the image version in the helm charts
  • Deploy the new version with helmfile

Update the code

  • Check out the code locally and create a branch for the upgrade.

git checkout -b datahub_upgrade_branch

  • Add the upstream remote if it does not already exist

git remote add linkedin-github git@github.com:datahub-project/datahub.git
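
The "if it does not already exist" check can be scripted so that the step is safe to repeat. This is a minimal sketch and not part of the original procedure; ensure_remote is a hypothetical helper name, and it assumes it is run from inside the local clone.

```shell
# Hypothetical helper: add a remote only when it is not already configured.
# $1 = remote name, $2 = remote URL
ensure_remote() {
    if git remote get-url "$1" >/dev/null 2>&1; then
        # The remote is already configured; leave it untouched.
        echo "remote '$1' already exists"
    else
        git remote add "$1" "$2"
        echo "added remote '$1'"
    fi
}

# Usage, from inside the local clone:
# ensure_remote linkedin-github git@github.com:datahub-project/datahub.git
```

Running the helper a second time only reports that the remote exists, whereas a bare git remote add would fail with an error.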

  • Fetch the latest branches and tags from the upstream remote.

git remote update linkedin-github

  • Push the master branch from the upstream repository to our Gerrit repository.

git push origin linkedin-github/master:master

  • Also push the tags to the remote repository

git push origin --tags
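
Since the rebase further down targets a release tag, it can help to confirm that the tag is actually present in the local clone first. A minimal sketch, with have_tag as a hypothetical helper name and v0.8.34 used only as the example version from this page:

```shell
# Hypothetical helper: succeed only if the named tag exists in the local clone.
# $1 = tag name, e.g. v0.8.34
have_tag() {
    git rev-parse --quiet --verify "refs/tags/$1" >/dev/null
}

# Usage, from inside the local clone:
# have_tag v0.8.34 && echo "tag present" || echo "tag missing"
```

If the tag is missing, fetching from the upstream remote again is the likely fix.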

  • Check out the wmf branch.

git checkout wmf

  • Rebase the current branch onto the tag of the new version. In this example the tag is v0.8.34.

git rebase -i v0.8.34

  • Fix any merge conflicts that arise during the rebase
  • Force-push the branch to Gerrit

git push --force-with-lease
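
Because a force-push rewrites the remote branch, one way to double-check the rebase before running the push above is to confirm that the release tag is an ancestor of the current HEAD and to list the commits sitting on top of it. This is a sketch under assumptions, not an established part of the procedure; verify_rebased_onto is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the given tag is an ancestor of HEAD,
# i.e. the current branch really sits on top of the new release.
# $1 = tag name, e.g. v0.8.34
verify_rebased_onto() {
    if git merge-base --is-ancestor "$1" HEAD; then
        echo "HEAD is based on $1; local commits on top:"
        # List the commits that exist on HEAD but not in the tagged release.
        git log --oneline "$1"..HEAD
    else
        echo "HEAD is NOT based on $1; do not push" >&2
        return 1
    fi
}

# Usage, from inside the local clone with the wmf branch checked out:
# verify_rebased_onto v0.8.34 && git push --force-with-lease
```

After a squash, the listing would be expected to show only the small set of WMF-specific commits.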

Deploy the DataHub CLI tool

The version of the CLI tool has to match the server version, so we have to: