Analytics/Systems/Cluster/Bigtop Packages

Overview

We use Apache Bigtop (https://bigtop.apache.org/) as our Hadoop distribution and use its toolchain to build Debian packages for all of the components. The upstream source for Bigtop is at https://github.com/apache/bigtop/

Upstream maintains a branch called branch-1.5, which is the version that we used until recently. Unfortunately for us, upstream has decided not to support Debian 11 (Bullseye) or later as an installation target, whereas we need to run on it. We have therefore created our own fork of the repository at https://gitlab.wikimedia.org/repos/data-engineering/bigtop

Bigtop uses the Gradle build system throughout, along with puppetized build slaves running under Docker.

An example command to build a single component, such as hadoop, under bullseye is as follows:

docker run --rm -v "$(pwd)":/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean hadoop-pkg'

The components that we currently build with Bigtop are:

  • bigtop-groovy
  • bigtop-jsvc
  • bigtop-tomcat
  • bigtop-utils
  • hadoop
  • hbase
  • hive
  • mahout
  • oozie
  • solr
  • spark
  • sqoop
  • sqoop2
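
The single-component command shown earlier can be extended to cover the whole list. The following is a minimal sketch, not the script we actually use: the function names pkg_tasks and build_components are hypothetical, and it assumes that every component follows the same <component>-pkg task naming as hadoop-pkg. It runs allclean once and then all package tasks in a single Gradle invocation.

```shell
# Sketch only: the component list and image tag come from this page,
# the function names are hypothetical.
COMPONENTS="bigtop-groovy bigtop-jsvc bigtop-tomcat bigtop-utils hadoop hbase hive mahout oozie solr spark sqoop sqoop2"

# Print the Gradle package task for each argument, space separated.
# Assumes the <component>-pkg pattern seen in the hadoop-pkg example.
pkg_tasks() {
  task_list=""
  for component in "$@"; do
    task_list="${task_list:+$task_list }${component}-pkg"
  done
  printf '%s\n' "$task_list"
}

# Run allclean once, then every package task, inside the Debian 11 image.
# Must be called from the root of the bigtop checkout.
build_components() {
  # $COMPONENTS is left unquoted on purpose so that it splits into words.
  all_tasks="$(pkg_tasks $COMPONENTS)"
  docker run --rm -v "$(pwd)":/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 \
    bash -c ". /etc/profile.d/bigtop.sh; ./gradlew allclean ${all_tasks}"
}
```

Calling build_components from the repository root would start the build; pkg_tasks on its own only prints the task names, which is useful for checking the command before running it.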

Unlike most other software that we build, Bigtop packages are currently built on an engineer's workstation and then uploaded to the APT repository for serving.

WMF Build Script

We have added a couple of scripts to our fork of branch-1.5 because:

  1. We have a specific set of components that we need to build
  2. We use an operating system that is unsupported

The top-level script to run is: build_all_bigtop_distros_wmf.sh

This script does the following:

  1. Creates build slaves for debian-10 and debian-11
  2. Uses each build slave to run build_bigtop_wmf.sh
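
The second step can be pictured as a loop over the two distributions. This is a hypothetical sketch and not the contents of build_all_bigtop_distros_wmf.sh: it assumes that the build images already exist locally, that the debian-10 image tag mirrors the bigtop/slaves:1.5.0-debian-11 tag used above, and that build_bigtop_wmf.sh sits at the root of the checkout. The function names are made up for illustration.

```shell
# Hypothetical sketch of the per-distribution loop.
DISTROS="debian-10 debian-11"

# Map a distribution name to a build image tag. The pattern follows the
# bigtop/slaves:1.5.0-debian-11 example; the debian-10 tag is an assumption.
image_for_distro() {
  printf 'bigtop/slaves:1.5.0-%s\n' "$1"
}

# Run build_bigtop_wmf.sh once per distribution, stopping at the first failure.
# Assumes the script is at the repository root, which is mounted as /ws.
build_all_distros() {
  for distro in $DISTROS; do
    image="$(image_for_distro "$distro")"
    docker run --rm -v "$(pwd)":/ws --workdir /ws "$image" \
      bash -c '. /etc/profile.d/bigtop.sh; ./build_bigtop_wmf.sh' || return 1
  done
}
```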

Package Amendment

The default build mechanism in bigtop does not include any information about which distribution the packages were built for.

This causes an issue for us, since we have to ensure that we have packages available for both buster and bullseye on apt.wikimedia.org for distribution.

To get around this, the build_bigtop_wmf.sh script does the following for each package file generated:

  1. Unpack the deb file including the metadata files with dpkg-deb -R
  2. Modify the DEBIAN/control file and append either -deb10 or -deb11 to the Version: field
  3. Re-pack the deb file with dpkg-deb -b, adding either -deb10 or -deb11 to the file name

Because the buster and bullseye builds then no longer share a version or a file name, we can host all of these packages concurrently with reprepro.
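
The three amendment steps can be sketched as follows. This is an illustration under stated assumptions rather than the code in build_bigtop_wmf.sh: the helper names are hypothetical, and because this page does not say where in the file name the suffix goes, the sketch places it after the version component of a name_version_arch.deb file name.

```shell
# Hypothetical helpers; only the dpkg-deb -R / -b steps and the
# -deb10 / -deb11 suffixes come from the description on this page.

# Append a suffix to the Version: field of a DEBIAN/control file.
append_version_suffix() {
  control_file="$1"
  version_suffix="$2"
  scratch="$(mktemp)"
  # "&" re-emits the matched Version line, then the suffix is appended.
  sed "s/^Version: .*/&${version_suffix}/" "$control_file" > "$scratch" &&
    cat "$scratch" > "$control_file"
  rm -f "$scratch"
}

# Turn name_version_arch.deb into name_version<suffix>_arch.deb.
# The position of the suffix is an assumption.
suffixed_name() {
  base="${1%.deb}"
  arch="${base##*_}"
  name_and_version="${base%_*}"
  printf '%s%s_%s.deb\n' "$name_and_version" "$2" "$arch"
}

# Unpack, amend and re-pack one package next to the original file.
amend_deb() {
  deb="$1"
  deb_suffix="$2"
  workdir="$(mktemp -d)"
  dpkg-deb -R "$deb" "$workdir" &&
    append_version_suffix "$workdir/DEBIAN/control" "$deb_suffix" &&
    dpkg-deb -b "$workdir" \
      "$(dirname "$deb")/$(suffixed_name "$(basename "$deb")" "$deb_suffix")"
  status=$?
  rm -rf "$workdir"
  return "$status"
}
```

For example, amend_deb path/to/some-package.deb -deb11 would write a re-packed copy alongside the original, leaving the original file in place.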