Data Engineering/Systems/Archiva

Apache Archiva is a Maven Build Artifact Repository Manager, similar to Nexus and Artifactory. WMF uses Archiva as its sole Java build repository, and uses it in a unique way for production deployments of artifacts via Scap and git-fat.

Our Archiva instance is at archiva.wikimedia.org.

Setup

Archiva was chosen over other Maven Repository Managers because it supports all of the features we need, and is 100% open source. You can see a comparison of Archiva against the other two major Maven Repository Managers.

Our Archiva is configured in a slightly non-standard way. We wanted to have complete control over the build and deploy environments used by developers and by production. Usually, organizations will maintain their own Maven repository so that developers and deployments do not need to download build dependencies from the internet every time they need to build. This is done mainly to reduce network traffic. Proxies are maintained between the organization's Maven repository and other Maven repositories out there (e.g. Maven Central, Cloudera, etc.). Dependencies are transparently proxied and cached by the org's Maven repository. Future builds that require cached dependencies will just download them from the org's Maven repository.

Repositories

mirrored - Mirrors artifacts proxied from Cloudera, Spark and Central. Manual uploads to this repository are not expected.
releases - WMF versioned releases (expected to contain only WMF packaged software).
snapshots - WMF snapshot jars/artifacts.
python - WMF Python packaging (experimental?)

Development

You'll need Maven installed locally in order to use Archiva and build JVM based projects.

Set up a master password

The password used to reach Archiva must be encrypted locally. This is done with a master password, as described at: https://maven.apache.org/guides/mini/guide-encryption.html#how-to-create-a-master-password
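
As a rough sketch of what that guide produces: running mvn --encrypt-master-password prints an encrypted string wrapped in braces, which is then stored in ~/.m2/settings-security.xml. The value below is a placeholder, not a real encrypted password.

```xml
<!-- ~/.m2/settings-security.xml -->
<settingsSecurity>
    <!-- Placeholder: paste the output of 'mvn -emp' here, braces included. -->
    <master>{ENCRYPTED_MASTER_PASSWORD}</master>
</settingsSecurity>
```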

Using pom.xml

This is the preferred way of building from Archiva. Your project's main pom.xml should include repository settings that disable Maven Central and enable the Wikimedia Archiva repositories. Edit your pom.xml file and add the following:

    <repositories>
        <!-- disable Maven central -->
        <repository>
            <id>central</id>
            <url>http://repo1.maven.org/maven2</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>

        <!-- Repository information for archiva.wikimedia.org. -->
        <repository>
            <id>wikimedia.mirrored</id>
            <name>Wikimedia Mirrored Repository</name>
            <url>https://archiva.wikimedia.org/repository/mirrored</url>
            <releases>
                <enabled>true</enabled>
                <checksumPolicy>fail</checksumPolicy>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>

        <repository>
            <id>wikimedia.releases</id>
            <name>Wikimedia Release Repository</name>
            <url>https://archiva.wikimedia.org/repository/releases</url>
            <releases>
                <enabled>true</enabled>
                <checksumPolicy>fail</checksumPolicy>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>

        <repository>
            <id>wikimedia.snapshots</id>
            <name>Wikimedia Snapshot Repository</name>
            <url>https://archiva.wikimedia.org/repository/snapshots</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
                <checksumPolicy>fail</checksumPolicy>
            </snapshots>
        </repository>

        <!-- Only necessary if uploading python wheels. Needs to be in ~/.m2/settings.xml described in next section -->
        <repository>
            <id>wikimedia.python</id>
            <name>Wikimedia Python Repository</name>
            <url>https://archiva.wikimedia.org/repository/python</url>
            <releases>
                <enabled>true</enabled>
                <checksumPolicy>fail</checksumPolicy>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

    <pluginRepositories>
        <!-- disable Maven central -->
        <pluginRepository>
            <id>central</id>
            <url>http://repo1.maven.org/maven2</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </pluginRepository>

        <pluginRepository>
            <id>wikimedia.mirrored</id>
            <name>Wikimedia Mirrored Repository</name>
            <url>https://archiva.wikimedia.org/repository/mirrored/</url>
        </pluginRepository>
    </pluginRepositories>

Using settings.xml

Maven keeps its global settings and cached artifacts in ~/.m2. If you don't want to modify your project's pom.xml file, you may instead edit ~/.m2/settings.xml and add the same repository information from above inside of a <profile> tag:

<settings>
    <profiles>
        <profile>
            <id>force-wikimedia-archiva</id>

            <!-- COPY/PASTE the <repositories> and <pluginRepositories> blocks from above here. -->

        </profile>
    </profiles>

    <activeProfiles>
        <activeProfile>force-wikimedia-archiva</activeProfile>
    </activeProfiles>
</settings>

Deploy to Archiva

Once your pom.xml or ~/.m2/settings.xml has repository information, you should be able to build your projects using Archiva. If you want to deploy a snapshot or a release to Archiva, you'll need to add distributionManagement configs to your project's main pom.xml file.

    <distributionManagement>
        <repository>
            <id>archiva.releases</id>
            <name>Wikimedia Release Repository</name>
            <url>https://archiva.wikimedia.org/repository/releases/</url>
        </repository>
        <snapshotRepository>
            <id>archiva.snapshots</id>
            <name>Wikimedia Snapshot Repository</name>
            <url>https://archiva.wikimedia.org/repository/snapshots/</url>
        </snapshotRepository>
    </distributionManagement>

You'll also need Archiva deployment credentials in your ~/.m2/settings.xml file:

<settings>
    <servers>
        <!--
          User credentials for archiva.wikimedia.org deployment repositories.
          If you don't plan to ever run 'mvn deploy', you don't need these.
         -->
        <server>
            <id>archiva.releases</id>
            <username>$USERNAME</username>
            <password>XXXXXXXX</password>
        </server>
        <server>
            <id>archiva.snapshots</id>
            <username>$USERNAME</username>
            <password>XXXXXXXX</password>
        </server>
    </servers>
</settings>

In the past, $USERNAME was archiva-deploy. That account was deprecated in August 2018 in favor of LDAP authentication. To be able to upload jars to Archiva, $USERNAME must be a member of the archiva-deployers LDAP group.

This will allow you to run mvn deploy to deploy to Archiva. If your project's version ends in -SNAPSHOT, a snapshot will be deployed, otherwise a release will be deployed. See Deploying to repository for more information.
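
To illustrate the version rule above, here is a minimal sketch of the relevant part of a pom.xml. The groupId and artifactId are hypothetical; only the -SNAPSHOT suffix matters.

```xml
<project>
    <modelVersion>4.0.0</modelVersion>
    <!-- Hypothetical coordinates, for illustration only. -->
    <groupId>org.wikimedia.example</groupId>
    <artifactId>myproject</artifactId>
    <!-- Ends in -SNAPSHOT: 'mvn deploy' uses the <snapshotRepository> from distributionManagement. -->
    <version>1.0.2-SNAPSHOT</version>
    <!-- With <version>1.0.2</version> instead, 'mvn deploy' would use the release <repository>. -->
</project>
```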

You may also want to consider using maven password encryption to store your password in your ~/.m2/settings.xml.
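
A sketch of what an encrypted entry might look like, assuming a master password has already been created: mvn --encrypt-password prints a string wrapped in braces, which replaces the plain-text password. The value below is a placeholder.

```xml
<settings>
    <servers>
        <server>
            <id>archiva.releases</id>
            <username>$USERNAME</username>
            <!-- Placeholder: paste the output of 'mvn -ep' here, braces included. -->
            <password>{ENCRYPTED_PASSWORD}</password>
        </server>
    </servers>
</settings>
```

The same pattern would apply to the archiva.snapshots server entry.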

Deploy artifacts using scap3

mvn deploy will push your project's artifacts to archiva.wikimedia.org. We do not use Archiva directly to deploy these artifacts to production. Instead, we use a combination of git-fat + Archiva.

A cronjob creates symlinks inside of a git-fat store, and makes this store publicly available via an rsync daemon module. To deploy Archiva-hosted artifacts with scap, you'll need to use git-fat to add those artifacts to your local working copy of your project, and then commit those artifacts manually. scap will then use git-fat to pull your artifacts from the rsync daemon.

Setting up git-fat for your project

git-fat must be installed locally. It is available as a .deb in the Wikimedia Apt Repo. Alternatively, git-fat is just a single Python script, so you can download it and put it somewhere on your PATH.

Initializing git-fat for a new clone of an already configured repository

If the steps below have already been run by someone for your repository (i.e. .gitfat and .gitattributes exist with proper content), then all you have to do is install git-fat, cd into your clone, and run git-fat init.

Initializing git-fat for the first time on your repository

Edit .gitattributes so that git-fat is used whenever you git add .jar files:

echo '*.jar filter=fat -text' >> .gitattributes
git add .gitattributes

Edit .gitfat to point at the rsync daemon module:

echo '[rsync]
	remote = archiva.wikimedia.org::archiva/git-fat
	options = --copy-links --verbose
' >> .gitfat
git add .gitfat

Commit these files so that everyone will have them. This will only have to be done once per project.

git commit .gitfat .gitattributes -m 'Configuring git-fat to work with Archiva'

Now, initialize git-fat for your repository. This needs to be done for every clone of your project:

git-fat init

Adding an artifact to your project

Once you know that your artifact .jars are properly in Archiva, and git-fat is set up for your project, you may add any of these .jars using git add and git commit:

git add lib/myproject-1.0.1.jar
git commit -m 'Adding myproject-1.0.1.jar via git-fat'

When others git pull your project, this file will first appear as a small text file containing the SHA1 sum of the .jar you just added. They will have to run git-fat pull to sync down .jars from the rsync daemon on archiva.wikimedia.org.

Deploying your project and artifacts using scap3

scap supports git-fat managed files. Set git_binary_manager: git-fat in your project's scap.cfg.

On deployment, scap deploy will run git-fat pull as part of the sync phase, which will cause added artifacts to be pulled down from the rsync remote.

Uploading dependency artifacts

If your project has a dependency that is not already in the wikimedia.mirrored repository, you may want to add it manually.

There are multiple ways to do this; this is just how I have done it so far. If you know of a better way, please edit this section accordingly.

Download the artifact's .pom and .jar files from its external source, Maven Central or elsewhere. Note that dependencies from Maven Central should be automatically proxied and cached, and thus don't need to be manually uploaded.
Log in with your username. Click on Upload Artifact. Please note: you will see the Upload functionality only if your username is part of the archiva-deployers LDAP group (previously the archiva-deploy user was used, but it was deprecated in August 2018).
Choose 'Wikimedia Mirrored Repository' for Repository Id.
Fill in the rest of the upload form with proper values.

Click + Choose File twice, once for the .pom file, and again for the .jar file. Check the pomFile box for the .pom file.
Click Start Upload.
Click Save Files. Your artifact should now be available in the wikimedia.mirrored repository.
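
Once uploaded, the artifact can be referenced like any other dependency. A minimal sketch, using hypothetical coordinates that stand in for whatever groupId, artifactId, and version were entered in the upload form:

```xml
<dependencies>
    <dependency>
        <!-- Hypothetical coordinates; use the values entered in the upload form. -->
        <groupId>com.example.thirdparty</groupId>
        <artifactId>some-library</artifactId>
        <version>2.3.4</version>
    </dependency>
</dependencies>
```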

Uploading jars with <type>s

If your project depends on a jar that has a <type> in the pom (for example, tests), upload it together with your regular jar, but before you click "Start Upload", enter the "type" in the "classifier" box.

Neither of these boxes accepts the "-" character; use ‐ instead. Yup. That sucks.
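
For context, this is roughly what such a dependency looks like in a consuming pom.xml. The sketch assumes the common case of a test jar, where Maven's test-jar type corresponds to the tests classifier; the coordinates are hypothetical.

```xml
<dependency>
    <!-- Hypothetical coordinates, for illustration only. -->
    <groupId>com.example.thirdparty</groupId>
    <artifactId>some-library</artifactId>
    <version>2.3.4</version>
    <!-- 'test-jar' resolves to some-library-2.3.4-tests.jar -->
    <type>test-jar</type>
    <scope>test</scope>
</dependency>
```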

Using new Let's Encrypt SSL certs

WMF switched to Let's Encrypt in early 2017. Let's Encrypt CA certificates are included in newer Java versions. If you are running a Java version that does not yet have these certs, you'll need to add them. If you are trying to use archiva.wikimedia.org to run mvn package, you may see an error like sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target. If you do, download this script and run it:

wget https://gist.githubusercontent.com/Firefishy/109b0f1a90156f6c933a50fe40aa777e/raw/15926be913682876ae68bb4f71e489bc53feaae3/install-letsencrypt-in-jdk.sh && chmod 755 install-letsencrypt-in-jdk.sh && sudo ./install-letsencrypt-in-jdk.sh $(/usr/libexec/java_home)

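This should add the Let's Encrypt CA certificates to your JVM's list of trusted CAs. Note that /usr/libexec/java_home exists only on macOS; on other systems, pass your JDK's home directory instead.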

Administration

Archiva is installed using a fairly hacky .deb package. It was created using a release tarball from Archiva, which means it was cobbled together from prebuilt sources. This was done because building .debs of JVM projects from source in the Debian Way is often nearly impossible! But, that's what we have Archiva for anyway now, eh?

Upgrade

If you need to upgrade Archiva, please check the README. Keep in mind that /var/lib/archiva/conf/archiva.xml should be saved before installing the Debian package, as a precautionary measure (to avoid losing the LDAP, repository, and other configuration). You can also use Bacula#Restore (aka Panic mode) to retrieve the last version of the archiva.xml file from Bacula if needed.

Configuration

Archiva is installed and configured by this archiva puppet module. This module also installs the cronjob that maintains the Archiva git-fat store.

/var/lib/archiva/conf/archiva.xml is a config file that is maintained by Archiva itself. Changes are made via the Web GUI and saved to this file.
/var/lib/archiva/git-fat is the git-fat store. An rsync daemon module is configured to serve files out of this directory.
archiva-gitfat-link runs every 5 minutes. For each .jar file found in /var/lib/archiva/repository, it creates a symlink in /var/lib/archiva/git-fat named by the .jar's SHA1 sum. If a .jar's SHA1 does not match the .sha1 metadata file that Archiva maintains, the symlink will not be created.

git-fat stores are just directories with files named by their SHA1 sums. By keeping symlinks named by SHA1 sums to .jars, git-fat is able to pull .jars from this directory.

Backups

Archiva's repository will change over time as more dependencies are added and more projects are released. It also keeps control over much of its internal configuration. Bacula takes backups of the /var/lib/archiva directory to ensure that we can recreate archiva.wikimedia.org should we lose the server.