Jenkins

From Wikitech

Jenkins is an open-source automation server written in Java. Its highly modular design has allowed other communities to add their own plugins (such as Git, PHP, …). At the Wikimedia Foundation, we use Jenkins to run CI tests for MediaWiki and various other products and tools, generate and publish documentation, keep Beta Cluster code up to date, and more.

Administration

As of June 2023, Release Engineering maintains two Jenkins systems:

  1. WMF CI for MediaWiki and other software development, at https://integration.wikimedia.org/ci/
  2. Release automation, at https://releases-jenkins.wikimedia.org/

Host                         Description
contint2002.wikimedia.org    Active CI
contint1002.wikimedia.org    Cold CI. Controller stopped, one agent running
releases1003.eqiad.wmnet     Active releasing
releases2003.codfw.wmnet     Cold releasing

DNS entry contint.wikimedia.org always points to the currently active CI Jenkins.

To get administrative access on the CI Jenkins instance, you need to be added to the LDAP group ciadmin. As of December 2019, the releases Jenkins permissions are managed differently (ask the Release Engineering team).

To get shell access, your account has to be added to the contint-admins and releasers-mediawiki shell groups. That is done in the operations/puppet repository, in modules/admin/data/data.yaml. One can use matrix.py to list one's groups:

$ ./matrix.py hashar
grp/users	hashar
contint-admins	OK
contint-docker	OK
contint-roots	OK
deployment	OK
gerrit-deployers	OK
gerrit-root	OK
labnet-users	OK
releasers-mediawiki	OK

If one needs root access (to look at Apache logs, upgrade Jenkins, change file permissions etc), that is granted by the group contint-roots for both CI and release Jenkins hosts.

Upgrading

Upstream publishes a release every week and, from time to time, a stable one named Long Term Support (LTS). We want to upgrade to the latest LTS whenever it is released.

Get the package

The releases instances currently run on Bullseye while the CI instances run on Buster. The instructions below are for buster-wikimedia and also need to be repeated for bullseye-wikimedia.

The upstream Debian packages are made available at:

Per SRE convention, the package needs to be copied to apt.wikimedia.org. Reprepro is configured to do so semi-automatically. On apt1001 one only needs to do:

$ cd /srv/wikimedia
$ sudo -E reprepro -C thirdparty/ci --restrict=jenkins checkupdate buster-wikimedia

Verify the output and then:

$ sudo -E reprepro -C thirdparty/ci --restrict=jenkins update buster-wikimedia

Congratulations.
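
Because both distributions need the same treatment, the check step can be looped. The following is a minimal sketch, assuming it is run from /srv/wikimedia on the apt host and that bullseye-wikimedia uses the same thirdparty/ci component as buster-wikimedia. The function name check_jenkins_updates is made up for illustration.

```shell
# Hypothetical helper: run the read-only checkupdate step for both
# distributions. Assumes the current directory is /srv/wikimedia.
check_jenkins_updates() {
    for dist in buster-wikimedia bullseye-wikimedia; do
        echo "== $dist =="
        # Same command as the manual step; only the distribution varies.
        sudo -E reprepro -C thirdparty/ci --restrict=jenkins checkupdate "$dist" || return 1
    done
}
```

Calling check_jenkins_updates only reports what would change. The update command still has to be run by hand for each distribution once the output looks right.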

In some cases upstream releases a new LTS which has breaking changes and backports the security fixes to the LTS we currently use. When the breaking change is not trivial, we need to instruct reprepro to update within the previous LTS. Since reprepro update does not offer a way to match a specific version, we first have to delete the package and then fetch it again with a specific version. Given LTS 2.100.1 currently in the repository, and upstream releasing a new LTS 3.0.0 together with a hotfix 2.100.2, we would:

$ reprepro list buster-wikimedia jenkins
buster-wikimedia|thirdparty/ci|amd64: jenkins 2.100.1

$ reprepro remove buster-wikimedia jenkins
Exporting indices...

Deleting files no longer referenced

Then get the specific version 2.100.2 instead of the breaking one 3.0.0:

$ reprepro -C thirdparty/ci --noskipold --restrict-binary=jenkins=2.100.2 checkupdate buster-wikimedia
$ reprepro -C thirdparty/ci --noskipold --restrict-binary=jenkins=2.100.2 update buster-wikimedia

$ reprepro list buster-wikimedia jenkins
buster-wikimedia|thirdparty/ci|amd64: jenkins 2.100.2

Precheck

Review the upstream changelog at https://jenkins.io/changelog-stable/ and try to anticipate what might cause havoc. There is no exact science to this, but the community-reported bugs might raise a red flag.

Deploying

Jenkins is deployed in two places:

  • the two virtual Ganeti hosts, releases1003.eqiad.wmnet and releases2003.codfw.wmnet, for releases, and
  • the two bare-metal hosts, contint1001.eqiad.wmnet and contint2001.codfw.wmnet, for CI.

It can safely be restarted at any time as long as the developer community is notified about it (e.g. via Wikitech-l). When Jenkins is restarted, all current CI builds are aborted. New builds will automatically be enqueued again by Zuul once Jenkins is back up.

When doing an upgrade, you will most probably want to have a continuous integration person available, because plugin dependencies might break once Jenkins is upgraded.

  1. Make sure the developer community is aware this is happening (!log it in #wikimedia-releng and announce it in #wikimedia-operations a few minutes beforehand).
  2. Monitor https://integration.wikimedia.org/zuul/ to wait for a good time when no major jobs are about to finish.
  3. First, upgrade the releases hosts, since they are less impactful if something goes wrong:
    1. On releases1003 as root, run apt-get update && apt-get install --only-upgrade jenkins
    2. Wait for https://releases-jenkins.wikimedia.org/ to come back up; check that it works.
    3. Check for any alerts from Jenkins that things are broken.
    4. Then upgrade releases2003 in the same way.
  4. Second, upgrade the CI hosts:
    1. On contint1001 as root, run apt-get update && apt-get install --only-upgrade jenkins
    2. Wait for https://integration.wikimedia.org/ci/ to come back up; check that it works and that jobs process as expected.
    3. Check for any alerts from Jenkins that things are broken.
    4. Confirm that https://integration.wikimedia.org/zuul/ shows jobs processing as normal.
    5. Then upgrade contint2001 in the same way.
  5. !log that the upgrade is complete and mark the task in Phabricator as done.
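
The "wait for the URL to come back up" steps can be scripted. The following is a minimal sketch, assuming curl is installed and that the URL answers anonymous requests with a success status once Jenkins has finished starting. The function name wait_for_url and its default attempt count and delay are hypothetical.

```shell
# Hypothetical helper: poll a URL until it answers with an HTTP success
# status, or give up after a number of attempts.
# usage: wait_for_url URL [attempts] [delay_seconds]
wait_for_url() {
    url=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: non-zero exit on HTTP errors, -sS: quiet but still print
        # errors, -o /dev/null: discard the response body
        if curl -fsS -o /dev/null "$url"; then
            echo "up after $i attempt(s): $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "still down after $attempts attempt(s): $url" >&2
    return 1
}
```

For example, wait_for_url https://releases-jenkins.wikimedia.org/ could be run right after the package upgrade on the releases host. A successful response only shows that the web interface answers; checking alerts and job processing remains a manual step.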

Security updates

A few days before releasing a security update, upstream sends a pre-announcement to a mailing list (jenkinsci-advisories). The internal mailing list of the release engineering team is subscribed to it. Upon reception, a security task is filed in Phabricator against #continuous-integration-infrastructure and #jenkins.

There are two kinds of security updates for Jenkins:

  • an issue in Jenkins itself, which triggers a release of Jenkins. It is thus important to always stay up to date and follow the Jenkins LTS cycle.
  • issues in plugins, which typically affect several plugins whose fixes are released in bulk.

Jenkins security update

When the security update is made public, one first has to review the security bulletin to gauge how affected we are. Some security releases of Jenkins do not affect us at all, while others are critical and require an immediate upgrade. The upgrade is as above: the Jenkins Debian package is updated using reprepro and then installed with apt.

Plugins

Plugins security update

CI Jenkins

For the CI Jenkins, the update is done manually via the Jenkins plugin manager.

We might have pinned plugins to a specific version or forked a plugin (usually recognizable by a version containing SNAPSHOT or wmf). Those should **not** be updated with the plugin manager; update the fork instead. See #Patch a plugin below.

Use the Check now button at the bottom of the page to fetch the latest list of plugins from upstream, then update affected plugins.

Note: plugins might have backward-compatibility issues or expect XML elements different from those generated by Jenkins Job Builder. There is no solid process for catching this.

Releases Jenkins

Plugin versions for the Releases Jenkins are managed in its Scap3 deployment repository.

To test an update and to get the right plugin versions, you should deploy Jenkins locally using the dev environment and follow these steps:

  • In the UI of the local instance, navigate to http://<local>/manage/pluginManager/advanced and disable the HTTP proxy so you can update plugins manually
  • From http://<local>/manage/pluginManager, update the relevant plugins
  • Navigate to the script console (http://<local>/manage/script) and run the following snippet:
def plugins = new ArrayList(Jenkins.instance.pluginManager.plugins)
plugins.sort { it.getShortName() }.each { plugin -> 
  println ("${plugin.getShortName()}:${plugin.getVersion()}")
}
return
  • The output will contain the updated versions in the right format, so you can paste it directly into the plugin list file. Note that updating a plugin will often also transitively update the versions of that plugin's dependencies, so the script step is important to catch those.
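
To see which entries actually changed, including transitive dependency bumps, the script output can be compared with the existing plugin list. The following is a minimal sketch, assuming both files contain one name:version entry per line, as printed by the snippet above. The function name changed_plugins and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: print the "name:version" lines of NEW that are
# absent from OLD, i.e. plugins that were added or whose version changed.
# usage: changed_plugins OLD_LIST NEW_LIST
changed_plugins() {
    old=$1
    new=$2
    # -F: fixed strings, -x: match whole lines, -v: invert the match,
    # -f: read patterns from OLD. grep exits 1 when nothing differs,
    # which is not an error here.
    grep -F -x -v -f "$old" "$new" || [ $? -eq 1 ]
}
```

For example, changed_plugins plugins.old.txt plugins.new.txt would list only the entries to review before creating the MR.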

To deploy the version changes, create an MR in the deploy repo with the updated versions and once merged, re-deploy the Jenkins instance by running the deploy script from deployment.eqiad.wmnet.

Patch a plugin

If you absolutely need to patch a plugin without an upstream release, that is possible.

First, fork the Git repository to Gerrit under the integration/jenkinsci/ namespace.

Add CI jobs for the plugin by defining a new project under jjb/integration.yaml and adding the jobs to zuul/layout.yaml. As of February 2021 we run Jenkins on CI with Java 8 and aim to upgrade to Java 11, so we should have two jobs. Example for a hypothetical acme plugin:

# jjb/integration.yaml
- project:
    name: jenkins-plugin
    project:
        - acme
    jobs:
        - '{name}-{project}-maven-java8-docker'
        - '{name}-{project}-maven-java11-docker'
# zuul/layout.yaml
- projects:
  - name: integration/jenkinsci/acme
    experimental:
      - jenkins-plugin-acme-maven-java8-docker
      - jenkins-plugin-acme-maven-java11-docker

Then send your patch(es) as Gerrit change(s).

If you need to, update the version number in pom.xml, for example 1.103-wmf.1. Then run mvn install, which should create a <name>.hpi file somewhere. You can upload that file in the Plugin Manager using the "Advanced" tab. If it tells you to restart Jenkins, kill any long running jobs, and then it should restart shortly. Then try out your newly patched plugin!
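
The version bump and build can be sketched as follows, assuming Maven can resolve the versions-maven-plugin (which provides the versions:set goal) and that the hpi packaging leaves the archive directly under target/. The function name build_patched_plugin is hypothetical, and editing pom.xml by hand works just as well.

```shell
# Hypothetical helper: set the version, build, and print the path of the
# resulting .hpi archive.
# usage: build_patched_plugin CHECKOUT_DIR NEW_VERSION
build_patched_plugin() {
    (
        cd "$1" || exit 1
        # Rewrite <version> in pom.xml without leaving a backup copy behind.
        mvn versions:set -DnewVersion="$2" -DgenerateBackupPoms=false || exit 1
        mvn install || exit 1
        # Assumption: the archive ends up directly under target/.
        find "$PWD/target" -maxdepth 1 -name '*.hpi' -print
    )
}
```

For example, build_patched_plugin ~/src/acme 1.103-wmf.1 (path made up) would print the file to upload through the "Advanced" tab of the Plugin Manager.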

Gearman plugin

Jenkins jobs are registered to Zuul using the Gearman protocol via the Gearman plugin. It has long been abandoned by upstream since they no longer use Jenkins, and the Gearman Java library has not been updated since roughly 2012. We eventually hit a wall with Java 11, which forced us to maintain forks of that legacy software.

https://phabricator.wikimedia.org/T271683 has some history.

gearman-java library

The gearman-java library was maintained using Bazaar on https://code.launchpad.net/gearman-java . We have converted it to a git repository: https://gerrit.wikimedia.org/g/integration/gearman-java (changes).

The original code was written for Java 5 and worked fine until Java 11 made some incompatible change to their select() abstraction:

That actually fixes the implementation to match the specification.

The library dependencies have been updated to bring support for Java 11.

The artifacts are created with maven and published to Archiva ( https://archiva.wikimedia.org/#artifact/org.wikimedia.gearman/gearman-java ).

Jenkins plugin

The Jenkins plugin had a complicated history:

Antoine "hashar" Musso envisioned maintaining another fork on Wikimedia Gerrit. After discussions with OpenDev people, it sounded better to have the plugin under the GitHub jenkinsci organization, to potentially benefit from shared maintenance by the Jenkins community.

In January 2021, Antoine "hashar" Musso got admin access to https://github.com/jenkinsci/gearman-plugin and merged in the gooddata fork.

A patch has been made to use the updated gearman-java provided by Wikimedia archiva ( https://github.com/jenkinsci/gearman-plugin/pull/7 ).

We release manually using Maven on our local machines until we switch to continuous deployment with upstream GitHub Actions. The documentation is at https://www.jenkins.io/doc/developer/publishing/releasing-manually/

You need an account with jenkins-io and have to file a request to get permission (an example is: https://github.com/jenkins-infra/repository-permissions-updater/pull/1830 ).

It is always possible to install the plugin without publishing it. That can be done by uploading the locally built .jar file through the update manager of the Jenkins instance.

How to

Restart Jenkins

See mw:Continuous integration/Jenkins#Restart.

Logs

Main application logs:

journalctl -u jenkins -f

Disable plugin

Jenkins plugins are placed in /var/lib/jenkins/plugin/ and end with a .hpi extension. To disable a plugin, rename it to .hpi.disable and restart Jenkins!
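
The rename can be wrapped in a small helper that refuses to act when the archive is missing. This is a minimal sketch. The function name disable_plugin is hypothetical, and the plugin directory is passed as an argument rather than hard-coded.

```shell
# Hypothetical helper: disable one plugin by renaming its archive.
# usage: disable_plugin PLUGIN_DIR SHORT_NAME
disable_plugin() {
    src="$1/$2.hpi"
    if [ ! -f "$src" ]; then
        echo "no such plugin archive: $src" >&2
        return 1
    fi
    # Rename foo.hpi to foo.hpi.disable; Jenkins still has to be
    # restarted afterwards for the change to take effect.
    mv -- "$src" "$src.disable"
}
```

For example, disable_plugin /var/lib/jenkins/plugin acme would disable a plugin whose archive is named acme.hpi. Renaming the file back and restarting Jenkins re-enables it.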

Java thread dump

Whenever Jenkins appears to be stuck or facing high CPU usage, you will want to look at the Java threads: https://integration.wikimedia.org/ci/threadDump

You can also send kill -3 (SIGQUIT) to the Java process. Be careful: kill -3 has apparently killed Jenkins.

Another way:

jstack -F <pid of jenkins>

And there is a more verbose dump written to /var/log/jenkins/jenkins.log

For deadlock detection:

jstack -l -F <pid of jenkins>

Java info

sudo -u jenkins jstat -gcutil PID_HERE 1000 3
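
The jstack and jstat commands need the PID of the Jenkins Java process. The following is a sketch for looking it up, assuming Jenkins runs as the systemd unit jenkins (the unit name used with journalctl in the Logs section) and that the unit's main process is the JVM. The function name jenkins_pid is made up.

```shell
# Hypothetical helper: print the main PID of the jenkins systemd unit.
jenkins_pid() {
    pid=$(systemctl show --property MainPID --value jenkins) || return 1
    # systemd reports MainPID=0 when the unit has no running main process.
    if [ -z "$pid" ] || [ "$pid" = 0 ]; then
        echo "jenkins unit has no running main process" >&2
        return 1
    fi
    echo "$pid"
}
```

For example: sudo -u jenkins jstat -gcutil "$(jenkins_pid)" 1000 3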

See also