Jenkins
Jenkins is an open-source automation server written in Java. Its highly modular design has allowed other communities to add their own plugins (such as Git, PHP, …). At the Wikimedia Foundation, we use Jenkins to run CI tests for MediaWiki and various other products and tools, generate and publish documentation, keep Beta Cluster code up to date, and more.
Administration
As of June 2023, Release Engineering maintains two Jenkins systems:
- WMF CI for MediaWiki and other software development, at https://integration.wikimedia.org/ci/
- Release automation, at https://releases-jenkins.wikimedia.org/
Host | Description |
---|---|
contint2002.wikimedia.org | Active CI |
contint1002.wikimedia.org | Cold CI. Controller stopped, one agent running |
releases1003.eqiad.wmnet | Active releasing |
releases2003.codfw.wmnet | Cold releasing |
The DNS entry contint.wikimedia.org always points to the currently active CI Jenkins.
To get administrative access on the CI Jenkins instance, you need to be added to the LDAP group ciadmin. As of December 2019, the releases Jenkins permissions are managed differently (ask the Release Engineering team).
To get shell access, your account has to be added to the contint-admins and releasers-mediawiki shell groups. That is done in operations/puppet, in modules/admin/data/data.yaml. One can use matrix.py to list their groups:
$ ./matrix.py hashar
grp/users hashar
contint-admins OK
contint-docker OK
contint-roots OK
deployment OK
gerrit-deployers OK
gerrit-root OK
labnet-users OK
releasers-mediawiki OK
If one needs root access (to look at Apache logs, upgrade Jenkins, change file permissions, etc.), that is granted by the group contint-roots for both CI and release Jenkins hosts.
Upgrading
Upstream publishes a release every week and, from time to time, a stable release known as Long Term Support (LTS). We want to upgrade to the latest LTS whenever one is released.
Get the package
Upstream makes Debian packages of Jenkins available. Per SRE convention, the package needs to be copied to apt.wikimedia.org, into the thirdparty/ci component of bullseye-wikimedia. Reprepro is configured to do so semi-automatically. On apt1001 one only needs to do:
$ cd /srv/wikimedia
$ sudo -E reprepro -C thirdparty/ci --restrict=jenkins checkupdate bullseye-wikimedia
Verify the output and then:
$ sudo -E reprepro -C thirdparty/ci --restrict=jenkins update bullseye-wikimedia
Congratulations.
In some cases upstream might release a new LTS which has breaking changes and backport the security fixes to the LTS we currently use. If the breaking change is not trivial, we need to instruct reprepro to update within the previous LTS line. Since reprepro update does not offer a way to match a specific version, we first have to delete the package and then fetch it again with a specific version. Given a currently known LTS 2.100.1, with upstream releasing a new LTS 3.0.0 and a hotfix 2.100.2, we would:
$ reprepro list bullseye-wikimedia jenkins
bullseye-wikimedia|thirdparty/ci|amd64: jenkins 2.100.1
$ reprepro remove bullseye-wikimedia jenkins
Exporting indices...
Deleting files no longer referenced
Then get the specific version 2.100.2 instead of the breaking one 3.0.0:
$ reprepro -C thirdparty/ci --noskipold --restrict-binary=jenkins=2.100.2 checkupdate bullseye-wikimedia
$ reprepro -C thirdparty/ci --noskipold --restrict-binary=jenkins=2.100.2 update bullseye-wikimedia
$ reprepro list bullseye-wikimedia jenkins
bullseye-wikimedia|thirdparty/ci|amd64: jenkins 2.100.2
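Choosing which version to pin can be sketched in a few lines. This is only an illustration of the reasoning above, not part of any existing tooling: the function names are hypothetical, and it assumes versions are plain dotted numbers where the first two components identify the LTS line.

```python
def parse_version(version):
    """Turn '2.100.2' into (2, 100, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_hotfix(current, available):
    """Return the newest available version on the same LTS line as `current`.

    Assumption: the LTS line is the first two components (e.g. 2.100).
    Returns None when nothing newer exists on that line.
    """
    line = parse_version(current)[:2]
    candidates = [
        v for v in available
        if parse_version(v)[:2] == line
        and parse_version(v) > parse_version(current)
    ]
    return max(candidates, key=parse_version) if candidates else None


# Hypothetical list of versions offered by upstream.
print(pick_hotfix("2.100.1", ["2.100.1", "2.100.2", "3.0.0"]))  # prints 2.100.2
```

The returned version is the one to pass to --restrict-binary; comparing numerically avoids ranking 2.100.9 above 2.100.10.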
Precheck
Review the upstream changelog at https://jenkins.io/changelog-stable/ and try to anticipate what might cause havoc. There is no exact science to this, but bugs reported by the community against a release can raise a red flag.
Deploying
Jenkins is deployed in two places:
- the two virtual Ganeti hosts, releases1003.eqiad.wmnet and releases2003.codfw.wmnet, for releases, and
- the two bare-metal hosts, contint1001.eqiad.wmnet and contint2001.codfw.wmnet, for CI.
It can safely be restarted at any time as long as the developer community is notified about it (e.g. via Wikitech-l). When Jenkins is restarted, all current CI builds are aborted. New builds will automatically be enqueued again by Zuul once Jenkins is back up.
When doing an upgrade, you will most probably want to have one continuous integration person available, because plugin dependencies might break once Jenkins is upgraded.
- Make sure the developer community is aware this is happening (!log it in #wikimedia-releng and announce it in #wikimedia-operations a few minutes beforehand).
- Monitor https://integration.wikimedia.org/zuul/ to wait for a good time when no major jobs are about to finish.
- First, upgrade the releases hosts, since they are less impactful if something goes wrong:
  - On releases1003, as root, run apt-get update && apt-get install --only-upgrade jenkins
  - Wait for https://releases-jenkins.wikimedia.org/ to come back up; check that it works.
  - Check for any alerts from Jenkins that things are broken.
  - Then upgrade releases2003 in the same way.
- Second, upgrade the CI hosts:
  - On contint1001, as root, run apt-get update && apt-get install --only-upgrade jenkins
  - Wait for https://integration.wikimedia.org/ci/ to come back up; check that it works and that jobs process as expected.
  - Check for any alerts from Jenkins that things are broken.
  - Confirm that https://integration.wikimedia.org/zuul/ shows jobs processing as normal.
  - Then upgrade contint2001 in the same way.
- !log that the upgrade is complete and mark the task in Phabricator as done.
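The "wait for it to come back up" steps can be scripted. The following is a minimal sketch rather than an existing tool: the function names, the retry count and the delay are all assumptions, and it treats an HTTP 200 on the given URL as "up", which does not replace checking by hand that jobs actually process.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered with an error status (for example while starting).
        return error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, ...
        return None


def wait_until_up(url, probe=http_status, attempts=30, delay=10, sleep=time.sleep):
    """Poll `url` until `probe` reports 200; return True on success.

    `probe` and `sleep` are injectable so the loop can be exercised offline.
    """
    for _ in range(attempts):
        if probe(url) == 200:
            return True
        sleep(delay)
    return False
```

For example, `wait_until_up("https://releases-jenkins.wikimedia.org/")` would block for up to about five minutes with the default values, and return False if the instance never answered with a 200.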
Security updates
A few days before releasing a security update, upstream sends a pre-announcement to a mailing list (jenkinsci-advisories). The internal mailing list of the Release Engineering team is subscribed to it. Upon reception, a security task is filed in Phabricator against #continuous-integration-infrastructure and #jenkins.
There are two kinds of security updates for Jenkins:
- an issue in Jenkins itself, which triggers a release of Jenkins. It is thus important to always stay up to date and follow the Jenkins LTS cycle.
- issues in plugins, which typically affect several plugins that are released in bulk.
Jenkins security update
When the security update is made public, one has to first review the security bulletin to gauge how affected we are. Some security releases of Jenkins do not affect us at all; others are absolutely critical and require an immediate upgrade. The upgrade is as above: the Jenkins Debian package is updated using reprepro as described above, and then installed with apt install.
Plugins
Plugins security update
CI Jenkins
For the CI Jenkins, the update is done manually via the Jenkins plugin manager.
We might have pinned plugins to a specific version or have forked a plugin (usually having a version with SNAPSHOT or wmf). Those should **not** be updated with the plugin manager; the fork should be updated instead. See #Patch a plugin below.
Use the Check now button at the bottom of the page to fetch the latest list of plugins from upstream, then update the affected plugins.
Note: plugins might have backward-compatibility issues or could expect XML elements different from those generated by Jenkins Job Builder; there is no solid process for catching this.
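Before updating, it can help to list which installed plugins look like local forks so they are left alone. The sketch below is only a heuristic built on the version markers mentioned above; the function names and the sample data are hypothetical, and it assumes the plugin list is available as lines of `name:version`.

```python
def is_local_fork(version):
    """Heuristic: forked or pinned builds carry SNAPSHOT or wmf in the version."""
    lowered = version.lower()
    return "snapshot" in lowered or "wmf" in lowered


def forked_plugins(lines):
    """Return the names of plugins whose version looks locally forked."""
    forked = []
    for line in lines:
        name, _, version = line.strip().partition(":")
        if name and is_local_fork(version):
            forked.append(name)
    return forked


# Hypothetical plugin list, one "name:version" entry per line.
sample = ["gearman-plugin:0.6.0-wmf.1", "git:5.0.0", "acme:1.103-SNAPSHOT"]
print(forked_plugins(sample))  # prints ['gearman-plugin', 'acme']
```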
Releases Jenkins
Plugin versions for the Releases Jenkins are managed in its Scap3 deployment repository.
To test an update and to get the right plugin versions, you should deploy Jenkins locally using the dev environment and follow these steps:
- In the UI of the local instance, navigate to http://<local>/manage/pluginManager/advanced and disable the HTTP proxy so you can update plugins manually
- From http://<local>/manage/pluginManager, update the relevant plugins
- Navigate to the script console (http://<local>/manage/script) and run the following snippet:
def plugins = new ArrayList(Jenkins.instance.pluginManager.plugins)
plugins.sort { it.getShortName() }.each { plugin ->
println ("${plugin.getShortName()}:${plugin.getVersion()}")
}
return
- The output will contain the updated versions in the right format, you can paste it directly into the plugin list file. Note that, often, updating a plugin will also transitively update the versions of that plugin's dependencies, so the script step is important to find out those.
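To see which versions actually changed, including transitive dependencies, the script output can be compared with the previous plugin list. This is a minimal sketch under the assumption that both lists are text in the `name:version` format printed by the snippet; the function names and the sample versions are made up for illustration.

```python
def parse_plugins(text):
    """Parse 'name:version' lines into a dict, skipping blank lines."""
    plugins = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            name, _, version = line.partition(":")
            plugins[name] = version
    return plugins


def changed_plugins(old_text, new_text):
    """Map plugin name -> (old_version, new_version) for every difference.

    A plugin missing on one side shows up with None for that side.
    """
    old, new = parse_plugins(old_text), parse_plugins(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Hypothetical before/after lists.
before = "git:5.0.0\nscm-api:1.0\nmailer:2.0\n"
after = "git:5.1.0\nscm-api:1.1\nmailer:2.0\n"
for name, (old, new) in changed_plugins(before, after).items():
    print(f"{name}: {old} -> {new}")
```

Here updating git also moved scm-api, which is the kind of transitive change that is easy to miss when editing the list by hand.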
To deploy the version changes, create an MR in the deploy repo with the updated versions and, once merged, re-deploy the Jenkins instance by running the deploy script from deployment.eqiad.wmnet.
Patch a plugin
If you absolutely need to patch a plugin without an upstream release, that is possible.
First, fork the Git repository to Gerrit under the integration/jenkinsci/ namespace.
Add CI jobs for the plugin by defining a new project in jjb/integration.yaml and adding the jobs to zuul/layout.yaml. As of February 2021 we run Jenkins on CI with Java 8 and aim to upgrade to Java 11, so we should have two jobs. Example for a hypothetical acme plugin:
# jjb/integration.yaml
- project:
    name: jenkins-plugin
    project:
      - acme
    jobs:
      - '{name}-{project}-maven-java8-docker'
      - '{name}-{project}-maven-java11-docker'
# zuul/layout.yaml
- projects:
  - name: integration/jenkinsci/acme
    experimental:
      - jenkins-plugin-acme-maven-java8-docker
      - jenkins-plugin-acme-maven-java11-docker
Then send your patch(es) as Gerrit change(s).
If you need to, update the version number in pom.xml, for example to 1.103-wmf.1. Then run mvn install, which should create a <name>.hpi file somewhere. You can upload that file in the Plugin Manager using the "Advanced" tab. If it tells you to restart Jenkins, kill any long-running jobs; Jenkins should then restart shortly. Then try out your newly patched plugin!
Gearman plugin
Jenkins jobs are registered to Zuul using the Gearman protocol via the Gearman plugin. The plugin has long been abandoned by upstream, since they no longer use Jenkins, and the Gearman Java library has not been updated since roughly 2012. We eventually hit a wall with Java 11, which forced us to maintain forks of this legacy software.
https://phabricator.wikimedia.org/T271683 has some history.
gearman-java library
The gearman-java library was maintained using Bazaar on https://code.launchpad.net/gearman-java . We have converted it to a git repository: https://gerrit.wikimedia.org/g/integration/gearman-java (changes).
The original code was written for Java 5 and worked fine until Java 11 made an incompatible change to its select() abstraction:
- Our primary patch is https://gerrit.wikimedia.org/r/c/integration/gearman-java/+/655663
- Java bug JDK-8200458
- JDK 11 Release Notes: Readiness Information Previously Recorded in SelectionKey Ready Set Not Preserved
The JDK change actually fixes the implementation to match the specification.
The library dependencies have been updated to bring support for Java 11.
The artifacts are created with maven and published to Archiva ( https://archiva.wikimedia.org/#artifact/org.wikimedia.gearman/gearman-java ).
Jenkins plugin
The Jenkins plugin had a complicated history:
- First draft on https://github.com/jenkinsci/gearman-plugin
- Canonical repository was https://opendev.org/x/gearman-plugin until upstream abandoned Jenkins
- The company Gooddata did a fork at https://github.com/gooddata/gearman-plugin addressing some issues and matching their environment.
Antoine "hashar" Musso envisioned maintaining another fork on Wikimedia Gerrit. After discussions with OpenDev people, it sounded better to have the plugin under the GitHub jenkinsci organization, to potentially benefit from shared maintenance by the Jenkins community.
In January 2021, Antoine "hashar" Musso got admin access to https://github.com/jenkinsci/gearman-plugin and merged in the gooddata fork.
A patch has been made to use the updated gearman-java provided by Wikimedia Archiva ( https://github.com/jenkinsci/gearman-plugin/pull/7 ).
We are releasing manually using Maven on our local machines until we convert to use continuous deployment with upstream GitHub actions. The documentation is at https://www.jenkins.io/doc/developer/publishing/releasing-manually/
You need an account with jenkins-io and have to file a request to get permission (an example is: https://github.com/jenkins-infra/repository-permissions-updater/pull/1830 ).
It is always possible to install the plugin without publishing it. That can be done through the Jenkins instance update manager by uploading the locally built .jar file.
How to
Restart Jenkins
See mw:Continuous integration/Jenkins#Restart.
Logs
Main application logs:
journalctl -u jenkins -f
Disable plugin
Jenkins plugins are placed in /var/lib/jenkins/plugins/ and end with a .hpi extension. To disable a plugin, rename it to .hpi.disable and restart Jenkins!
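The rename can be done by hand; as a minimal sketch, here is the same step in a small helper. The function name is hypothetical, the plugin directory is passed in rather than assumed, and restarting Jenkins afterwards remains a separate manual step.

```python
from pathlib import Path


def disable_plugin(plugins_dir, name):
    """Rename <name>.hpi to <name>.hpi.disable and return the new path.

    Raises FileNotFoundError if the plugin file is not present, so a typo
    in the plugin name does not pass silently.
    """
    source = Path(plugins_dir) / f"{name}.hpi"
    if not source.is_file():
        raise FileNotFoundError(source)
    target = source.with_name(source.name + ".disable")
    source.rename(target)
    return target
```

Renaming the file back to its .hpi name and restarting Jenkins again re-enables the plugin.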
Java thread dump
Whenever Jenkins appears to be stuck or facing high CPU usage, you will want to look at the Java threads: https://integration.wikimedia.org/ci/threadDump
You can also send kill -3 to the Java process, although that apparently kills Jenkins.
Another way:
jstack -F <pid of jenkins>
And there is a more verbose dump written to /var/log/jenkins/jenkins.log
For deadlock detection:
jstack -l -F <pid of jenkins>
Java info
sudo -u jenkins jstat -gcutil PID_HERE 1000 3