Jenkins


Jenkins is software originating from the Java community that helps build a continuous integration system. It is highly modular, which has allowed other communities to add their own plugins (such as Git, PHP...). The Wikimedia Foundation uses Jenkins to run CI tests for MediaWiki and various other products and tools, generate and publish documentation, keep Beta cluster code up to date, and various other things.

Administration

Release engineering maintains four Jenkins instances:

Host                        Description
contint1001.wikimedia.org   The CI Jenkins https://integration.wikimedia.org/ci/
contint2001.wikimedia.org   Cold CI Jenkins (stopped)
releases1002.eqiad.wmnet    The releasing Jenkins https://releases-jenkins.wikimedia.org/
releases2002.codfw.wmnet    Cold release Jenkins

To get administrative access on the CI Jenkins instance, you need to be added to the LDAP group ciadmin. As of December 2019, permissions on the releases Jenkins are managed differently (ask the Release Engineering team).

To get shell access, your account has to be added to the contint-admins and releasers-mediawiki shell groups. That is done in the operations/puppet repository, in modules/admin/data/data.yaml. One can use matrix.py to list one's groups:

$ ./matrix.py hashar
grp/users	hashar
contint-admins	OK
contint-docker	OK
contint-roots	OK
deployment	OK
gerrit-deployers	OK
gerrit-root	OK
labnet-users	OK
releasers-mediawiki	OK

If one needs root access (to look at Apache logs, upgrade Jenkins, change file permissions, etc.), it is granted by the contint-roots group for both the CI and release Jenkins hosts.
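Once the Puppet change has been applied, membership can also be double-checked on a host itself. The following is a minimal sketch rather than an established tool: the function name missing_groups is hypothetical, and it relies only on the standard id -nG command to print the required groups a user lacks.

```shell
# missing_groups USER GROUP...
# Prints each GROUP that USER is not a member of on this host.
# Hypothetical helper; group names are passed in by the caller.
missing_groups() {
    local user="$1" have group
    shift
    # Pad with spaces so that whole group names can be matched.
    have=" $(id -nG "$user") "
    for group in "$@"; do
        case "$have" in
            *" $group "*) ;;                 # already a member
            *) printf '%s\n' "$group" ;;     # missing
        esac
    done
}

# Example: missing_groups hashar contint-admins releasers-mediawiki
```

No output means the user is already in every group listed.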

Upgrading

Upstream publishes a regular release every week and, from time to time, a stable release called Long Term Support (LTS). We want to upgrade to the latest LTS whenever one is released.

Get the package

The upstream Debian packages are made available at:

Per SRE convention, the package needs to be copied to apt.wikimedia.org. Reprepro is configured to do so semi-automatically. On apt1001 one only needs to do:

 cd /srv/wikimedia
 reprepro -C thirdparty/ci --restrict=jenkins checkupdate buster-wikimedia

Verify the output and then:

 reprepro -C thirdparty/ci --restrict=jenkins update buster-wikimedia

Congratulations.

In some cases upstream might release a new LTS that has breaking changes while backporting the security fixes to the LTS we currently use. If the breaking change is not trivial, we need to instruct reprepro to stay on the previous LTS line. Since reprepro update does not offer a way to match a specific version, we first have to delete the package and then fetch it again with a specific version. For example, given a currently known LTS 2.100.1, with upstream releasing a new LTS 3.0.0 and a hotfix 2.100.2, we would:

$ reprepro list buster-wikimedia jenkins
buster-wikimedia|thirdparty/ci|amd64: jenkins 2.100.1

$ reprepro remove buster-wikimedia jenkins
Exporting indices...

Deleting files no longer referenced

Then get the specific version 2.100.2 instead of the breaking one 3.0.0:

$ reprepro --restrict-binary jenkins=2.100.2 update buster-wikimedia

$ reprepro list buster-wikimedia jenkins
buster-wikimedia|thirdparty/ci|amd64: jenkins 2.100.2
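Choosing which version to pin can be sketched as follows. This is an illustration only: the helper name pick_hotfix is hypothetical, it assumes an LTS line is identified by the first two version components (2.100 in the example above), and it relies on sort -V for version ordering.

```shell
# pick_hotfix CURRENT AVAILABLE...
# Prints the highest version that shares CURRENT's LTS line
# (same "major.minor" prefix), falling back to CURRENT itself.
pick_hotfix() {
    local current="$1" line candidate
    shift
    line="${current%.*}"            # 2.100.1 -> 2.100
    for candidate in "$current" "$@"; do
        case "$candidate" in
            "$line".*) printf '%s\n' "$candidate" ;;   # same LTS line
        esac
    done | sort -V | tail -n 1
}

# Example: pick_hotfix 2.100.1 3.0.0 2.100.2
```

The printed version is the one to request from reprepro instead of the newest release.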

Precheck

Review the upstream changelog at https://jenkins.io/changelog-stable/ and try to guess what might cause havoc. There is no clear science there. Community-reported bugs might raise a red flag.

Deploying

Jenkins is deployed in two places:

  • the two virtual Ganeti hosts, releases1002.eqiad.wmnet and releases2002.codfw.wmnet, for releases, and
  • the two bare-metal hosts, contint1001.eqiad.wmnet and contint2001.codfw.wmnet, for CI.

It can safely be restarted at any time as long as the developer community is notified about it (e.g. via Wikitech-l). When Jenkins is restarted, all current CI builds are aborted. New builds will automatically be enqueued again by Zuul once Jenkins is back up.

When doing an upgrade, you will most probably want to have a continuous integration person around, because plugin dependencies might break once Jenkins is upgraded.

  1. Make sure the developer community is aware this is happening (!log it in #wikimedia-releng and announce it in #wikimedia-operations a few minutes beforehand).
  2. Monitor https://integration.wikimedia.org/zuul/ to wait for a good time when no major jobs are about to finish.
  3. First, upgrade the releases hosts, since they are less impactful if something goes wrong:
    1. On releases1002 as root, run apt-get update && apt-get install --only-upgrade jenkins
    2. Wait for https://releases-jenkins.wikimedia.org/ to come back up; check that it works.
    3. Check for any alerts from Jenkins that things are broken.
    4. Then upgrade releases2002 in the same way.
  4. Second, upgrade the CI hosts:
    1. On contint1001 as root, run apt-get update && apt-get install --only-upgrade jenkins
    2. Wait for https://integration.wikimedia.org/ci/ to come back up; check that it works and that jobs process as expected.
    3. Check for any alerts from Jenkins that things are broken.
    4. Confirm that https://integration.wikimedia.org/zuul/ shows jobs processing as normal.
    5. Then upgrade contint2001 in the same way.
  5. !log that the upgrade is complete and mark the task in Phabricator as done.
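The "wait for it to come back up" steps above can be sketched as a small polling loop. This is a hedged example, not part of the documented procedure: the function name wait_for_http_ok and its default attempt count and delay are hypothetical, and it assumes the web UI answers with HTTP 200 once Jenkins is ready (any other status is treated as "not ready yet").

```shell
# wait_for_http_ok URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until it answers with HTTP 200. Returns 0 on success and
# 1 if the service is still not ready after ATTEMPTS tries.
wait_for_http_ok() {
    local url="$1" attempts="${2:-60}" delay="${3:-5}" status i=0
    while [ "$i" -lt "$attempts" ]; do
        # Print only the status code; treat connection errors as "000".
        status=$(curl --silent --output /dev/null \
                      --write-out '%{http_code}' "$url") || status=000
        if [ "$status" = "200" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example: wait_for_http_ok https://releases-jenkins.wikimedia.org/ 60 5
```

A successful return only shows that the web UI responds; checking that jobs process and that no alerts are raised remains a manual step.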

Security updates

A few days before releasing a security update, upstream sends a pre-announcement to a mailing list (jenkinsci-advisories). The internal mailing list of the release engineering team is subscribed to it. Upon reception, a security task is filed in Phabricator against #continuous-integration-infrastructure and #jenkins.

There are two kinds of security updates for Jenkins:

  • an issue in Jenkins itself, which triggers a release of Jenkins. It is thus important to always stay up to date and follow the Jenkins LTS cycle.
  • issues in plugins, which typically affect several plugins that are released in bulk.

Jenkins security update

When the security update is made public, first review the security bulletin to gauge how affected we are. Some security releases of Jenkins do not affect us at all; others are absolutely critical and require an immediate upgrade. The upgrade procedure is the same as above: update the Jenkins Debian package using reprepro, then install it with apt.
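Part of gauging exposure is checking whether the installed version already includes the fix named in the bulletin. A minimal sketch, assuming plain dotted version strings and sort -V ordering; the helper name version_at_least is hypothetical.

```shell
# version_at_least INSTALLED FIXED
# Succeeds when INSTALLED is equal to or newer than FIXED.
version_at_least() {
    # The lower of the two versions sorts first; if that is FIXED,
    # then INSTALLED is at least as recent.
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example, reading the installed version from dpkg:
#   installed=$(dpkg-query -W -f='${Version}' jenkins)
#   version_at_least "$installed" 2.100.2 || echo "upgrade needed"
```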

Plugins

Plugins security update

For plugins, the update is done via the Jenkins plugin managers:

We might have pinned packages to a specific version or have forked a plugin (usually recognizable by a version containing SNAPSHOT or wmf). Those should **not** be updated with the plugin manager; the fork should be updated instead. See #Patch a plugin below.

Use the Check now button at the bottom of the page to fetch the latest list of plugins from upstream, then update affected plugins.

Note: plugins might have backward-compatibility issues or could expect XML elements different from those generated by Jenkins Job Builder. There is no solid process for this.
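The rule about forked plugins can be sketched as a filter over a plugin list. This is an illustration under assumptions: the input format (one "name version" pair per line) and both function names are hypothetical, and a fork is recognized only by SNAPSHOT or wmf appearing in its version, as described above.

```shell
# is_forked_version VERSION
# Succeeds when VERSION looks like one of our forks or local builds.
is_forked_version() {
    case "$1" in
        *SNAPSHOT*|*wmf*) return 0 ;;
        *) return 1 ;;
    esac
}

# updatable_plugins
# Reads "name version" lines on stdin and prints the names of plugins
# that are safe to update through the plugin manager.
updatable_plugins() {
    local name version
    while read -r name version; do
        is_forked_version "$version" || printf '%s\n' "$name"
    done
}

# Example: printf 'git 4.5.2\nacme 1.103-wmf.1\n' | updatable_plugins
```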

Patch a plugin

If you absolutely need to patch a plugin without an upstream release, that is possible.

First, fork the Git repository to Gerrit under the integration/jenkinsci/ namespace.

Add CI jobs for the plugin by defining a new project under jjb/integration.yaml and adding the jobs to zuul/layout.yaml. As of February 2021 we run Jenkins on CI with Java 8 and aim to upgrade to Java 11, so we should have two jobs. Example for a hypothetical acme plugin:

# jjb/integration.yaml
- project:
    name: jenkins-plugin
    project:
        - acme
    jobs:
        - '{name}-{project}-maven-java8-docker'
        - '{name}-{project}-maven-java11-docker'
# zuul/layout.yaml
- projects:
  - name: integration/jenkinsci/acme
    experimental:
      - jenkins-plugin-acme-maven-java8-docker
      - jenkins-plugin-acme-maven-java11-docker
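The job names listed in zuul/layout.yaml are the job templates from jjb/integration.yaml with {name} and {project} substituted. The sketch below only mimics that substitution to show how the two files line up; it is not how Jenkins Job Builder itself is implemented, and the function name expand_job is hypothetical.

```shell
# expand_job TEMPLATE NAME PROJECT
# Substitutes {name} and {project} in a job template.
# Assumes the values contain no "/" or "&" characters.
expand_job() {
    local template="$1" name="$2" project="$3"
    printf '%s\n' "$template" |
        sed -e "s/{name}/$name/g" -e "s/{project}/$project/g"
}

# Example:
#   expand_job '{name}-{project}-maven-java8-docker' jenkins-plugin acme
```

With the values from the example above, the two templates expand to the two job names listed under experimental.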

Then send your patch(es) as Gerrit change(s).

If you need to, update the version number in pom.xml, for example 1.103-wmf.1. Then run mvn install, which should create a <name>.hpi file somewhere. You can upload that file in the Plugin Manager using the "Advanced" tab. If it tells you to restart Jenkins, kill any long running jobs, and then it should restart shortly. Then try out your newly patched plugin!
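Picking the next local version number can be sketched as below. The -wmf.N suffix follows the 1.103-wmf.1 example above, but treating N as a counter that is incremented on each local release is an assumption, as is the helper name next_wmf_version.

```shell
# next_wmf_version VERSION
# 1.103        -> 1.103-wmf.1
# 1.103-wmf.1  -> 1.103-wmf.2
next_wmf_version() {
    local version="$1" base n
    case "$version" in
        *-wmf.*)
            base="${version%-wmf.*}"     # part before the suffix
            n="${version##*-wmf.}"       # current local counter
            printf '%s-wmf.%s\n' "$base" "$((n + 1))"
            ;;
        *)
            printf '%s-wmf.1\n' "$version"
            ;;
    esac
}

# Example: next_wmf_version 1.103
```

The resulting string is what goes into the version element of pom.xml before running mvn install.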

Gearman plugin

Jenkins jobs are registered with Zuul using the Gearman protocol via the Gearman plugin. The plugin has long been abandoned by upstream, since they no longer use Jenkins, and the Gearman Java library has not been updated since roughly 2012. We eventually hit a wall with Java 11, which forced us to maintain forks of this legacy software.

https://phabricator.wikimedia.org/T271683 has some history.


gearman-java library

The gearman-java library was maintained using Bazaar on https://code.launchpad.net/gearman-java . We have converted it to a git repository: https://gerrit.wikimedia.org/g/integration/gearman-java (changes).

The original code was written for Java 5 and worked fine until Java 11 made an incompatible change to its select() abstraction:

That change actually fixes the implementation to match the specification.

The library dependencies have been updated to bring support for Java 11.

The artifacts are created with maven and published to Archiva ( https://archiva.wikimedia.org/#artifact/org.wikimedia.gearman/gearman-java ).

Jenkins plugin

The Jenkins plugin had a complicated history:

Antoine "hashar" Musso envisioned maintaining another fork on Wikimedia Gerrit. After discussions with OpenDev people, it sounded better to have the plugin under the GitHub jenkinsci organization to potentially benefit from shared maintenance from the Jenkins community.

In January 2021, Antoine "hashar" Musso got admin access to https://github.com/jenkinsci/gearman-plugin and merged in the gooddata fork.

A patch has been made to use the updated gearman-java provided by Wikimedia archiva ( https://github.com/jenkinsci/gearman-plugin/pull/7 ).

The intent is to release from that repository and have the forked plugin made available via the Jenkins plugin manager. As of Feb 18th 2021 it is pending appropriate access https://github.com/jenkins-infra/repository-permissions-updater/pull/1830 . Meanwhile one can build the plugin locally and upload it manually via the plugin manager.

How to

Restart Jenkins

See mw:Continuous integration/Jenkins#Restart.

Logs

Main application logs:

journalctl -u jenkins -f

Disable plugin

Jenkins plugins are placed in /var/lib/jenkins/plugins/ and end with a .hpi extension. To disable a plugin, rename it to .hpi.disable and restart Jenkins!
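The rename can be wrapped in a small helper that refuses to act when the archive is missing. A minimal sketch: the function name disable_plugin is hypothetical, and the plugin directory is passed as an argument rather than hard-coded.

```shell
# disable_plugin PLUGINS_DIR PLUGIN_NAME
# Renames PLUGIN_NAME.hpi to PLUGIN_NAME.hpi.disable in PLUGINS_DIR.
disable_plugin() {
    local plugins_dir="$1" plugin="$2" archive
    archive="$plugins_dir/$plugin.hpi"
    if [ ! -f "$archive" ]; then
        echo "no such plugin archive: $archive" >&2
        return 1
    fi
    mv -- "$archive" "$archive.disable"
}

# Example: disable_plugin "$JENKINS_PLUGINS_DIR" gearman-plugin
```

Jenkins still has to be restarted afterwards for the change to take effect.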

Java thread dump

Whenever Jenkins appears to be stuck or facing high CPU usage, you will want to look at the Java threads: https://integration.wikimedia.org/ci/threadDump

You can also send signal 3 (SIGQUIT) to the Java process; Jenkins will then write a thread dump to /var/log/jenkins/jenkins.log:

kill -3 <pid of jenkins>

Warning: kill -3 apparently kills Jenkins :(


Another way:

jstack -F <pid of jenkins>

And there is a more verbose dump written to /var/log/jenkins/jenkins.log

For deadlock detection:

jstack -l -F <pid of jenkins>
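Looking up the PID and running jstack can be combined in one helper. This is a sketch under assumptions: it presumes the systemd unit is named jenkins (as in the journalctl example) and that the unit's main PID is the Jenkins Java process; both function names are hypothetical. It uses jstack -l without -F, so the JVM must still be responsive.

```shell
# jenkins_pid
# Prints the main PID of the jenkins systemd unit (0 when not running).
jenkins_pid() {
    systemctl show --property MainPID --value jenkins
}

# thread_dump
# Prints a thread dump with lock information, running jstack as the
# jenkins user so that it may attach to the JVM.
thread_dump() {
    local pid
    pid=$(jenkins_pid)
    if [ -z "$pid" ] || [ "$pid" = "0" ]; then
        echo "jenkins does not appear to be running" >&2
        return 1
    fi
    sudo -u jenkins jstack -l "$pid"
}

# Example: thread_dump > /tmp/jenkins-threads.txt
```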

Java info

sudo -u jenkins jstat -gcutil PID_HERE 1000 3

See also