Jenkins

From Wikitech

Jenkins is software from the Java community that assists in building a continuous integration system. It offers a highly modular system, which has allowed other communities to add their own plugins (such as Git, PHP...). The Wikimedia Foundation uses Jenkins to run CI tests for MediaWiki and various other products and tools, and to generate alpha builds of the Wikipedia Android app.

Administration

Release engineering maintains four Jenkins instances:

Host                       Description
contint1001.wikimedia.org  The CI Jenkins https://integration.wikimedia.org/ci/
contint2001.wikimedia.org  Cold CI Jenkins (stopped)
releases1002.eqiad.wmnet   The releasing Jenkins https://releases-jenkins.wikimedia.org/
releases2002.codfw.wmnet   Cold release Jenkins

To get administrative access on the CI Jenkins instance, you need to be added to the LDAP group ciadmin. As of December 2019, the releases Jenkins permissions are managed differently (ask the Release Engineering team).

To get shell access, your account has to be added to the contint-admins and releasers-mediawiki shell groups. That is done in the operations/puppet repository, in modules/admin/data/data.yaml. One can use matrix.py to list one's groups:

$ ./matrix.py hashar
grp/users	hashar
contint-admins	OK
contint-docker	OK
contint-roots	OK
deployment	OK
gerrit-deployers	OK
gerrit-root	OK
labnet-users	OK
releasers-mediawiki	OK
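The check above can be scripted. The following is a minimal sketch, not part of matrix.py itself: the helper name missing_groups is hypothetical, and it assumes that a granted group always appears as a line of the form "group, whitespace, OK", as in the sample output. How matrix.py reports a group the user lacks is not shown here, so anything other than an OK line is treated as missing.

```shell
# Hypothetical helper: print each required shell group that the matrix.py
# output (read on stdin) does not report as OK.
# Required group names are passed as arguments.
missing_groups() {
    input="$(cat)"
    for group in "$@"; do
        # Assumption: a granted group shows up as "<group><whitespace>OK".
        if ! printf '%s\n' "$input" | grep -Eq "^${group}[[:space:]]+OK[[:space:]]*$"; then
            printf '%s\n' "$group"
        fi
    done
}
```

For example, ./matrix.py hashar | missing_groups contint-admins releasers-mediawiki prints nothing when both groups are granted.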

If one needs root access (to look at Apache logs, upgrade Jenkins, change file permissions, etc.), it is granted by the contint-roots group for both the CI and release Jenkins hosts.

Upgrading

Upstream publishes a release every week and, from time to time, a stable one named Long Term Support (LTS). We want to upgrade to the latest LTS whenever it is released.

Get the package

Debian packages are made available at:

Per SRE convention, the packages need to be copied to apt.wikimedia.org. Reprepro is configured to do so semi-automatically, so on apt1001 one only needs to do:

 cd /srv/wikimedia
 reprepro checkupdate -C thirdparty/ci --restrict=jenkins buster-wikimedia

Verify the output and then:

 reprepro update -C thirdparty/ci --restrict=jenkins buster-wikimedia

Congratulations.
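The "check, verify, then update" sequence can be wrapped so that the update only runs after an explicit confirmation. This is a sketch under assumptions, not an existing tool: the function name update_jenkins_package is hypothetical, and the reprepro arguments are copied from the steps above.

```shell
# Hypothetical wrapper: run "checkupdate", then run "update" only after
# the operator answers "y".
# $1: reprepro base directory (defaults to /srv/wikimedia, as above).
# The body runs in a subshell so the caller's working directory is untouched.
update_jenkins_package() (
    cd "${1:-/srv/wikimedia}" || exit 1
    reprepro checkupdate -C thirdparty/ci --restrict=jenkins buster-wikimedia || exit 1
    printf 'Apply this update? [y/N] '
    read -r answer
    if [ "$answer" != "y" ]; then
        echo "Aborted, nothing changed."
        exit 1
    fi
    reprepro update -C thirdparty/ci --restrict=jenkins buster-wikimedia
)
```

Any answer other than "y" leaves the repository untouched.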

In some cases upstream might release a new LTS that has breaking changes and backport the security fixes to the LTS we currently use. If the breaking change is not trivial, we need to instruct reprepro to fetch the updated release of the previous LTS instead. Since reprepro update does not offer a way to match a specific version, we first have to delete the package and then fetch it again with a specific version. Given a currently known LTS 2.100.1, with upstream releasing a new LTS 3.0.0 and a hotfix 2.100.2, we would:

$ reprepro list buster-wikimedia jenkins
buster-wikimedia|thirdparty/ci|amd64: jenkins 2.100.1

$ reprepro remove buster-wikimedia jenkins
Exporting indices...

Deleting files no longer referenced

Then get the specific version 2.100.2 instead of the breaking one 3.0.0:

$ reprepro update -C thirdparty/ci --restrict-binary jenkins=2.100.2 buster-wikimedia

$ reprepro list buster-wikimedia jenkins
buster-wikimedia|thirdparty/ci|amd64: jenkins 2.100.2
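The final verification can be scripted as well. The sketch below assumes the reprepro list line format shown above; the helper names listed_version and same_lts_baseline are hypothetical. The second helper relies on Jenkins LTS versions having the form baseline.patch (for example 2.100.2), so two versions belong to the same LTS line when everything before the last dot matches.

```shell
# Hypothetical helper: print the version of a package from "reprepro list"
# output read on stdin.
# Expected line shape: "<codename>|<component>|<arch>: <package> <version>"
listed_version() {
    awk -v pkg="$1" '$2 == pkg { print $3 }'
}

# Hypothetical helper: succeed when two versions share the same LTS baseline,
# e.g. 2.100.1 and 2.100.2 (a hotfix), but not 2.100.1 and 3.0.0.
same_lts_baseline() {
    [ "${1%.*}" = "${2%.*}" ]
}
```

For example, reprepro list buster-wikimedia jenkins | listed_version jenkins prints only the version, which can then be compared with the previous one using same_lts_baseline.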

Precheck

Review the upstream changelog at https://jenkins.io/changelog-stable/ and try to guess what might cause havoc. There is no exact science here, but community-reported bugs might raise a red flag.

Deploying

Jenkins is deployed in two places:

  • the two virtual Ganeti hosts, releases1001.eqiad.wmnet and releases2001.codfw.wmnet, for releases, and
  • the two bare-metal hosts, contint1001.eqiad.wmnet and contint2001.codfw.wmnet, for CI.

It can safely be restarted at any time as long as the developer community is notified about it (e.g. via Wikitech-l). When Jenkins is restarted, all current CI builds are aborted. New builds will automatically be enqueued again by Zuul once Jenkins is back up.

When doing an upgrade, you will most probably want to have one continuous integration person floating around because plugin dependencies might break once Jenkins is upgraded.

  1. Make sure the developer community is aware this is happening (!log it in #wikimedia-releng and announce it in #wikimedia-operations a few minutes beforehand).
  2. Monitor https://integration.wikimedia.org/zuul/ to wait for a good time when no major jobs are about to finish.
  3. First, upgrade the releases hosts, since they are less impactful if something goes wrong:
    1. On releases1001 as root, run apt-get update && apt-get upgrade jenkins
    2. Wait for https://releases-jenkins.wikimedia.org/ to come back up; check that it works.
    3. Check for any alerts from Jenkins that things are broken.
    4. Then upgrade releases2001 in the same way.
  4. Second, upgrade the CI hosts:
    1. On contint1001 as root, run apt-get update && apt-get upgrade jenkins
    2. Wait for https://integration.wikimedia.org/ci/ to come back up; check that it works and that jobs process as expected.
    3. Check for any alerts from Jenkins that things are broken.
    4. Confirm that https://integration.wikimedia.org/zuul/ shows jobs processing as normal.
    5. Then upgrade contint2001 in the same way.
  5. !log that the upgrade is complete and mark the task in Phabricator as done.
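The "wait for it to come back up" steps can be automated with a small polling loop. This is only a sketch: the function name wait_for_http_ok is hypothetical, and it assumes that the instance answers with a status other than 200 while it is still starting and with 200 once it is ready for an unauthenticated request. It does not replace checking that jobs actually process.

```shell
# Hypothetical helper: poll a URL until it answers HTTP 200, or give up.
# $1: URL, $2: maximum attempts (default 30), $3: seconds between attempts (default 10)
wait_for_http_ok() {
    url="$1"; attempts="${2:-30}"; delay="${3:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
        code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
        if [ "$code" = "200" ]; then
            echo "up after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "still not answering 200 after $attempts attempts (last status: $code)" >&2
    return 1
}
```

For example, wait_for_http_ok https://releases-jenkins.wikimedia.org/ could be run right after the package upgrade on a releases host.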

Security updates

A few days before releasing a security update, upstream sends a pre-announcement to a mailing list (jenkinsci-advisories). The internal mailing list of the release engineering team is subscribed to it. Upon reception, a security task is filed in Phabricator against #continuous-integration-infrastructure and #jenkins.

There are two kinds of security updates for Jenkins:

  • an issue in Jenkins itself, which triggers a release of Jenkins. It is thus important to always stay up to date and follow the Jenkins LTS cycle.
  • issues in plugins, which typically affect several plugins that are released in bulk.

Jenkins security update

When the security update is made public, one first has to review the security bulletin to gauge how affected we are. Some security releases of Jenkins do not affect us at all; others are absolutely critical and require an immediate upgrade. The upgrade is as above: the Jenkins Debian package is updated using reprepro and then installed with apt.

Plugins security update

For plugins, the update is done via the plugin manager. Use the Check now button at the bottom of the page to fetch the latest list of plugins from upstream, then update the affected plugins. Note: plugins might have backward-compatibility issues or might expect XML elements different from those generated by Jenkins Job Builder; there is no solid process there.

Plugin managers:

How to

Restart Jenkins

See mw:Continuous integration/Jenkins#Restart.

Logs

Main application logs:

journalctl -u jenkins -f

Disable plugin

Jenkins plugins are placed in /var/lib/jenkins/plugins/ and end with a .hpi extension. To disable a plugin, rename it to .hpi.disable and restart Jenkins.
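The rename can be wrapped in a small helper that refuses to act when the plugin file does not exist. This is a sketch of the procedure described above; the function name disable_plugin is hypothetical, and the plugin directory is passed explicitly rather than assumed.

```shell
# Hypothetical helper: disable a plugin by renaming <name>.hpi to <name>.hpi.disable.
# $1: plugin name without extension, $2: directory holding the .hpi files
disable_plugin() {
    plugin="$1"; dir="$2"
    if [ ! -f "$dir/$plugin.hpi" ]; then
        echo "no such plugin file: $dir/$plugin.hpi" >&2
        return 1
    fi
    mv "$dir/$plugin.hpi" "$dir/$plugin.hpi.disable"
}
```

Jenkins still has to be restarted afterwards for the change to take effect.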

Java thread dump

Whenever Jenkins appears to be stuck or facing high CPU usage, you will want to look at the Java threads: https://integration.wikimedia.org/ci/threadDump

You can also send signal 3 (SIGQUIT) to the Java process; Jenkins will then write a thread dump to /var/log/jenkins/jenkins.log:

kill -3 <pid of jenkins>

kill -3 apparently kills Jenkins :(


Another way:

jstack -F <pid of jenkins>

A more verbose dump is also written to /var/log/jenkins/jenkins.log

For deadlock detection:

jstack -l -F <pid of jenkins>
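A long thread dump is easier to read with a per-state summary. The sketch below assumes the dump contains lines of the form "java.lang.Thread.State: RUNNABLE", as HotSpot writes them in a regular thread dump; a forced dump (jstack -F) may use a different layout, in which case this helper prints nothing. The function name thread_state_summary is hypothetical.

```shell
# Hypothetical helper: count threads per state in a thread dump read on stdin.
# Only lines like "   java.lang.Thread.State: TIMED_WAITING (sleeping)" are used.
thread_state_summary() {
    sed -n 's/^[[:space:]]*java\.lang\.Thread\.State: \([A-Z_]*\).*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

For example, after saving a dump to a file, thread_state_summary < dump.txt lists the states with the most threads first. Many BLOCKED threads are usually worth a closer look.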

Java info

sudo -u jenkins jstat -gcutil PID_HERE 1000 3
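In the command above, 1000 is the sampling interval in milliseconds and 3 is the number of samples; -gcutil prints the utilisation of each memory area as a percentage. The following sketch flags samples where the old generation (column O) is nearly full. The function name old_gen_above and the default threshold of 90 are assumptions made for illustration.

```shell
# Hypothetical helper: read "jstat -gcutil" output on stdin and print the
# samples whose old-generation utilisation (column "O") is at or above a
# threshold in percent (default 90).
old_gen_above() {
    awk -v limit="${1:-90}" '
        # Locate the "O" column from the header line.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "O") col = i; next }
        col && $col + 0 >= limit + 0 { print "sample " (NR - 1) ": old gen at " $col "%" }
    '
}
```

For example, the jstat command above can be piped into old_gen_above 90. An old generation that stays close to 100% across samples suggests Jenkins is short on heap.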

Patch a plugin

If you absolutely need to patch a plugin without an upstream release, that's possible. First, fork the Git repository to Gerrit under integration/jenkinsci/, and then submit your patch as a Gerrit change and merge it.

If you need to, update the version number in pom.xml, for example 1.103-wmf.1. Then run mvn install, which should create a <name>.hpi file somewhere. You can upload that file in the Plugin Manager using the "Advanced" tab. If it tells you to restart Jenkins, kill any long running jobs, and then it should restart shortly. Then try out your newly patched plugin!
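Two small helpers can make the build step less error-prone. This is a sketch only: the function names are hypothetical, the version format follows the 1.103-wmf.1 example above, and the search assumes the build leaves the .hpi file somewhere under the plugin checkout (Maven normally writes build output to a target/ directory).

```shell
# Hypothetical helper: build a local version string such as 1.103-wmf.1
# from an upstream version and a local revision number.
wmf_version() {
    printf '%s-wmf.%s\n' "$1" "$2"
}

# Hypothetical helper: list .hpi files found under a plugin checkout.
# $1: directory to search (defaults to the current directory)
find_built_hpi() {
    find "${1:-.}" -type f -name '*.hpi'
}
```

After mvn install, running find_built_hpi in the checkout shows the file to upload in the Plugin Manager.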
