Security/Production updates
Security updates of third party software for Wikimedia production are centrally coordinated by the Infrastructure Foundations SRE team.
How to get notified of security issues
The majority of third party software security announcements are made via the Debian announcement lists. The relevant lists to subscribe to are https://lists.debian.org/debian-security-announce/ and https://lists.debian.org/debian-lts-announce/. Another list worth subscribing to is https://www.openwall.com/lists/oss-security/, which also receives notifications for some software not packaged in Debian (e.g. some parts of the Data Engineering stack, which are maintained by Apache).
How to assess whether we're affected
Once we've learned about a new security issue, we need to figure out whether the affected component is installed anywhere. https://debmonitor.wikimedia.org/ is the central entry point for identifying packages installed in our production fleet. Always search for the source package name first; this lists all installed binary packages built from it.
The vast majority of packages use the default versions shipped in Debian stable releases. The Debian Package Tracker (https://tracker.debian.org/) is a convenient way to find out which version of a package is shipped in Debian stable/oldstable/oldoldstable.
In most cases the installed version will match the standard version in one of the three releases. Possible deviations are:
- Software not available in Debian which has been packaged locally and is provided via apt.wikimedia.org
- Backports of more recent package versions (which usually means that distribution security updates cannot be used as-is)
- Software with local patches applied (which usually means that the patch(es) need to be rebased on the security update)
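To illustrate the triage of deviations above, here is a minimal sketch that sorts an installed version into one of these buckets. It assumes the caller has already collected the versions Debian ships for the package (e.g. from tracker.debian.org). The "~bpo" suffix is the convention used by official Debian backports; the "wmf" marker for local builds is a hypothetical placeholder, and a local patch cannot always be detected from the version string alone.

```python
from enum import Enum
from typing import Set


class Origin(Enum):
    """Where an installed package version most likely comes from."""
    DEBIAN = "standard Debian version"
    BACKPORT = "backport of a newer version"
    LOCAL = "local build or patch from apt.wikimedia.org"
    UNKNOWN = "unknown, needs manual review"


def classify(installed: str, debian_versions: Set[str],
             local_marker: str = "wmf") -> Origin:
    """Classify an installed version against the versions Debian ships.

    debian_versions: versions in stable/oldstable/oldoldstable, including
    security updates. local_marker: HYPOTHETICAL substring assumed to mark
    locally built packages; adjust to the real naming convention.
    """
    if installed in debian_versions:
        return Origin.DEBIAN
    if local_marker in installed:
        # Checked before "~bpo" so that locally rebuilt backports count as local.
        return Origin.LOCAL
    if "~bpo" in installed:
        # Official Debian backports use a "~bpoNN+M" revision suffix.
        return Origin.BACKPORT
    return Origin.UNKNOWN
```

Anything classified as UNKNOWN still needs a human look, e.g. by checking operations/debs or apt.wikimedia.org.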
How to determine the impact
Once we have established that a vulnerable software component is in use, the impact needs to be checked. If Debian has issued an update, that generally means the security issue is important, but there may be mitigating factors. One possibility is that we're not affected at all; possible reasons are:
- vulnerabilities which only affect a more recent release than the ones deployed (e.g. a security issue in recent qemu code, while our cloudvirt and Ganeti servers still run Debian oldstable)
- vulnerabilities in functionality we don't use (e.g. we don't currently use Secure Boot, so GRUB vulnerabilities specific to Secure Boot are not relevant to us; we can still update GRUB to stay on the most recent version, but there's no urgency at all)
- vulnerabilities affecting only one part of a package (e.g. we use the client-side tools and libraries of ISC Bind, but not the DNS server, while most Bind security issues only impact the server; again, we can update, but without any urgency)
If we are affected, not all security issues are alike. There's no good rule of thumb, but, for example, the ability to execute code is more severe than "just" a denial of service.
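The checklist above can be condensed into a small decision function. This is only an illustrative sketch: the Vulnerability fields, the component and feature labels, and the returned priority strings are all hypothetical, and real triage still requires reading the advisory.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Vulnerability:
    affected_releases: Set[str]  # Debian releases with vulnerable code, e.g. {"bookworm"}
    component: str               # affected part of the package, e.g. "server" (hypothetical label)
    feature: Optional[str]       # functionality the bug lives in, e.g. "secure-boot" (hypothetical)
    code_execution: bool         # True if the bug allows executing code


def triage(vuln: Vulnerability, deployed_releases: Set[str],
           used_components: Set[str], used_features: Set[str]) -> str:
    """Return a rough priority label for a vulnerability (illustrative only)."""
    if not vuln.affected_releases & deployed_releases:
        return "not affected"  # only releases we don't run are vulnerable
    if vuln.component not in used_components:
        return "not urgent"    # e.g. a Bind server bug while we only use the client
    if vuln.feature is not None and vuln.feature not in used_features:
        return "not urgent"    # e.g. a Secure Boot-specific GRUB bug
    if vuln.code_execution:
        return "high"          # code execution outranks denial of service
    return "assess"            # e.g. denial of service: judge case by case
```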
How to get the vulnerability fixed
- The version numbers of installed packages need to be checked against the three distributions in use. If the affected package is part of the standard version set in Debian stable/oldstable/oldoldstable and no local modifications are applied, then we can simply roll out the distribution update (see the next section).
- If the software doesn't come from the Debian or Ubuntu archive and has been packaged locally, or is used with a local patch, it's often managed in operations/debs/FOO.git and the fix should be made there. If no such repository exists, you can grab the source from apt.wikimedia.org by running "apt-get source SOURCEPACKAGE", apply the fix on top of the existing package and upload the resulting build to apt.wikimedia.org.
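Checking an installed version against the version that contains a fix relies on Debian's version ordering: epoch first, then the upstream version, then the Debian revision, with "~" sorting before everything else. On a Debian host, "dpkg --compare-versions" does this; the sketch below re-implements dpkg's comparison in Python for illustration only and skips the input validation dpkg performs.

```python
import string
from typing import Tuple


def _isdigit(c: str) -> bool:
    return c != "" and c in string.digits


def _order(c: str) -> int:
    """Sort weight of one non-digit character, following dpkg's rules:
    '~' sorts before everything (even the end of the string), the end of
    the string sorts before letters, and letters sort before other symbols."""
    if c == "" or _isdigit(c):
        return 0
    if c in string.ascii_letters:
        return ord(c)
    if c == "~":
        return -1
    return ord(c) + 256


def _verrevcmp(a: str, b: str) -> int:
    """Compare two upstream versions (or two revisions) the way dpkg does."""
    def at(s: str, k: int) -> str:
        return s[k] if k < len(s) else ""

    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # 1. Compare the leading non-digit parts character by character.
        while (at(a, i) and not _isdigit(at(a, i))) or (at(b, j) and not _isdigit(at(b, j))):
            ac, bc = _order(at(a, i)), _order(at(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # 2. Compare the following digit runs numerically, ignoring leading zeros.
        while at(a, i) == "0":
            i += 1
        while at(b, j) == "0":
            j += 1
        while _isdigit(at(a, i)) and _isdigit(at(b, j)):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _isdigit(at(a, i)):
            return 1  # a has the longer number
        if _isdigit(at(b, j)):
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> Tuple[int, str, str]:
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch = 0
    if ":" in version:
        head, version = version.split(":", 1)
        epoch = int(head)
    if "-" in version:
        upstream, revision = version.rsplit("-", 1)
    else:
        upstream, revision = version, ""
    return epoch, upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 if Debian version a is lower than, equal to or higher than b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    r = _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
    return (r > 0) - (r < 0)


def is_fixed(installed: str, fixed_version: str) -> bool:
    """True if the installed version is at least the version containing the fix."""
    return compare_versions(installed, fixed_version) >= 0
```

For example, is_fixed("1.2-3", "1.2-3+deb12u1") is False, because a security update's "+deb12u1" suffix sorts after the plain revision.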
How to roll out software updates
The steps for updating packages and libraries can be found at Software deployment.
How to roll out kernel updates
We run kernels from three Debian releases, each of which follows a specific upstream LTS kernel branch:
- Buster: 4.19.x
- Bullseye: 5.10.x
- Bookworm: 6.1.x
Debian kernels carry very few distro-specific patches and follow the upstream LTS kernel releases.
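Because each release tracks one LTS branch, deciding whether a host is behind comes down to comparing upstream point releases within that branch. A minimal sketch, assuming both versions are given as upstream point releases like "5.10.218" (the same form passed to kernel_report); how the running version is collected from a host is left out, and the function names are hypothetical.

```python
from typing import Tuple

# Upstream LTS branch followed by each Debian release, as listed above.
LTS_BRANCH = {"buster": "4.19", "bullseye": "5.10", "bookworm": "6.1"}


def _parse(version: str) -> Tuple[int, ...]:
    """Turn an upstream version like '5.10.218' into (5, 10, 218)."""
    return tuple(int(part) for part in version.split("."))


def needs_update(distro: str, running: str, target: str) -> bool:
    """True if a host on `distro` runs an older point release than `target`."""
    branch = _parse(LTS_BRANCH[distro])
    for version in (running, target):
        # Compare numerically so that e.g. 6.10.x is not mistaken for 6.1.x.
        if _parse(version)[:len(branch)] != branch:
            raise ValueError(f"{version} is not on the {distro} branch {LTS_BRANCH[distro]}")
    return _parse(running) < _parse(target)
```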
When a new Debian kernel security update has been released, we need to determine whether it fixes any critical security issue which warrants a fleet-wide round of reboots. One such case would be a local privilege escalation which cannot be mitigated. Many security issues fixed in Debian don't affect us because we have:
- disabled unprivileged user namespaces, which greatly reduces the attack surface of the kernel
- blacklisted kernel modules we don't use (which also makes them unavailable to attackers)
- applied sysctl settings which reduce the attack surface
Also, many kernel security issues affect functionality we don't use, e.g. drivers, network protocols or architectures not in use (such as ARM-specific or 32-bit x86-specific code).
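When an advisory names a specific kernel module, one quick check is whether that module is disabled. This page doesn't describe how the blacklist is deployed, so the sketch below assumes modprobe.d(5) syntax as input. Note that a plain "blacklist" line only stops automatic loading via module aliases, whereas an "install" override pointing at /bin/true or /bin/false also blocks explicit modprobe calls; the function name is hypothetical.

```python
from typing import Dict


def disabled_modules(config_text: str) -> Dict[str, str]:
    """Map module name -> how it is disabled ("install" or "blacklist")
    in modprobe.d-style configuration text."""
    result: Dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) < 2:
            continue
        # modprobe treats "-" and "_" in module names as equivalent.
        name = parts[1].replace("-", "_")
        if parts[0] == "install" and len(parts) >= 3 and parts[2] in ("/bin/true", "/bin/false"):
            result[name] = "install"  # strongest form: overrides a plain blacklist
        elif parts[0] == "blacklist":
            result.setdefault(name, "blacklist")
    return result
```

A vulnerable module showing up as "install" is a strong mitigation; "blacklist" alone may still warrant a closer look.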
If a new kernel update needs to be rolled out, there's a handy script available on the puppetdb hosts. It takes the target version for a given distro and generates a template, e.g.
kernel_report --bullseye 5.10.218 > bullseye.txt
This template can be copied to a security Phabricator task; the typical naming scheme is the month and the distro, e.g. June 2024 Bullseye reboots (https://phabricator.wikimedia.org/T366555). DB hosts can take a long time to reboot (since masters need to be failed over), so the DB hosts are usually split out into a separate task. The reboots of the servers are handled by the individual SRE sub-teams.
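The naming scheme and the DB split described above can be sketched as follows. The task title follows the "<Month> <Year> <Distro> reboots" pattern from the example task; the rule that DB hosts are the ones named "db" followed by digits is a hypothetical placeholder, not the actual selection criterion.

```python
import calendar
import datetime
import re
from typing import List, Tuple

# HYPOTHETICAL: treat hosts named like "db1150.eqiad.wmnet" as DB hosts.
DB_HOST = re.compile(r"^db\d+")


def task_title(day: datetime.date, distro: str) -> str:
    """Build a title such as 'June 2024 Bullseye reboots'."""
    return f"{calendar.month_name[day.month]} {day.year} {distro.capitalize()} reboots"


def split_hosts(hosts: List[str]) -> Tuple[List[str], List[str]]:
    """Split hosts into (DB hosts, all other hosts) for separate tasks."""
    db = [h for h in hosts if DB_HOST.match(h)]
    rest = [h for h in hosts if not DB_HOST.match(h)]
    return db, rest
```

calendar.month_name follows the current locale, which is the C locale (English month names) unless the program changes it.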