SRE/Infrastructure Foundations/Other/Hackathon Ideas
Requirements: Length: 5 working days, Project team size: 2+
Ideas
Title: Add SLOs to some of our tools
a. Description: There is a major effort at the WMF to move our services to SLOs and bring a more SRE-like culture to our engineering departments. The Service Ops team kicked off the work with standard templates such as the SLO/Template instructions, which could be used to formally onboard some of our tools, like Netbox and Puppet.
b. Number of People Needed (max): 2
c. Person Submitting: Luca
d. Team:
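For anyone new to SLOs, the arithmetic behind an availability target can be sketched as below. This is only an illustration: the 99.5% target and the 90-day window are made-up numbers, not values taken from the SLO template, and the function names are hypothetical.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Return the downtime allowed by an availability target over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return window * (1.0 - slo_target)


def budget_remaining(slo_target: float, window: timedelta, downtime: timedelta) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    return 1.0 - downtime / error_budget(slo_target, window)


if __name__ == "__main__":
    # Assumed example values: 99.5% availability over a 90-day window.
    window = timedelta(days=90)
    print(error_budget(0.995, window))
    print(budget_remaining(0.995, window, timedelta(hours=5.4)))
```

With these assumed numbers the budget is about 10.8 hours per window, so 5.4 hours of downtime would consume roughly half of it.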
Title: Run the reimage cookbook inside DCL/Pontoon
a. Description: We have been talking about having a prod-like environment in which to test changes to cookbooks like "reimage"; this may be a good occasion to find out whether it can be done. The goal would be to create a recipe in DCL or Pontoon that builds a new environment equipped with basic services like Netbox, loads fake data representing a set of hosts/infra, and then runs cookbooks like reimage.
b. Number of People Needed (max): 2/3
c. Person Submitting: Luca
d. Team:
Title: eBPF tracing for MariaDB master load
a. Description: We meant to get to this in the last hackathon, but we only got as far as enabling reverse DNS of k8s pod IPs, a necessary prerequisite. https://phabricator.wikimedia.org/T372943#10081783
b. Number of People Needed (max): 2-3
c. Person Submitting: Chris
d. Team:
Title: Maintenance emails parsing
a. Description: Automatically parse the maintenance emails sent by all our providers for awareness, conflict detection, less toil and one day auto-mitigation. https://phabricator.wikimedia.org/T230835
b. Number of People Needed (max): 2/3
c. Person Submitting: Arzhel
d. Team:
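As a starting point for discussion, the parsing and conflict-detection steps could look roughly like the sketch below. The notice layout (`Circuit:`, `Start:`, `End:` lines in UTC) is a hypothetical format invented for this example; real provider emails differ and would each need their own pattern. The `Maintenance` class and both function names are also assumptions.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from email import message_from_string
from itertools import combinations

# Hypothetical body layout; real notices vary per provider.
WINDOW_RE = re.compile(
    r"Circuit:\s*(?P<circuit>\S+).*?"
    r"Start:\s*(?P<start>\d{4}-\d{2}-\d{2} \d{2}:\d{2}) UTC.*?"
    r"End:\s*(?P<end>\d{4}-\d{2}-\d{2} \d{2}:\d{2}) UTC",
    re.DOTALL,
)


@dataclass(frozen=True)
class Maintenance:
    provider: str
    circuit: str
    start: datetime
    end: datetime


def _to_utc(text: str) -> datetime:
    return datetime.strptime(text, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


def parse_notice(raw_email: str) -> Maintenance:
    """Extract one maintenance window from a plain-text (non-multipart) email."""
    msg = message_from_string(raw_email)
    match = WINDOW_RE.search(msg.get_payload())
    if match is None:
        raise ValueError("unrecognized maintenance notice format")
    return Maintenance(
        provider=msg["From"],
        circuit=match["circuit"],
        start=_to_utc(match["start"]),
        end=_to_utc(match["end"]),
    )


def find_conflicts(windows: list[Maintenance]) -> list[tuple[Maintenance, Maintenance]]:
    """Return pairs of windows on different circuits that overlap in time."""
    return [
        (a, b)
        for a, b in combinations(windows, 2)
        if a.circuit != b.circuit and a.start < b.end and b.start < a.end
    ]
```

Treating any two overlapping windows on different circuits as a conflict is a deliberate simplification; a real check would probably consult Netbox to see whether the affected circuits are actually redundant for each other.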
Title: Expand on the UEFI boot work
a. Description: See the UEFI Boot#Future work section. We could work on UEFI for Ganeti VMs, secure boot, HTTPS, using Netbox as the source of truth for the OS version, or reducing the need for DHCP.
b. Number of People Needed (max): 2/3
c. Person Submitting: Arzhel
d. Team:
Title: Evaluate the netbox-bgp plugin
a. Description: BGP sessions are currently either managed by custom code or not managed by our automation at all, and we have a growing need to define them in a standardized way. The netbox-bgp plugin might be the tool we need for that, in addition to an abstraction layer (Netbox scripts or cookbooks).
b. Number of People Needed (max): 2/3
c. Person Submitting: Arzhel
d. Team:
Title: Test installations via HTTP Boot
a. Description: d-i in trixie supports HTTP Boot. This allows you to simply specify a URL (to any d-i or live image iso) in your computer’s firmware setup and then you can boot to it directly over the Internet, so no need to download an image, write it to flash disk and then boot from the flash disk on computers made in the last ~5 years. This is also supported on the Tianocore free EFI firmware, which is useful if you’d like to try it out on QEMU/KVM. (quoting from https://jonathancarter.org/2025/08/10/debian-13/)
We should eventually test this with a Dell host, a Supermicro host and a VM. If it works reliably, we could simplify installations for trixie by enabling and initiating HTTP Boot via Redfish.
b. Number of People Needed (max):
c. Person Submitting:
d. Team:
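To make the Redfish part more concrete, here is a minimal sketch of the request body that a one-time HTTP Boot override might use. It relies on the `BootSourceOverrideTarget` value `UefiHttp` and the `HttpBootUri` property from the Redfish ComputerSystem schema; whether Dell and Supermicro firmware honor them is exactly what the test would have to establish. The function name and the installer URL are hypothetical, and the code only builds the payload without sending anything.

```python
import json


def http_boot_override(boot_uri: str, once: bool = True) -> dict:
    """Build a Redfish PATCH body requesting a UEFI HTTP Boot from boot_uri."""
    if not boot_uri.startswith(("http://", "https://")):
        raise ValueError("boot_uri must be an http:// or https:// URL")
    return {
        "Boot": {
            # "Once" reverts to the normal boot order after the next boot.
            "BootSourceOverrideEnabled": "Once" if once else "Continuous",
            "BootSourceOverrideTarget": "UefiHttp",
            "HttpBootUri": boot_uri,
        }
    }


if __name__ == "__main__":
    # Hypothetical installer URL, for illustration only.
    payload = http_boot_override("https://example.org/debian-installer.iso")
    print(json.dumps(payload, indent=2))
```

The body would be sent as a PATCH to the system resource (typically under `/redfish/v1/Systems/`, with a vendor-specific system ID), followed by a reset to start the installation.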
_____________________________
Template:
Title:
a. Description:
b. Number of People Needed (max):
c. Person Submitting:
d. Team: