UEFI Boot
This is a work in progress to transition from legacy BIOS to UEFI on bare-metal servers, and perhaps VMs in the future.
Major wins:
- 100% UEFI, no more legacy BIOS
- 100% Redfish, no more legacy IPMI
- 100% HTTP, no more legacy TFTP
Workflow
Apply the correct partman recipe to support UEFI
The disk partition layout changes a bit when using UEFI, so you need to verify what your server needs before proceeding. Most of the standard Partman recipes have EFI counterparts these days; if anything is missing, please reach out to the I/F team for help.
If you want to convert your server to UEFI, you'll need to use (or create) the right combination of EFI-enabled partman recipes. The long-term plan is to make the -efi recipe variants the defaults, but they are still being tested, so we are not ready to do that yet.
Things to keep in mind when choosing the new recipe:
- The standard.cfg recipe can be swapped to standard-efi.cfg without any other consideration. The same applies to any raidX-Ydev.cfg config snippets.
- If you have a custom recipe, you have to add support for the EFI partition to it; see for example https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087538.
- Please work with the I/F team before merging a new recipe :)
Depool the server and downtime it
This step is needed only if you are converting a server from BIOS to UEFI. The server may be unbootable between re-provision and reimage, so please keep that in mind when scheduling maintenance (namely, it may take longer than planned the first time).
Reconfigure the server to boot via UEFI
During the initial server provisioning, pass the --uefi flag to the cookbook. If "converting" an existing server, use --no-switch --no-users --no-dhcp --uefi.
The provision cookbook will likely reboot the server to apply the new settings, and at this point it may not be able to boot anymore since the disk partitions are not the expected ones.
You can roll back the change by re-running the sre.hosts.provision cookbook without the --uefi flag.
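As a rough illustration of the flag combinations described above, the following sketch builds the argument list for the three cases (initial provisioning, conversion of an existing server, rollback). Only the cookbook name and the flags come from this page; the `cookbook` executable name, the positional host argument and the `provision_args` helper are assumptions made for the example.

```python
from typing import List


def provision_args(host: str, convert_existing: bool = False, uefi: bool = True) -> List[str]:
    """Build an argument list for the sre.hosts.provision cookbook.

    Hypothetical helper: the executable name and the positional host
    argument are assumptions, only the flags are taken from the text.
    """
    args = ["cookbook", "sre.hosts.provision"]
    if convert_existing:
        # Flags the text lists for "converting" an existing server.
        args += ["--no-switch", "--no-users", "--no-dhcp"]
    if uefi:
        # Leaving this flag out is how the text describes a rollback.
        args.append("--uefi")
    args.append(host)
    return args


if __name__ == "__main__":
    print(" ".join(provision_args("sretest1002", convert_existing=True)))
```

Calling `provision_args("sretest1002", uefi=False)` yields the rollback form, that is, the same cookbook without `--uefi`.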
Re-image the server
From a user's point of view, there is no difference from a legacy BIOS re-image. No extra flags are needed; just reimage as you usually do. Note: if you use the --tftp-only flag, please remove it, as it shouldn't be needed anymore.
Under the hood
Multiple parts are at play here:
- The re-image cookbook will reconfigure (using Redfish) the server to boot once over UEFI HTTP (instead of booting to disk)
- In parallel it will set up a DHCP snippet for that host containing:
  - The URL of the iPXE loader to fetch to initialize the boot (in the filename option)
  - The OS release to boot, sent by abusing the root-path option
  - A mandatory vendor-class-identifier option set to HTTPClient, which instructs the client to interpret the filename as a URL
- It then instructs the server to reboot (still using Redfish)
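The two pieces of configuration listed above can be sketched as follows. This is not the cookbook's actual code: the Redfish body uses the standard ComputerSystem boot override properties, the DHCP block assumes ISC dhcpd syntax, and the exact value sent in root-path (a bare release codename here) is a guess. The MAC and IP addresses are documentation placeholders.

```python
import json

# Loader URL as quoted in the "Known limitations" section of this page.
IPXE_LOADER_URL = "http://apt.wikimedia.org/tftpboot/snponly.efi"


def boot_once_payload() -> dict:
    """Redfish PATCH body requesting a one-time UEFI HTTP boot."""
    return {
        "Boot": {
            "BootSourceOverrideEnabled": "Once",
            "BootSourceOverrideTarget": "UefiHttp",
        }
    }


def dhcp_snippet(hostname: str, mac: str, ip: str, release: str) -> str:
    """Render a per-host block in ISC dhcpd syntax (assumed DHCP server)."""
    lines = [
        f"host {hostname} {{",
        f"    hardware ethernet {mac};",
        f"    fixed-address {ip};",
        # Tells the UEFI client to treat `filename` as a URL.
        '    option vendor-class-identifier "HTTPClient";',
        f'    filename "{IPXE_LOADER_URL}";',
        # Carries the OS release; the value format is an assumption.
        f'    option root-path "{release}";',
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(json.dumps(boot_once_payload(), indent=2))
    print(dhcp_snippet("sretest1002", "00:00:5e:00:53:01", "192.0.2.10", "bookworm"))
```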
Server boot sequence:
- UEFI sends a DHCP request, DHCP server replies with the data listed above
- UEFI fetches and loads the file, which is an iPXE image; however, the UEFI loader exposes very limited DHCP data to iPXE, so it is unable to obtain the root-path option
- iPXE loads a script file named autoexec.ipxe from the same parent directory as the iPXE image (adaptation of our current ttyS0-115200)
  - autoexec.ipxe issues a second DHCP request to obtain the root-path option, providing the Debian release to use
  - autoexec.ipxe instructs iPXE to boot the indicated debian-installer directly (no need for GRUB)
- Debian Installer issues another DHCP request to obtain its networking config
- Debian Installer fetches its preseed.cfg via HTTP
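To make the role of root-path in the sequence above concrete, here is a small sketch of how a release name could map to the installer files that iPXE boots. The base URL is the parent directory of the iPXE image quoted on this page and the directory prefix matches the TFTP capture in the Notes; the `linux` and `initrd.gz` file names follow the usual Debian netboot layout and are an assumption, as is the `installer_urls` helper itself.

```python
from urllib.parse import urljoin

# Parent directory of the iPXE image, from the URL quoted on this page.
BASE_URL = "http://apt.wikimedia.org/tftpboot/"


def installer_urls(release: str, arch: str = "amd64") -> dict:
    """Map the release received via root-path to kernel and initrd URLs.

    Hypothetical helper; file names assume the Debian netboot layout.
    """
    prefix = f"{release}-installer/debian-installer/{arch}/"
    return {
        "kernel": urljoin(BASE_URL, prefix + "linux"),
        "initrd": urljoin(BASE_URL, prefix + "initrd.gz"),
    }


if __name__ == "__main__":
    for name, url in installer_urls("bookworm").items():
        print(name, url)
```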
Known limitations, future work and notes
Future work
- Enable secure boot
- Switch to HTTPS instead of HTTP
- Store target OS release in a source of truth (e.g. Netbox) and expose it to iPXE using a webserver
  - This will also eliminate the need to do a second DHCP query in iPXE
  - See https://ipxe.org/scripting#dynamic_scripts
- Remove DHCP in the production environment
  - This is doable as we can configure static IPs through Redfish to perform the UefiHttp boot. The next step would be to communicate that network configuration to debian-installer
- Test problematic NIC firmware with UEFI - T304483
- Use UEFI for Ganeti VMs - https://github.com/ganeti/ganeti/issues/1374 (continuation of T93208)
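The dynamic-script idea from the list above could look roughly like the sketch below: a web service looks up the target release for a host and returns a generated iPXE script, so iPXE no longer needs a second DHCP query. Everything here is illustrative: the dictionary stands in for a source of truth such as Netbox, `render_ipxe_script` is a hypothetical name, the installer paths assume the Debian netboot layout, and kernel command-line arguments are left out.

```python
from typing import Dict

# Stand-in for a source of truth such as Netbox; the entry is illustrative.
OS_RELEASE_BY_HOST: Dict[str, str] = {"sretest1002": "bookworm"}

BASE_URL = "http://apt.wikimedia.org/tftpboot"


def render_ipxe_script(hostname: str) -> str:
    """Return an iPXE script booting the installer for the host's release.

    Raises KeyError if the host is not in the source of truth.
    """
    release = OS_RELEASE_BY_HOST[hostname]
    installer = f"{BASE_URL}/{release}-installer/debian-installer/amd64"
    lines = [
        "#!ipxe",
        f"kernel {installer}/linux",
        f"initrd {installer}/initrd.gz",
        "boot",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ipxe_script("sretest1002"), end="")
```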
Known limitations
- When upgrading the SuperMicro BIOS, it's possible that some options get lost, for example IPv4HTTPSupport set back to Disabled.
- SuperMicro boot sequence sometimes gets stuck at loading the HTTP Boot file URI: http://apt.wikimedia.org/tftpboot/snponly.efi; a reboot "unsticks" it. This seems to be a bug in SuperMicro's UEFI TCP stack. Ticket CYC-480-35337 opened with their support.
- If older Dell servers (as in hardware which was previously installed with BIOS) are switched to UEFI, they should get upgraded to the latest firmware before reprovisioning them to UEFI
- On SuperMicro hosts there is a bug that causes the EFI boot settings set by the Debian Installer to get lost upon reboot. This usually ends up in another UEFI HTTP boot, another Debian install and eventually the reimage getting stuck because of puppet cert errors. The next reimage usually fixes the problem; more info in T381919 and T404356.
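For the first limitation above (options lost after a BIOS upgrade), a check along these lines could flag the drift. It is only a sketch: it takes the `Attributes` dictionary of a Redfish Bios resource as input, and the expected value `Enabled` for IPv4HTTPSupport is inferred from the text rather than taken from a reference configuration. `find_drift` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Expected values; "Enabled" for IPv4HTTPSupport is inferred from the text.
EXPECTED_BIOS_ATTRIBUTES: Dict[str, str] = {"IPv4HTTPSupport": "Enabled"}


def find_drift(attributes: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {name: (expected, actual)} for attributes that differ."""
    drift = {}
    for name, expected in EXPECTED_BIOS_ATTRIBUTES.items():
        actual = attributes.get(name, "<missing>")
        if actual != expected:
            drift[name] = (expected, actual)
    return drift


if __name__ == "__main__":
    # Attributes as they might look right after a BIOS upgrade.
    print(find_drift({"IPv4HTTPSupport": "Disabled"}))
```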
Notes
- We initially tried to have UEFI boot to bootnetx64.efi over TFTP. One major limitation was that GRUB would always try to grab files at /debian-installer/xxx instead of our usual /DEBIAN-VERSION-installer/debian-installer/xxx, whether over TFTP or HTTP, which made customizing the OS to boot more problematic. Possible workarounds were, for example, to start a web service on a different port for each Debian version.

    [...]
    14:56:12.823503 IP sretest1002.eqiad.wmnet.1837 > install1004.wikimedia.org.tftp: TFTP, length 74, RRQ "bookworm-installer/debian-installer/amd64/grubx64.efi" octet blksize 512
    14:56:14.874395 IP sretest1002.eqiad.wmnet.25300 > install1004.wikimedia.org.tftp: TFTP, length 81, RRQ "/debian-installer/amd64/grub/x86_64-efi/command.lst" octet blksize 1024 tsize 0
    [...]

- Other people had the same issues as us, for example https://lists.debian.org/debian-boot/2017/11/msg00434.html (2017) or https://forums.debian.net/viewtopic.php?p=780262&sid=a00d048bf625a4951d559afe098eb59f#p780262 (2023)
- We need to use the snponly.efi version of iPXE, which uses the UEFI network stack. Using the "full" iPXE causes downloads to get stuck on Supermicro hardware.
- On some older platforms (e.g. R440) Dell has a bug where it doesn't expose the DNS to iPXE, and thus iPXE can't fetch autoexec.ipxe if its URL requires a DNS resolution. This has been worked around by using the APT server's IP directly - https://github.com/ipxe/ipxe/issues/1316
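The path problem described in the first note can be seen directly in the two captured TFTP requests. The sketch below extracts the requested paths from those lines and checks whether each one carries the release-specific prefix; the capture lines are copied from this page, while the helper names and the regular expression are illustrative.

```python
import re
from typing import List

CAPTURE = [
    '14:56:12.823503 IP sretest1002.eqiad.wmnet.1837 > install1004.wikimedia.org.tftp: '
    'TFTP, length 74, RRQ "bookworm-installer/debian-installer/amd64/grubx64.efi" octet blksize 512',
    '14:56:14.874395 IP sretest1002.eqiad.wmnet.25300 > install1004.wikimedia.org.tftp: '
    'TFTP, length 81, RRQ "/debian-installer/amd64/grub/x86_64-efi/command.lst" octet blksize 1024 tsize 0',
]

# Matches the quoted file name of a TFTP read request in tcpdump output.
RRQ_RE = re.compile(r'RRQ "([^"]+)"')


def requested_paths(lines: List[str]) -> List[str]:
    """Extract the file path of every TFTP read request."""
    paths = []
    for line in lines:
        match = RRQ_RE.search(line)
        if match:
            paths.append(match.group(1))
    return paths


def has_release_prefix(path: str, release: str) -> bool:
    """True if the path starts with the per-release installer directory."""
    return path.lstrip("/").startswith(f"{release}-installer/")


if __name__ == "__main__":
    for path in requested_paths(CAPTURE):
        print(path, has_release_prefix(path, "bookworm"))
```

The first request (the GRUB binary itself) uses the per-release directory, while the follow-up request issued by GRUB does not, which is the mismatch the note describes.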
Resources
- Working configuration files, similar to what we wanted to do - https://github.com/gaberger/att-uefi-boot
- Supermicro manual - https://www.supermicro.com/manuals/motherboard/X12/MNL-2246.pdf
- SuperMicro Redfish doc - https://www.supermicro.com/manuals/other/redfish-ref-guide-html/Content/general-content/bios-configuration.htm
- HP Presentation "Firmware in the datacenter: Goodbye PXE and IPMI. Welcome HTTP Boot and Redfish!" - https://uefi.org/sites/default/files/resources/UEFI_Plugfest_May_2015_HTTP_Boot_Redfish_Samer_El-Haj_ver1.2.pdf