Portal:Toolforge/Admin/Monthly meeting/2024-09-03

From Wikitech

Attendees

  • David Caro
  • Slavina Stefanova
  • Francesco Negri
  • Arturo Borrero
  • Sarai Sanchez
  • Raymond Ndibe
  • Seyram Komla Sapaty
  • Bryan Davis
  • Lukas Werkmeister

Agenda

  • login-buster bastion
    • some people are still depending on it for their workflows (phab:T360488).
    • how can we support those tools/workflows without keeping the buster bastion running forever?
    • how do we make sure to capture all the tools/workflows that are currently depending on the buster bastion?
    • Andrew was feeling bold and changed login.tools.wmflabs.org to point to a newer bastion, tools-bastion-12
  • k8s upgrade workgroup progress

Notes

  • Round of introductions

login-buster bastion

  • ABG: can we just install the missing packages on the new bastions?
  • BD: bringing back fat bastions means we have fat bastions forever, until we find out how to get rid of them again. Taavi was hopeful that removing the exec environment would stop users from abusing bastions instead of running things in k8s. If we bring back php, perl, etc., we’re going to have users who go back to the old workflow of running things under “screen” on a bastion, and we’ll have resource constraint problems like we had in the past (NFS, etc.)
  • ABG: I haven’t seen resource constraints in the bastions since we introduced systemd-based resource control
  • BD: it happens less frequently, but we’re still seeing some issues when a user consumes all the IOPS, etc.
  • ABG: systemd could not do resource control on IOPS; maybe newer versions can
  • FN: are there other users using the old bastion?
  • BD: there are 10 active sessions, but most are admins, 4 are real users
  • FN: how hard is it to satisfy the requirements of AnomieBot without installing Perl?
  • BD: I spoke with Brad and they’re OK with giving up the one thing that is currently not possible in a container. It’s an edge case in Brad’s workflow. For everything else, it seems things can run inside a container. We can make a bastion container with curl libraries, jobs client, etc.
  • ABG: if we create a bastion based on a container image, we’re just abstracting the problem. Folks need some libraries; if they’re in a container, we will have the same problem of deprecating the OS. Why is a container any different from a VM?
  • BD: from my point of view, a VM is 1 instance for everybody, 1 set of CPU, 1 set of IOPS quota. Containers running in k8s give you access to more CPU, more RAM, more IOPS, spreading it across the k8s cluster instead of concentrating in one instance. Another difference is that you can tailor the env to your needs: you can have a container tailored for a specific user, you’re not necessarily putting 8 or 9 languages together like we did with the old fat bastions
  • DC: I see the same advantages Bryan mentioned. There’s also the possibility to extend that service a bit more, e.g. a “shell” command/buildpack/container. Containers also mean each user can have their own version of a library. The difficulty is in forwarding things to the container, but it should be doable. For the specific problem of AnomieBot, there is a container that mostly works, so we can probably use that. For other problems, it can be OK to install a package in the bastion as a workaround
  • ABG: Imagine that we have containers that work as bastions, we would still have the problem of deprecating software, unless we take a different approach. Another approach can exist: to enable the workflows of users interacting with Toolforge directly from their laptop, where they can install all the software they like. The “bastion” becomes their laptop.
  • DC: Definitely, that is something we are trying to enable with the Toolforge API, CLI, etc. It will not solve all user needs: some users will still want to ssh to Toolforge and run things there. About software upgrades: the container image stops being something we provide; it’s up to the users to upgrade their containers. Some containers will run old software, but it’s more isolated.
  • ABG: it’s the same with VMs, we have VMs running Debian Buster.
  • DC: but Buster VMs will have many issues like connecting to Puppet, etc., and they’re more likely to fail in the future. The container is smaller.
  • ABG: the fundamental process is that we have a policy like “we no longer want to run Debian Buster”. How do you enforce a software upgrade policy on a container? That problem is very similar for VMs and containers. What is the policy if there is a security issue in version x? It may be even easier if the image is a VM where you can simply upgrade a package instead of rebuilding.
  • FN: I agree containers also have issues, but I still prefer using containers instead of a single bastion because we don’t have to do a “big bang” upgrade that affects all users at the same time.
  • DC: upgrades are less necessary for containers; using containers might remove the need for some security updates. Small bugs or memory issues will not be a problem inside a container. For upgrading containers, buildpacks offer a process to rebuild the base layer. I don’t mind if users are running an old Python version in a container: it will stop working at some point, but that’s a problem for the user and is not likely to be a vulnerability for the platform.
  • ABG: the most pressing factor to enforce OS upgrades is Puppet. We share Puppet with the SRE team and they have a strong policy about OS upgrades. If we remove Puppet from the equation, we cut the ties with the SRE team and it becomes easier.
  • DC: I’m not sure. If everybody has access to a VM and that VM is running an old version, they can cause damage. It will just delay the need for upgrades by one or two years, but we will still need to upgrade the VM.
  • FN: can we set a timeline for shutting down login-buster? Is the container required by AnomieBot ready?
  • BD: the initial precondition is finding out if there’s a container that works for Brad. I can investigate that and report back by the next meeting.

k8s upgrade workgroup progress

  • SS: the tools k8s cluster was upgraded yesterday to 1.26. Everything went fine. We did toolsbeta last week. There was a cookbook that failed.
  • DC: on one of the workers the APT database was not correct, but it was easy to fix.
  • DC: 1.27 should be an easy upgrade, and will get us closer to a supported version. It’s going pretty well.

Action items

  • Bd808: Talk to Brad about the container for the AnomieBot bastion need.