
Portal:Toolforge/Admin/Monthly meeting/2025-10-22


Attendees (shuffled)

  • Seyram Komla Sapaty
  • Filippo Giunchedi
  • David Caro
  • Andrew Bogott
  • Alexandros Kosiaris
  • Bryan Davis (bd808)
  • Taavi Väänänen
  • Raymond Ndibe
  • Francesco Negri

Notes

k8s upgrade workgroup progress

  • T372697 [infra,k8s] Upgrade Toolforge Kubernetes to version 1.31
  • TV: Has had no time yet to get into the upgrade of k8s itself.
  • FN: Is the idea that both teams will get involved in the current upgrades, or will only the tools-platform team do them for now? Or could it be voluntary?
    • Decided to make it voluntary for now and discuss later/offline.
  • DC: No updates on replacing Kyverno with Kubernetes ValidatingAdmissionPolicies (VAPs); see the sketch after this list.
    • It’s not a blocker for the upgrade (yet).
  • AB: Is production running a single k8s version, or are there different versions on different clusters?
  • AK: Right now it is 1.31; the clusters are not locked in step, upgrades can happen individually, and two versions are supported at any given time.
  • AB: Is the version jump an issue, or does it not matter?
  • AK: Currently the process requires a full cluster rebuild, but that will change soon.
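
As context for the Kyverno/VAP point above, the sketch below shows what a minimal ValidatingAdmissionPolicy could look like, built as a Python dict and dumped to YAML. The policy name, match rules, and CEL expression are illustrative assumptions, not the actual Toolforge policies.

  # Minimal, hypothetical ValidatingAdmissionPolicy (VAP) of the kind that
  # could replace a Kyverno rule. Names, match rules and the CEL expression
  # are illustrative only.
  import yaml

  vap = {
      "apiVersion": "admissionregistration.k8s.io/v1",
      "kind": "ValidatingAdmissionPolicy",
      "metadata": {"name": "example-require-run-as-non-root"},
      "spec": {
          "failurePolicy": "Fail",
          "matchConstraints": {
              "resourceRules": [
                  {
                      "apiGroups": [""],
                      "apiVersions": ["v1"],
                      "operations": ["CREATE", "UPDATE"],
                      "resources": ["pods"],
                  }
              ]
          },
          "validations": [
              {
                  # CEL expression evaluated inside the API server, no webhook needed.
                  "expression": "has(object.spec.securityContext) && object.spec.securityContext.runAsNonRoot == true",
                  "message": "Pods must set securityContext.runAsNonRoot to true.",
              }
          ],
      },
  }

  print(yaml.safe_dump(vap, sort_keys=False))

In practice a matching ValidatingAdmissionPolicyBinding is also needed to put such a policy into effect; the appeal of the approach is that validation runs as CEL inside the API server, with no extra admission controller to operate.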

Push to deploy beta

Sustainability score

  • SK: Reached out with an end-of-October deadline for feedback; will follow up today with a reminder.

NFS server update

TOM (Toolforge on metal) update

  • Still in the planning/documentation stage:
  • Persistent volumes first?
    • That would probably add a lot of complexity, although we do want them eventually so we can move away from NFS completely.
  • Should we just use Cloud-VPS NFS servers?
    • DC: I would start with a separate NFS instance.
    • AB: We can have a physical host (or Ganeti VM) running the NFS server but still keep the data on Ceph, so the storage is not tied to a single physical server.
    • AK: We probably don’t need a Ganeti cluster, because the puppetization of the control plane allows co-locating etcd and workloads (effectively using the control-plane nodes as workers too).

Side question: does NFS consist of scratch space, home dirs, and clouddumps?

  • The complexity for NFS comes mostly from user and tool home dirs; scratch and dumps should be easy to just use/replicate.

Logs/Loki

  • DC: Rate limit changes this week: we’re losing fewer logs, but we’re using more disk space (see the sketch below).
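
For reference on the trade-off mentioned above, here is a hypothetical sketch of the Loki limits_config knobs that usually drive it, again as a Python dict dumped to YAML; the values are illustrative, not the ones actually deployed.

  # Hypothetical Loki limits_config sketch: raising ingestion limits means
  # fewer dropped (rate-limited) log lines, at the cost of more data kept on
  # disk. Values below are illustrative only.
  import yaml

  limits_config = {
      "ingestion_rate_mb": 8,                 # per-tenant average ingest rate (MB/s)
      "ingestion_burst_size_mb": 16,          # per-tenant burst allowance (MB)
      "per_stream_rate_limit": "5MB",         # per-stream average rate
      "per_stream_rate_limit_burst": "20MB",  # per-stream burst
      "retention_period": "744h",             # retention bounds how much disk is used
  }

  print(yaml.safe_dump({"limits_config": limits_config}, sort_keys=False))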