Portal:Toolforge/Admin/Monthly meeting/2023-06-06


Toolforge Workgroup meeting 2023-06-06 notes

Participants

  • Andrew Bogott
  • Arturo Borrero Gonzalez
  • Bryan Davis
  • David Caro
  • Nicholas Skaggs
  • Taavi Väänänen
  • Raymond Ndibe
  • Francesco Negri

Agenda


Notes

Build service status and next steps

DC: Working and open. A few users are using it already, with varying results. Some users are still waiting for multi-stack and custom package support. Python and PHP are available. Some users are waiting on features like the envvars service; that, plus the ability to install apt packages, will unblock more use cases. Once these are present, we can expand to more users and get more feedback. Eventually the plan is to open the service to everyone.

FN: Looks like most people in Phabricator (https://phabricator.wikimedia.org/T336669) agree to allow the apt buildpack. This wouldn’t require multistack. Any opinions / concerns? Will this unlock things?

NS: No multistack support upstream, would rather find other solutions, even bespoke ones.

BD: Multistack needs can be fixed largely with apt. Most multistack I know of needs a second runtime but not a second package manager integration.

FN: Multistack is almost like launching a new feature. Should recruit others to test.

TV: How will apt buildpacks work? What will they install?

DC: Buildpacks use the OS of the image itself, so yes, it will pull from Ubuntu as of now. Might be possible to pull packages from somewhere else. Will we allow people to pull packages from anywhere?
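(For context: the common heroku-style apt buildpack convention is a plain-text Aptfile at the repository root listing Debian/Ubuntu package names, one per line. Whether Toolforge adopts exactly this format is part of the T336669 discussion, so the sketch below is only illustrative.)

  # Aptfile -- illustrative sketch only; package names are examples
  ffmpeg
  poppler-utils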

TV: This could affect how we update images for example

DC: Some things could break depending on whether packages are backwards compatible. For example, pulling from the system versus from a language repo that’s more OS-independent.

TV: Can users configure which OS version they are using?

DC: No, it’s hardcoded into the builder. We could change it, build our own builder, but that’s the default

ABG: Voting for option 2 in that ticket is intended to smooth the transition to k8s.

Image build and publish automation on gitlab

ABG: At the last Toolforge meeting, when discussing what to automate next, image builds were the first recommended step. This work is in line with that.

DC: Building images for several repository components. Building images and pushing them to toolsbeta. Once merged, it will appear as a job to run, with the commit hash as the tag. Next step: building and publishing the chart. To publish the chart, you need the commit hash. Would be nice to publish the chart and image together. Maybe we need changes to allow automation for this.
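(For reference, a generic sketch of what a GitLab CI job tagging an image with the commit hash could look like. This is not the actual Toolforge pipeline; the kaniko image, the REGISTRY variable, and the job name are illustrative assumptions.)

  # .gitlab-ci.yml excerpt (illustrative sketch only)
  build-image:
    stage: build
    image:
      name: gcr.io/kaniko-project/executor:debug
      entrypoint: [""]
    script:
      # Build from the repo's Dockerfile and tag with the commit hash,
      # matching the "commit hash as the tag" approach described above.
      - /kaniko/executor
        --context "${CI_PROJECT_DIR}"
        --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
        --destination "${REGISTRY}/${CI_PROJECT_NAME}:${CI_COMMIT_SHORT_SHA}"
    rules:
      - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH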

TV: Does Helm require a proper version number? (Yes.) It might be a good idea to have a proper version number for everything; that way both the chart and the image can have a predictable number. Avoids multiple commits.

DC: So image itself should have the same version number as the chart?

TV: Yes. Doesn’t make sense to not version chart along with code.

DC: Helm has two versions: the chart version, and the app version, which can be whatever. Will it be in the same merge branch, so it gets published alongside it?
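(For reference, the two versions DC mentions live in the chart's Chart.yaml: "version" is the chart version and must be SemVer, while "appVersion" is free-form and typically tracks the deployed software. A minimal sketch of pinning both to the same number, along the lines TV suggests; the chart name and numbers are hypothetical.)

  # Chart.yaml (hypothetical example)
  apiVersion: v2
  name: some-component
  version: 1.4.0        # chart version, required SemVer
  appVersion: "1.4.0"   # app/image version, pinned to match the chart

Templates can then use .Chart.AppVersion as the default image tag, so the chart and the image stay in lockstep from a single number.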

ABG: Another example is nginx: a Helm chart version, an included software version, and a supported k8s version. Removing one of those columns would simplify things.

TV: Yes, single version for both would help.

TV: Standardized automated scripts to do so

ABG: How will this fit with Lima-Kilo? Probably can talk about that later?

Envvars service almost ready

DC: Deployed in toolsbeta with code still pending review

DC: Might be a candidate for using the new external Helm repo?

FN: what's the comms plan? Not tied to buildservice correct?

DC: yes, independent of buildservice. Was thinking of announcing along with the next wave of buildservice announcements.

ABG: Uses an admission controller that will need to be replaced?

DC: yes. Didn't want to block on waiting for new solution

ABG: can we make this work with jobs service?

DC: Currently switched on "toolforge" label so probably already working
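(For context, "switched on the toolforge label" refers to how the admission webhook selects which pods to mutate. A rough sketch of scoping a mutating webhook by label is below; the names, namespace, and exact label are assumptions, not the real configuration.)

  # MutatingWebhookConfiguration excerpt (illustrative sketch only)
  apiVersion: admissionregistration.k8s.io/v1
  kind: MutatingWebhookConfiguration
  metadata:
    name: envvars-admission            # hypothetical name
  webhooks:
    - name: envvars.toolforge.example  # hypothetical
      objectSelector:
        matchLabels:
          toolforge: tool              # assumed label shared by webservice and jobs pods
      clientConfig:
        service:
          name: envvars-admission      # hypothetical service
          namespace: envvars-admission
          path: /mutate
      rules:
        - apiGroups: [""]
          apiVersions: ["v1"]
          operations: ["CREATE"]
          resources: ["pods"]
      admissionReviewVersions: ["v1"]
      sideEffects: None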

TV: all secrets currently mounted to all pods in the namespace. Should we add a way to mark vars with where they should be available?

DC: I've had some thoughts about that, but for now all vars to all pods is easier to implement

State of lima-kilo

FN: haven't tried it yet, but want to soon

ABG: there are some open phab tickets about things currently broken https://phabricator.wikimedia.org/T338153 https://phabricator.wikimedia.org/T338156 https://phabricator.wikimedia.org/T338158

DC: I was using lima-kilo but I stopped because of a bug in Kubernetes (kind)

ABG: would you suggest moving from kind to minikube?

DC: the home folder might not be in your local home (not sure). We also hardcode the tool username, but we could change it to be more flexible

ABG: I don’t have much time this quarter to work on lima-kilo but I will keep an eye for reviews, etc.

Any reflections on Toolforge vs Hellforge (PaaS vs (and?) managed-k8s)? How to drive this initiative forward?

AB: The idea is, I think, that we should provide managed BYOC (bring your own container) as a service. This shouldn’t be Toolforge. Should allow us to corral users into one place. Could be a clone of the existing Toolforge to start. Other naming ideas: Kubeforge?

ABG: We should scope this. Currently we provide kubectl access as well as abstractions. Both exist in the platform, but beyond sharing a cluster they have very different uses. The community sees value in both offerings. How do we move from the current reality into something different with multiple platforms? What intermediate steps to take? What to do with the ingress as-is? The toolforge.org domain, for example, is loved. For a new product in a different cluster, what to do with the domain / naming?

BD: Ingress is the main thing that complicates this I think

TV: Would this be a shared cluster or a magnum style k8s-as-a-service product?

AB: Single shared cluster

AB: Today we limit containers, and the future product wouldn’t limit containers. New buildpacks could be a new product as well. Existing users who don’t move would end up on Kubeforge. Don’t recommend that, but it’s an option. Likely the new product would have a new TLD, etc. Should largely consist of power users doing things outside of the build service.

BD: How does adding a new thing make the thing we have easier to maintain? Discussions about this came from inability to upgrade k8s in place with random workloads

AB: The existing stepladder was made with users in mind, not with maintenance in mind. That wasn’t the thinking. Instead, trying to think about how to support users. Single platform versus separate can be discussed. Part of making platforms is about simplifying. From a user perspective, trying to keep things simple and easy to use. A platform providing limited commands should be easier to support, with seamless upgrade paths, etc. I think this aligns with David’s vision of having a limited API, with limited commitments, with users buying in. And we need a place for users who won’t fit in these constraints.

TV: To me this seems like moving the hard parts of maintaining Toolforge over to the new thing, making that hard to maintain instead, not making anything easier.

AB: Most of the things we do for our users are to lower their burden. If you want a cloud-like experience without that support, we should have an offering that allows for it. No guarantee it’s easier, just suspect it is. If it’s a Toolforge clone, maybe that’s easier.

FN: BYOC shouldn’t be hard. But not convinced it’s desired. Would like more data. At the moment, we’re making the build service. This would be an additional option for cases where a VM is too hard but BYOC isn’t. A Docker image still might not cut it and a VM is needed, or full-on k8s-as-a-service.

TV: I don't see the value of this over a k8s-as-a-service (which to my understanding is already being worked on). This feels like duplicated effort. Magnum and this are fairly close.

BD: The past requests for BYOC have generally been "I want to use this image I found on github"

ABG: +1 to Taavi’s concern. Lots of overlap between magnum and byoc.

NS: k8s-as-a-service, managed k8s, etc are similar. Likely shouldn’t offer them all.

DC: Admission controllers not needed, auth/access not needed, lots of simplifying. Pairing that with k8s as a service on the side, that can also be simpler. Could be simpler to maintain simpler versions of both as separate entities versus a single implementation. Would expect users to pick a solution that works for them. Should use different domains yes. Overall, shouldn’t be a big issue.

TV: you do still need to figure out how to isolate users of this service from each other, which are a large portion of our current policies and admission controllers

AB: Forking toolforge solves this problem.

BD: Sharing the *.toolforge.org namespace across multiple pools of tools is not simple

ABG: Proposal to have a dedicated meeting about this.

Would some serverless offering be part of Toolforge, or a new brand/product? How to drive this initiative forward?

ABG: Should this be part of toolforge? Based on discussions just now, no

BD: See serverless as the third thing in what we were just talking about.

FN: To me it’s mostly about reacting to events. Events are HTTP calls. The big advantage of AWS Lambda is the cost. Not sure if that means we could run more tools on the same cluster.


Action items

  • Try to summarize feedback from build service. Send a proposal of next iteration.
  • Image build on gitlab: Chart version == image version
  • Envvars: someone to play & review it before going live.
  • Try to introduce Minikube support in Lima-Kilo
  • See how we can integrate Lima-Kilo with the new deployment repo
  • PaaS / serverless: Create dedicated meeting and write some drafts (phab, wiki page, etc)