Toolforge Workgroup meeting 2022-12-13 notes

Participants

Andrew Bogott
Arturo Borrero Gonzalez
Bryan Davis
David Caro
Nicholas Skaggs
Raymond Olisaemeka
Seyram Komla Sapaty
Taavi Väänänen
Vivian Rook

Agenda

Buildpacks: status, open questions, next steps.
Toolforge design docs.
Kubernetes upgrade cadence.

Notes

Thanks Nicholas for capturing notes!

buildpacks

AB: Next two meetings are scheduled for Jan / Feb. Look for calendar invites.

AB: First agenda item, buildpacks.

D: How familiar is everyone? Buildpacks are a way of building Docker images that detaches image building from the user. The user doesn’t have to care about the underlying implementation or provide a Dockerfile, only code.

D: On the maintenance side, buildpacks allow for building images with layers. To build an application, you need a builder image, which is provided beforehand. The builder image has 3 things: an embedded build image, an embedded run image, and a bunch of buildpacks. These buildpacks are scripts. For example, for a Python application, the builder image looks at the code, detects Python, then runs the buildpack steps inside the build image and builds the results as layers for the final image. The layers are put on the run image: run image + stack of layers from buildpacks. So in the future, the root image can be upgraded / changed and then the same buildpack-built layers and run image can be applied on top. Upgrades can happen this way without user involvement.
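
Editor's illustration (not from the meeting): a minimal Python sketch of the layering idea described above, assuming a simplified model in which an image is just a run image name plus an ordered stack of buildpack layers. The names AppImage, build, rebase, BUILDPACKS and the image tags are hypothetical and do not correspond to the Toolforge build service or to any real buildpacks API.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class AppImage:
    run_image: str           # base image the buildpack layers sit on
    layers: Tuple[str, ...]  # layers produced by buildpacks, in order


def build(source_files, run_image, buildpacks):
    """Run each buildpack whose detect file is present and stack its layer."""
    layers = []
    for name, detect_file in buildpacks:
        if detect_file in source_files:  # simplified "detect" step
            layers.append(f"{name}-layer")
    if not layers:
        raise ValueError("no buildpack detected the application type")
    return AppImage(run_image=run_image, layers=tuple(layers))


def rebase(image, new_run_image):
    """Swap the run image while keeping the buildpack layers untouched."""
    return replace(image, run_image=new_run_image)


# Hypothetical buildpacks: (name, file whose presence triggers detection).
BUILDPACKS = [("python", "requirements.txt"), ("nodejs", "package.json")]

image = build({"app.py", "requirements.txt"}, "run-image:v1", BUILDPACKS)
rebased = rebase(image, "run-image:v2")
```

In this model the user only supplies source files, and upgrading the base is a rebase that leaves the layers as they were, which is the "upgrade without user involvement" property mentioned above.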

AB: Raymond, Slavina, David have been working on the toolforge build service. Any updates to share?

D: Goal was to release minimum functionality. User stories doc (Toolforge build service - Beta release user stories), Phabricator tickets mostly created. Right now, core implementation uses a Tekton pipeline. User can run toolforge build with a git URL and an image name, and it will trigger a build: build the image, push it to Harbor. Then with toolforge webservice, pass the image name, and the webservice will run. This kind of works, can be manually run. Right now only select buildpacks are allowed, and no layering of buildpacks (languages) is possible.

AB: All in toolsbeta?

D: Yes, once all the features are ready plan to move to tools proper. At the moment still testing and refining features.

AB: What will happen in the short-term? What’s the next goal for buildpacks?

D: Early beta next quarter potentially. Get user feedback and work on the next set of features. Multi-stack buildpacks. Redesign the CLI a bit. Secure logging isn’t possible yet (potential auth leaks). Beta would be for webservices. Single webservice, single language stack.

AB: Separate into an API? Is the intention to do this after the beta?

D: For beta, yes, just single user story. Post beta working document (Toolforge Build Service - User stories beyond beta). Feel free to edit / comment / suggest!

AB: Buildpacks is adding more custom components into k8s. Taavi has a proposal to better manage adding components to k8s. When should we start to figure out how to sustainably add components? Before / After the beta.

D: They don’t have to be linked. Can be started now or later. Might be interesting to utilize Harbor and replace docker registry with Harbor. Could simplify things, allows helm charts, API, better integration. Even if not reusing Harbor, toolforge build service shouldn’t block changes. Tools and toolsbeta would need to be figured out.

AB: Couldn’t a single harbor server serve tools and toolsbeta?

D: Would recommend separating to test harbor upgrades, etc.

AB: Could have 2 harbor servers. One in tools and one in toolsbeta. But in the tools one, have 2 spaces, so docker images are served from “tools” harbor serving both tools and toolsbeta. The “toolsbeta” harbor could have fake data for testing.

D: Potentially. Still need namespaces from tools themselves. Needs more discussion

B: Sounds like the same thing as aptrepos? Have distinct repos for tools and beta.

D: Similar. But the Toolforge build service speaks directly to Harbor. Need to keep them separated, but images are going to come from only one place.

AB: Ideally we are using helm, and nothing changes beyond the deployment location.

D: Discussion point - Our own buildpacks. Today we have buildpacks we built from upstream. This is done to avoid any breaking changes for users. This allows complete control but requires some burden to maintain. It also doesn’t align with upstream philosophies. Do we want to continue supporting our own buildpacks or push people to upstream buildpacks?

T: Is it higher or lower than the current burden?

D: Maybe higher? There’s several images, can be harder to debug?

T: Would be nice to maintain control, but unclear how buildpacks are used elsewhere

D: Our buildpack is forked from Heroku, based on Debian. Ours has PyPI, etc. Heroku uses something called a Procfile to define the entry point. Instead we use a shell script that supports existing behavior. Heroku would allow users to define the starting point: uWSGI, anything they want to use.
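
Editor's illustration (not from the meeting): a Procfile is a plain text file with one "process type: command" entry per line. The sketch below is a minimal Python reader for that format, assuming this simple line shape; parse_procfile is a hypothetical helper, and the uWSGI command in EXAMPLE is only an illustrative entry point, not what Toolforge runs.

```python
def parse_procfile(text):
    """Map each process type (e.g. "web") to its start command."""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only; the command may contain colons.
        name, sep, command = line.partition(":")
        if not sep or not command.strip():
            raise ValueError(f"malformed Procfile line: {line!r}")
        processes[name.strip()] = command.strip()
    return processes


# Illustrative Procfile content; the command is an assumption.
EXAMPLE = "web: uwsgi --http-socket :8000 --module app:app\n"
entry = parse_procfile(EXAMPLE)["web"]
```

With a file like this the user picks the entry point, in contrast to the fixed shell script approach described above.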

D: Unofficial support for things like rust from third parties exist. We might want to allow some 3rd party build images.

D: So should we maintain our own buildpacks or reuse upstream? https://gerrit.wikimedia.org/g/cloud/toolforge/buildpacks is the current list

B: Are we making a realtime decision here?

D: Some consultancy is being done. Soliciting opinions and ideas. Decision not happening right now.

AB: Decision can come later offline. More information on users and usage will be helpful

Toolforge design docs

AB: Next item, toolforge design documents. David started some documents thinking about what are toolforge users, maintainers? What are tools? What use cases? Toolforge user experience - user stories brainstorm. Is https://lists.wikimedia.org/hyperkitty/list/cloud-admin@lists.wikimedia.org/thread/CHGYMEUYSIFU3AIVPXFKEFIDT7UK6YV2/ the same topic, re: Making toolforge a platform?

D: Not the same, but linked. The email comes from and informs the brainstorming about user stories.

AB: Why do we support the things we do support in toolforge? For example, why support cronjobs? Webservices php and python. I was involved in tools Toolforge subdomain. But for other things, any context on how we got here?

A: Not an easy answer. Inherited a project in the beginning, Toolserver. Attempted to support the use cases that were running on Toolserver. Largely incremental since. AKA, we supported things that were running on the grid. Toolserver supported SGE. Since then, have supported what’s been in use and followed user requests. Still, largely based on supporting old use cases.

AB: Webservices isn’t native to grid? So when/why was it added?

A: Don’t know

B: In the olden days of Toolserver, there was a classic shared hosting environment. Home directory, dump html and cgi there. Webservices, and running on the grid and proxying back out to the world, was invented by YuviPanda circa 2014. Invented the thing now called dynamicproxy in puppet (originally yuviproxy :-) ). Webservice was originally webservice2 to transition, and other historical artifacts.

AB: Formal description of feature ever created?

B: No, someone would ask, someone else would hack a solution.

AB: The proposal in the email, it’s a big change. Past has been free-form and organic. Now we’re attempting to formalize that. How do we feel about changing methodology?

D: Goal is not to change how users give input, or to ignore it. Rather to define how we use the input given. Combining both, not one or the other. Users asking for something could still happen, provided we can support it. But rather, trying to answer the ‘why’ question and defining what it means to have an operational platform.

T: Documenting what it is, what we are attempting to change with Toolforge?

D: Capture both what do we do with toolforge and what do we want it to do. The email focuses on just the ‘what do we want to do’?

AB: Asking questions about the future. Need a place to formalize. This meeting exists because toolforge is complex and lots of moving parts. But there is a difference between this and how we have historically operated

A: Restate premise of proposal. Currently the way we handle use cases is reactive. We’re surprised by what users do. IE, the grid migration has lots of unknown or surprising use cases. The opposite of that would be providing a nice abstraction for our users, allowing for future change without breaking anything, nor requiring an understanding of what users are doing.

A: Love the idea of updating things. But also appreciate the beauty of providing options, like providing a wiki: users get creative and invent things, and we don’t want to prevent future ideas. Wikis don’t work in theory, but do in practice. Still, having just a few commands to implement and support sounds great. The idea of all

D: Idea is not to take anything away. For users that simply want a platform, make Toolforge nice and easy for them. If they want to be creative, give them a k8s and let them be free to experiment. Today we have both and it’s hard to support both at the same time.

V: We have a tendency to create things that are lesser versions of more widely supported open source projects. "there's lots of arguments in favor of having a well designed abstraction layer": k8s is a well defined abstraction layer, no?

A: Vivian, that's a good point, maybe we don't need to invent new abstractions if good third party abstractions already exist.

T: the whole k8s api is really complex, and trying to maintain a well-functioning shared cluster has proven rather difficult in the last few years

D: Yes, k8s as a service would be the abstraction layer. But toolforge goes beyond that.

B: My back in the day pitch was to find a FOSS PaaS and use it here.

A: Maybe k8s isn’t a well defined abstraction today, it’s changing quickly. Maybe in 10 years it’ll be more stable? Feels messy

T: I agree, k8s api is tied to implementation. Implementation changes, API changes. It’s not been stable like the grid.

AB: That’s the tradeoff. Grid is dead, so nothing is changing. K8s is alive, so changes are happening.

k8s upgrade cadence

AB: Last agenda item. K8s upgrade cadence. Discussed on IRC.

N: Could have a decision request, but need a proposal. I don’t

D: From CNCF user group, most major hosting services are upgrading twice a year, and skipping a version each time. Trying to be a bit slower. Most are running 1.22/1.23.

T: For context, the latest is 1.26. We are still running on 1.21. Almost ready for 1.22.

AB: Would also affect other k8s clusters, like PAWS

V: I generally like n-1, but that probably won't happen which is alright

N: Do we agree we want a cadence?

AB: Concerned about resources

T: Yes, I’ve been doing the upgrades, mostly on a when-I-feel-like-it basis. Upgrades will take time, and need to be resourced.

B: I just double checked, the production k8s cluster is v1.16.15

T: prod is working to upgrade to 1.23, and they can skip versions ahead (and we can't)
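
Editor's illustration (not from the meeting): a rough Python sketch of what "can't skip versions" means for the upgrade backlog, using the 1.21 and 1.26 versions mentioned in this discussion. upgrade_path is a hypothetical helper; the can_skip option only models a single direct jump and is an assumption, since the notes do not say how prod skips versions.

```python
def upgrade_path(current, target, can_skip=False):
    """List the minor versions to step through from current to target."""
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major or tgt_minor < cur_minor:
        raise ValueError("only forward upgrades within one major version")
    if tgt_minor == cur_minor:
        return []  # already on the target version
    if can_skip:
        return [target]  # assumption: one direct jump to the target
    # One minor version at a time, no skipping.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


steps = upgrade_path("1.21", "1.26")
direct = upgrade_path("1.21", "1.26", can_skip=True)
```

Under these assumptions, going from 1.21 to 1.26 without skipping takes five separate upgrades, which is the resourcing concern raised above.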

V: I'm not sure what a cadence would mean in this case. I think it would be silly to consider as there are many different projects that might need different rates

D: dedicate 1 person 1 quarter a year to do upgrade?

V: I believe a dedicated person would be awkward. The projects are different with different requirements. And different knowledge sets would likely be needed to have successful upgrades

N: VR: Maybe you're right, I guess perhaps defining this for toolforge specifically would be helpful

V: Ah yes, in that case much more can be accomplished

AB: Action items?

Action items

Buildpacks:
- Open questions: Which buildpacks to use?
- Beta for buildpacks
Toolforge design docs:
- Let’s review Toolforge user experience - user stories brainstorm docs next workgroup meeting.
- Per Taavi, please move to a wiki at some point 🙂
  - Expect them here https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Ongoing_Efforts/Toolforge_Build_Service
k8s upgrade cadence: Create ticket to start discussion on toolforge k8s upgrade cadence