Portal:Toolforge/Admin/Monthly meeting/2024-02-13
Toolforge council meeting notes
Attendees
- Andrew Bogott
- Arturo Borrero
- David Caro
- Francesco Negri
- Taavi Väänänen
- Komla Sapaty
- Raymond Olisaemeka
- Bryan Davis
Agenda
- Calendar confusion
- Preparation for tool shutdown on the 14th Feb (tomorrow)
- Re-visit last time's backlog grooming experiment
- Stabler image names for toolforge-jobs https://phabricator.wikimedia.org/T357388
- Re-visit toolforge next steps
Notes
Calendar confusion
TV: This should have been fixed, but it wasn’t.
AND: will delete one of the two.
Preparation for tool shutdown on the 14th Feb (tomorrow)
KS: Some tools were identified. Tools are not being deleted, just stopped. Board: https://phabricator.wikimedia.org/project/board/6135/ About 117 tools are affected (stopped).
AND: Hey Bryan, could you please send an additional announcement?
BD: Yes. Tech news, wikitech-l, etc.
AND: What else for tomorrow?
KS: There is a script to gather data.
BD: There are instructions on how to `unlock` tools. https://lists.wikimedia.org/hyperkitty/list/cloud-admin@lists.wikimedia.org/thread/6O257XN7PNV434FQVVY4ROYPTHOTDPID/
Re-visit last time's backlog grooming experiment
FN: We tried last meeting. Not a lot of success. There was a follow up. We kind of need something more continuous for looking at the backlog.
AND: So we don’t want to spend more time on this today, or in this meeting at all?
FN: Yes. It’s not worth going through the backlog in this meeting. It’s no longer a good fit for this meeting.
DC: Agree. A dedicated meeting might be beneficial. Joanna also has some ideas on how to do this.
AND: Anybody volunteering to make sure we follow up with some grooming effort?
DC: Added to the WMCS team weekly meeting.
Stabler image names for toolforge-jobs? https://phabricator.wikimedia.org/T357388
TV: Adding a default is a bad idea; most cases would require a dedicated image. Other than that, it's worth exploring.
AND: What about an alias like `latest-debian` or something?
TV: Yes, worth exploring. But not as default.
DC: Same. A default is too much. There are benefits to having a latest alias.
BD: Understands the use case. Runtime upgrades are breaking changes. This request may be a bit naive. This caused pain back in the Grid Engine days.
AND: the request seems legit, but maybe not realistic? This may not be how software works, but we all want something like this.
TV: How to implement this using our docker registry? If we change the registry, we may need to manually update the live manifests running in the kubernetes cluster.
DC: We shouldn't offer per-lang :latest images.
AND: it seems we are gravitating towards not doing this. But we can write the rationale on the phab ticket.
BD: The true request may be for all tools to be robustly maintained forever. This is not a tech problem, but a social problem. We don’t know how to solve this.
DC: Context of the request? The requestor is helping move tools from the grid to k8s. The request seems fine from this point of view.
BD: The requestor is part of the community tech team. They are doing work on support of the community.
AND: the CTT gets a req # for a FT SWE to work on tools.
BD: Yeah, but that doesn’t scale. No more than 25 tools per FT SWE. To support 250 tools we need 10 FT SWE. Actually, the WMF doesn’t scale. We have an infinite number of tech and social problems.
AB: What is the problem with a default image?
TV: the error will be hidden, instead of being explicit.
BD: We also want push-to-deploy & build service.
AB: Can we make the default image, the latest build-service image?
BD: Is there a check to make sure there is a build-service image built before running a job?
DC: It should be possible.
TV: Also, the component API may be the proper fix.
FN: Maybe what people are missing is a simple way to run something one-off really quick. People that don’t want to think about images.
DC: BTW the build-service image name doesn’t change. From that point of view, you can already reuse your toolforge-jobs command line over and over again today.
AND: I just wrote something in the ticket, but DC, please follow up if what I wrote is not enough.
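A minimal sketch of the idea discussed above: default to the tool's build-service image, but fail explicitly when none has been built, so the error is not hidden. Every name here (`resolve_image`, `NoImageError`, the `built_images` set standing in for a registry lookup, and the image naming scheme) is a hypothetical illustration, not the toolforge-jobs API.

```python
from typing import Optional, Set


class NoImageError(Exception):
    """Raised when no image was requested and no build-service image exists."""


def resolve_image(tool: str, requested: Optional[str], built_images: Set[str]) -> str:
    """Pick the image a job should run with.

    `built_images` is a stand-in (assumption) for asking the registry which
    tools already have a build-service image.
    """
    if requested:
        # An explicitly requested image always wins; nothing is defaulted.
        return requested
    if tool in built_images:
        # Hypothetical naming scheme; the real image name may differ.
        return f"tool-{tool}/tool-{tool}:latest"
    # Fail loudly instead of silently falling back to a generic default.
    raise NoImageError(
        f"tool {tool!r} has no build-service image; pass an image explicitly"
    )
```

With this shape, a job without an image argument only works once a build has succeeded, and otherwise gets an explicit error rather than an unexpected runtime.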
Re-visit toolforge next steps
DC: Document: https://docs.google.com/document/d/1sqo6YGRn9u-S7V0y9m07cYKA84vQlKa-7_F8p7eg80Y/edit We need the component API.
DC: another document https://docs.google.com/document/d/1y7cIX3oiqOH8hEuPhSEuqWx-yElxTq9Ga8qJRvqAJjY/edit
DC: 1) API aggregation layer 2) orchestration API (component) needed for push-to-deploy 3) moving the webservices CLI to an API design. Maybe extend jobs-api to support web services.
DC: Which order, priorities, etc.
Some grid users are asking for a way to schedule a job from within a job; API aggregation would help with that.
BD: centralized logging
Taavi has documented some of this (link?), but the push-to-deploy/API refactor will happen first.
Taavi says ‘the current task about logging is a mess, we will need a new one when we actually start implementing it’
(More brainstorming about OpenAPI spec)
Action items
- Komla: send email to wikitech-l and cloud-l about the shutdown
- Komla: request an entry in Tech News before Friday for next week's Monday
- David: reply to https://phabricator.wikimedia.org/T357388 with build-service options