Portal:Toolforge/Admin/Monthly meeting/2023-11-07


Agenda

Notes

Abandoned tool policy

TV: It’s inactive, what’s the historical background?
BD: We should reboot the committee; it hasn't been refreshed since 2016/2017. We could ask on the general mailing list who is interested. Anyone with Toolforge admin rights could clear the backlog. The idea of the committee was to not have only paid staff decide.

Timeline for turning off the grid

SKS: We’re reaching out to maintainers once more. About 100 maintainers could not be reached because the email addresses on file bounced. Some of the tools are probably just experimental. We haven’t communicated specific dates so far, but loosely the migration will continue through the end of 2023.
AB: When I last talked to Nicholas about this, we discussed a “warning shot”: an email saying that we’re going to turn stuff off, then we wait some more, then turn off the tools. I don’t have specific dates.
SKS: The last resort is to temporarily turn off a tool; if the maintainers come back, we can still turn it back on. We want to do some more communication.
AB: Currently the tool disabling process is tool-wide. We don’t have a way to turn off Grid things but not Kubernetes things. Is it possible to have a tool running in both systems?
BD: You can only have one webservice, either Grid or K8s. But you can have more jobs.
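(Note: for reference, a minimal sketch using the standard Toolforge webservice command, run as the tool account; the type “python3.9” is only an example of whatever runtime a tool happens to use:)
    webservice status                                  # shows whether the tool's webservice is running
    webservice stop                                    # stop the existing (e.g. grid) webservice
    webservice --backend=kubernetes python3.9 start    # start it again on Kubernetes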
AB: We need to split out the “disable” button. Or we can say that we’ll disable both if we don’t hear back.
BD: Can we just stop the Grid jobs?
AB: If we do and people show up and ask, we can tell them their code is still there.
BD: If we want intermediate blocks we need additional software development.
AB: There’s a trivial way of stopping, but maybe not of blocking. I assumed it was a subset of the current shutdown process.
TV: We can delete crontabs; the user is free to recreate them, but we’ll turn them off again.
AB: If the goal is to do all of them at the same time, there are easier paths.
BD: If we want to selectively block tools, I would block all new tools from using the grid as the first thing. We should’ve done that a year ago, but that’s another story.
AB: Telling people “migrate or die” is obnoxious. Punishing users for using a tool they didn’t know is unsupported makes everybody sad.
BD: But it’s the only way we find out about unmaintained tools.
AB: We don’t need to write new software; it could be a 4-line Bash script. As a start we can just stop jobs.
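(Note: a rough sketch of what such a script could look like, assuming standard Grid Engine tooling and Toolforge’s tools.<name> account convention; “mytool” is a placeholder, not a real tool:)
    TOOL="mytool"                                      # placeholder tool name
    # delete all of the tool's queued/running grid jobs (needs grid admin rights)
    qstat -u "tools.${TOOL}" | awk 'NR>2 {print $1}' | xargs -r qdel
    # optionally also remove its crontab, per the point above (needs root)
    crontab -r -u "tools.${TOOL}"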
TV: Is it just a matter of sending an email and picking a date?
TV: What do we want to do about people without a valid email?
AB: I don’t know what to do. We need to broadcast things also on Discord, etc.
BD: A thing I’ve done in the past is to look for the associated SUL accounts. It turns out there was a bug that caused email addresses to be blanked.
SKS: I used an LDAP search, but sometimes the email is still blank.
BD: The bug was that logging into Wikitech might have blanked your email in LDAP, because the email address wasn’t set in the MediaWiki database yet. So if the email is blank, it might not be the user’s fault. The place to potentially find an email to reach them is the SUL database, which is separate from Wikitech. Sometimes you can correlate those accounts through Phabricator, sometimes from the Striker database (if they used OAuth). You have to poke at the database. Sometimes they have the same username. It’s tedious.
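(Note: a hedged example of the kind of LDAP lookup mentioned above; the base DN and placeholder shell name are assumptions and may differ in the actual directory:)
    # look up the email on file for a developer account; "someuser" is a placeholder
    ldapsearch -x -b "ou=people,dc=wikimedia,dc=org" "(uid=someuser)" mail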
TV: Dev account emails are generally considered public, and SUL emails are not.
AB: About 100 emails bounced, but the number with blank emails should be smaller. Another way would be to broadcast to talk pages.
BD: We used to do it.
AB: It’s a legit fallback if we can’t find emails.
TV: It’s something large enough to mention in the Tech News newsletter.
BD: I would always give 3 months’ notice for this sort of thing. It seems like forever, but I think it’s the right thing to do.
SKS: For people without emails, I wanted to reach out through the specific ticket for each tool. Would that work?
AB: It’s totally possible that they have different Phabricator emails from their Wikitech emails. It’s a lot of trouble, but doing all these things would be the nicest thing to do.

Default quotas

TV: I was looking this week at implementing them. Pick a reasonable number of pods and then define CPU and RAM quotas based on that. Is 10 pods reasonable?
AB: Are there any scenarios with multiple pods for a given tool? Can users scale? Do they have access to those APIs?
BD: Yes they can. They can create 6 pods that can communicate with each other.
AB: 10 seems like a big default.
BD: We haven’t tracked cronjobs per tool. In Grid Engine you can request to run 70 jobs at the same time, and the grid gives you 10 slots in parallel. K8s doesn’t have similar functionality for queuing things that will eventually execute.
TV: The Job object. If it doesn’t have enough quota it will wait, so it will be equivalent. There is a limit for jobs, higher than the number of pods.
BD: I don’t think 10 sounds that big.
TV: It would be easier if we tracked quotas in Git. Essentially having maintain-kubeusers handle the quotas too.
BD: Sounds ok to me.
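
(Note: a sketch of what a per-tool default quota could look like, using kubectl’s built-in ResourceQuota creation; the namespace name, pod count, and CPU/RAM figures are placeholders rather than agreed values, and in practice maintain-kubeusers would apply whatever ends up tracked in Git:)
    # create a default quota in one tool's namespace (all numbers are illustrative only)
    kubectl create quota tool-default --namespace=tool-mytool \
        --hard=pods=10,requests.cpu=2,requests.memory=6Gi,limits.cpu=2,limits.memory=8Gi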

Action items

  • Recruit more people to the Standards Committee (Komla?)
  • Go through the tool adoption backlog