News/Toolforge Grid Engine deprecation

From Wikitech
This page contains historical information. It may be outdated or unreliable.
2024

This page contains information about the deprecation and removal of the Toolforge Grid Engine platform.

What is changing?

The Grid Engine cluster is being decommissioned in accordance with the timeline on this page.

The Toolforge admins are asking tool maintainers to move tools off the grid and report any blocking issues. This work is being tracked on the Phabricator workboard.

Timeline

  • Oct-Dec 2021: Done. Release the Toolforge Jobs Framework. Continue working on Toolforge buildpacks. Migrate Son of Grid Engine to Debian Buster.
  • Oct-Dec 2022: Done. Ask community to begin migrating tools. Collect blocking issues.
  • Jan-Mar 2023: Done. Add features to support identified blocking issues. Explore Kubernetes as a service as a potential migration path. Tool migrations continue.
  • Apr-Jun 2023: Done. Toolforge buildpacks beta. See T267374. Tool migrations continue.
  • Jul-Sep 2023: Done. Toolforge buildpacks multipack support work. See T325799. Tool migrations continue.
  • Oct-Dec 2023: Done. All tools are now able to be migrated. Tool migrations continue.
    • November 2023: Done. Kick off the grid shutdown process. Notify individual maintainers who still have tools on the grid of shutdown timelines via email, the cloud-announce mailing list, and talk pages.
    • 2023-12-14: Done. Tools owned by unresponsive or unreachable maintainers will be stopped. See the unreached tools list for which tools were stopped. The FAQ has more information: What exactly is happening on 2023-12-14?
  • Jan-Mar 2024: Done. Migrations complete, the grid is stopped, and finally the grid infrastructure is deleted.
    • 2024-02-14: Done. All tools still running anything on the grid will be stopped. Tools that have an active maintainer and a clear plan for migrating can request, in the tool-specific migration task, that the tool not be stopped or that it be re-enabled (although it will be shut down again if it misses the 2024-03-14 deadline).
    • 2024-03-14: Done. Grid infrastructure is shut down and deleted. Tools that were not migrated in time can no longer run, but their files will remain on the Toolforge servers.

FAQ

How can I track tool migrations?

grid-deprecation.toolforge.org shows the number of tools still running on Grid Engine, as well as details about those tools and the jobs they are running.

What exactly is happening on 2023-12-14?

Think of this as an intentional outage for tools whose maintainers haven't been reachable. Tools that have no plan or communication will be stopped on this date. The tool being down should alert users and maintainers to what is happening if prior communication has not yet reached them. The tools will only be stopped, not deleted, and can be restarted if contact is made.

Why is this happening?

Tools need to have a migration plan for when the grid shuts down on 2024-02-14. As all other methods have failed to reach the maintainers of these tools, the hope is that turning the tools off will raise awareness of what is happening, with enough time left to make a plan and migrate before the grid shutdown date of 2024-02-14. It will also make users of these tools aware that the tool they are using will be shut down in the future if no action is taken. This will allow users to plan, ask for help, and get support. It will also provide time to find new maintainers for these tools.

How can I know if a tool I maintain or use has been tagged as having an unreachable maintainer?

We are tracking these on the unreached tools list.

Any grid-disabled tools will have a TOOL_DISABLED file in the tool directory. Tools that were running web services on the grid prior to being shut down will also display a message that explains the shutdown when someone tries to load the web service.
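The marker file can be checked from a shell. The following is a minimal sketch, assuming the tool directory is the tool account's $HOME; the function name is_grid_disabled is made up for illustration and is not a Toolforge command.

```shell
#!/bin/sh
# Sketch: report whether a tool directory carries the TOOL_DISABLED marker.
# Assumption: the tool directory is $HOME when logged in as the tool.
# is_grid_disabled is a hypothetical helper name, not part of Toolforge.
is_grid_disabled() {
    tool_dir="${1:-$HOME}"
    if [ -e "$tool_dir/TOOL_DISABLED" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Example: is_grid_disabled "$HOME"
```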

If I use a tool that has been shut down, what should I do?

Contact the maintainer if possible. Share what's happening on the Phabricator ticket associated with that tool (see the Phabricator workboard). If the tool is unmaintained and you'd like to take over maintenance, follow the abandoned tool policy. If that's not possible, make plans to stop using the tool by the grid shutdown date of 2024-02-14. The tool can be restarted to accommodate your needs; however, it will still be shut down on 2024-02-14 if it has not been migrated off the grid by that date.

If I maintain a tool that has been shut down, what should I do?

Reach out on the Phabricator ticket for your tool with your plans for migration or deletion, or to ask for further help or support in developing a plan. Grid access for the tool can be re-enabled upon request if you are planning to migrate and continue maintaining the tool.

What happens to crontabs for tools that have been shut down?

The crontab file will be archived to a file called crontab.grid_stopped in the tool home directory. If a tool is re-enabled, the crontab will be restored to the cron server.

Note that the Jobs framework built-in scheduling functionality will replace crontab support entirely.
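When planning the move to scheduled jobs, it can help to list what the archived crontab used to run. The sketch below assumes the archive is in standard crontab format and sits at $HOME/crontab.grid_stopped; the function name list_archived_cron is hypothetical. It only prints each entry's schedule and command for review and does not create any jobs.

```shell
#!/bin/sh
# Sketch: print the schedule and command of each entry in an archived crontab.
# Assumptions: standard crontab syntax; default path $HOME/crontab.grid_stopped.
# list_archived_cron is a hypothetical helper name, not part of Toolforge.
list_archived_cron() {
    archive="${1:-$HOME/crontab.grid_stopped}"
    awk '
        /^[ \t]*#/ { next }                               # skip comments
        /^[ \t]*$/ { next }                               # skip blank lines
        /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { next }   # skip VAR=value settings
        {
            # "@hourly"-style entries use one schedule field, others use five.
            n = ($1 ~ /^@/) ? 1 : 5
            schedule = $1
            for (i = 2; i <= n; i++) schedule = schedule " " $i
            # Strip the schedule fields to leave only the command.
            cmd = $0
            for (i = 1; i <= n; i++) sub(/^[ \t]*[^ \t]+/, "", cmd)
            sub(/^[ \t]+/, "", cmd)
            printf "schedule: %s\ncommand:  %s\n", schedule, cmd
        }
    ' "$archive"
}

# Example: list_archived_cron "$HOME/crontab.grid_stopped"
```

Each printed command still contains the grid wrapper (for example jsub), which needs to be removed by hand when writing the equivalent job.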

How can I help?

  • Help with tool migrations. Some maintainers have specifically asked for help in migrating. See Phabricator for the list.

Are crontab and jlocal going away too?

Yes, both are grid-specific tools. The Toolforge Jobs framework has built-in scheduling, which makes crontab obsolete, and the increased reliability that Kubernetes brings should make any jlocal use cases obsolete.

What should I do?

You have a couple of options:

Use case continuity

The following table tracks use case continuity.

Moving from Toolforge GridEngine to Toolforge Kubernetes

Feature: job scheduling
Grid Engine: jsub or jstart
Kubernetes: Toolforge jobs (one-off jobs or continuous jobs)
Example:

From GridEngine

$ crontab -e
5 * * * * jsub -once -N name-of-tool php $HOME/user/bot.php >/dev/null 2>&1

To Kubernetes

$ toolforge-jobs run name-of-tool --command "php ./user/bot.php" --image php8.2 --schedule "@hourly"

Feature: web services
Grid Engine: webservice
Kubernetes: specify an image and 'kubernetes' as the backend
Example:

From GridEngine

$ webservice --backend=gridengine start

To Kubernetes

$ webservice stop
$ webservice --backend=kubernetes php8.2 start

Feature: multi-language tools
Grid Engine: native
Kubernetes: Toolforge buildpacks
Comment: Some single-language tools will need updated or new images (like dotnet)

Why are we doing this?

As outlined in our series of blog posts, Toolforge is powered by two different backend engines, Kubernetes and Grid Engine. These two backends have traditionally offered different features for tool developers. But as time moves forward we’ve learnt that Kubernetes is the future.

See more for a detailed explanation.

Solutions to common problems

Rebuild virtualenv for Python users / python3: not found / ModuleNotFoundError: No module named '...'

Python virtual environments ("venvs") are tied to the underlying system where they are running. Because of that, you will need to delete and re-create your virtual environments using these instructions.
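The delete-and-recreate step can be sketched as a small shell function. This is an illustration under stated assumptions, not the linked instructions themselves: it assumes it is run inside the same container image the tool will use on Kubernetes, and the default paths and the function name rebuild_venv are placeholders.

```shell
#!/bin/sh
# Sketch: delete a stale virtual environment and create a fresh one.
# Assumptions: run inside the container image the tool will use on Kubernetes;
# the default paths are placeholders; rebuild_venv is a hypothetical helper.
rebuild_venv() {
    venv_dir="${1:-$HOME/venv}"
    requirements="${2:-$HOME/requirements.txt}"

    # Remove the old environment, which is tied to the previous system.
    rm -rf "$venv_dir" || return 1

    # Create a new environment with the Python available in this container.
    python3 -m venv "$venv_dir" || return 1

    # Reinstall dependencies if a requirements file is present.
    if [ -f "$requirements" ]; then
        "$venv_dir/bin/pip" install -r "$requirements" || return 1
    fi
}

# Example: rebuild_venv "$HOME/venv" "$HOME/requirements.txt"
```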

Tools needing multiple language runtimes

You can build an image for your tool with the dependencies required.

Mono container

Using Mono? See the discussion of a Mono-specific container at phab:T311466.

Requires a system library or tool to be present

You can build an image for your tool with the dependencies required.

Pywikibot scripts

Delete a tool

Some tools were experiments that are now finished, others were made obsolete by other tools, and some are simply things that the original maintainer is tired of caring for. Maintainers can mark their tools for deletion using the "Disable tool" button on the tool's detail page on https://toolsadmin.wikimedia.org/. Disabling a tool will immediately stop any running jobs, including webservices, and prevent maintainers from logging in as the tool. Disabled tools are archived and deleted after 40 days. Disabled tools can be re-enabled at any time prior to being archived and deleted.

"Your webservice is not running" from `webservice status` after migrating

If webservice status says "Your webservice is not running" after you have started it on the Kubernetes backend, you may have a $HOME/service.template file containing "backend: gridengine". Try removing your $HOME/service.template file or, better yet, updating it to list the backend and type that your tool needs to run on Kubernetes.
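A quick way to check for the stale setting is to search the template for the grid backend line. The sketch below assumes the file lives at $HOME/service.template as described above; the function name uses_grid_backend is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if service.template still selects the grid backend.
# uses_grid_backend is a hypothetical helper name, not part of Toolforge.
uses_grid_backend() {
    template="${1:-$HOME/service.template}"
    [ -f "$template" ] &&
        grep -Eq '^[[:space:]]*backend:[[:space:]]*gridengine' "$template"
}

# Example:
# if uses_grid_backend; then
#     echo "service.template still selects gridengine"
# fi
```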

See also

Communication and support

Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:

Discuss and receive general support
Stay aware of critical changes and plans
Track work tasks and report bugs

Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself

Read stories and WMCS blog posts

Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)