User:CGoubert-WMF/Periodic jobs without multiversion (Draft)
A naive solution would probably start by making helmfile aware of the group-to-wiki mapping, so that each CronJob is spawned with the image built for its wiki's group.
This is sufficient for scripts that run on a single wiki or a single group, but it does not solve the problem for periodic jobs that use some form of foreachwiki that is not group-based and cannot be executed in parallel.
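The lookup the naive approach needs can be sketched in a few lines. This is a minimal sketch under stated assumptions: the group names, the wiki names (`examplewiki`, `samplewiki`, `demowiki`, `otherwiki`), the image references and the helper functions are all hypothetical and do not reflect production values.

```python
"""Minimal sketch: pick the group-versioned image for a periodic job.

Every wiki, image and function name here is hypothetical; it only
illustrates the lookup helmfile would need to be aware of.
"""

# Hypothetical group-to-wiki mapping.
GROUP_WIKIS: dict[str, frozenset[str]] = {
    "group0": frozenset({"examplewiki"}),
    "group1": frozenset({"samplewiki"}),
    "group2": frozenset({"demowiki", "otherwiki"}),
}

# Hypothetical image reference built for each group's MediaWiki version.
GROUP_IMAGES: dict[str, str] = {
    "group0": "registry.example/mediawiki:group0",
    "group1": "registry.example/mediawiki:group1",
    "group2": "registry.example/mediawiki:group2",
}

ALL_WIKIS: list[str] = sorted(w for wikis in GROUP_WIKIS.values() for w in wikis)


def group_for(wiki: str) -> str:
    """Return the deployment group a wiki belongs to."""
    for group, wikis in GROUP_WIKIS.items():
        if wiki in wikis:
            return group
    raise KeyError(f"{wiki!r} is not assigned to any group")


def image_for(wiki: str) -> str:
    """Return the image a single-wiki CronJob for this wiki would use."""
    return GROUP_IMAGES[group_for(wiki)]


def images_needed(wikis: list[str]) -> set[str]:
    """Return every image a job touching these wikis would need."""
    return {image_for(w) for w in wikis}
```

A single-wiki or single-group job resolves to exactly one image, so one CronJob per group works. A foreachwiki over `ALL_WIKIS`, however, needs all three images at once, which a single-image CronJob cannot provide.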
One possible shape for a solution would then be an option on CronJobs to enable MultiVersion. This option would spawn the Pod with three group-versioned MediaWiki containers, each augmented with a listener. The main container in the Pod would run the orchestration and, for each wiki, call the listener in the container matching that wiki's group, which would then shell out to php-cli.
This seems unwieldy: the Pods would be larger than with the current MultiVersion image (three MediaWiki images plus an orchestration container), and the approach would not yield a reusable solution for other periodic jobs that are not MediaWiki-based.
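To make the MultiVersion-option shape concrete, the orchestration container's dispatch loop could look roughly like the sketch below. It relies on the fact that containers in one Pod share a network namespace and can reach each other on localhost. Everything else is an assumption for illustration: the per-group ports, the `/run` path, the JSON payload and the function names are hypothetical, and the listener side that shells out to php-cli is not shown.

```python
"""Minimal sketch of the orchestration container's dispatch loop.

Assumptions: each group-versioned MediaWiki container runs a hypothetical
HTTP listener on its own port; ports, path and payload are illustrative.
"""
import json
import urllib.request
from typing import Callable, Iterable

# Containers in a Pod share a network namespace, so each listener is
# reachable on localhost. One hypothetical port per group container.
GROUP_LISTENER_PORTS: dict[str, int] = {"group0": 8080, "group1": 8081, "group2": 8082}

Poster = Callable[[str, dict], int]


def listener_url(group: str) -> str:
    """URL of the listener in the container for this group."""
    return f"http://127.0.0.1:{GROUP_LISTENER_PORTS[group]}/run"


def http_post(url: str, payload: dict) -> int:
    """POST a JSON payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def run_foreachwiki(
    wikis: Iterable[str],
    wiki_group: dict[str, str],
    script: str,
    post: Poster = http_post,
) -> list[tuple[str, int]]:
    """Run `script` on each wiki in order, routed to its group's container."""
    results = []
    for wiki in wikis:  # strictly sequential: the next wiki waits for this one
        url = listener_url(wiki_group[wiki])
        results.append((wiki, post(url, {"wiki": wiki, "script": script})))
    return results
```

The `post` parameter is injectable so the routing can be exercised without live listeners. Note that none of this logic is MediaWiki-agnostic, which is the reusability concern raised above.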
At this point, CronJobs start to look more like Workflows that require precise control over run order or parallelism, and would probably need an actual Kubernetes-native workflow orchestration tool to implement that logic.
Doing so could unlock more powerful retries, checkpointing and finer-grained control over parallelism, at the cost of a longer time to a working solution and greater complexity.
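Stripped of any particular tool, the features listed above (ordered execution, bounded parallelism, retries and checkpointing) can be sketched in plain Python. This is not Airflow or any Kubernetes-native orchestrator; the function names, the JSON checkpoint file and the fixed-attempt retry policy are hypothetical choices made only to illustrate what such a tool would provide.

```python
"""Minimal sketch of per-wiki retries, checkpointing and bounded parallelism.

Hypothetical names throughout; a real orchestrator would persist state
and schedule tasks itself rather than use a local JSON file.
"""
import json
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def load_checkpoint(path: str) -> set[str]:
    """Return the wikis already completed by a previous run."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_checkpoint(path: str, done: set[str]) -> None:
    """Write the completed set via a temp file and an atomic rename (POSIX)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def run_with_retries(task: Callable[[str], None], wiki: str, max_attempts: int) -> bool:
    """Call task(wiki) up to max_attempts times; True on the first success."""
    for _ in range(max_attempts):
        try:
            task(wiki)
            return True
        except Exception:
            continue  # a real tool would add backoff and log the failure
    return False


def run_workflow(
    wikis: list[str],
    task: Callable[[str], None],
    checkpoint_path: str,
    max_parallel: int = 1,
    max_attempts: int = 3,
) -> dict[str, bool]:
    """Run task on every wiki not yet checkpointed; max_parallel=1 is sequential."""
    done = load_checkpoint(checkpoint_path)
    pending = [w for w in wikis if w not in done]
    results: dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        outcomes = pool.map(lambda w: run_with_retries(task, w, max_attempts), pending)
        for wiki, ok in zip(pending, outcomes):
            results[wiki] = ok
            if ok:  # checkpoint after each success so a rerun skips it
                done.add(wiki)
                save_checkpoint(checkpoint_path, done)
    return results
```

Rerunning `run_workflow` with the same checkpoint path only retries the wikis that failed. Raising `max_parallel` is the knob that a non-parallelizable foreachwiki would keep at 1.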
Data Platform is already using Airflow, so we could benefit from their experience with this tool to sketch out an implementation.