User:Dvorapa/Toolforge for beginners/Scheduled tasks

From Wikitech

Once in Toolforge, you may want to schedule a repeated task to help you maintain your tool or a Wikimedia wiki. For example, you can schedule a bot to edit articles, a background command to do a frequent job, or pretty much anything you do repeatedly once in a while and don't really want to do by hand.

To schedule a repeated task, you need to know two tools that you'll use all the time: Gridengine and Crontab.

Gridengine

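Usually you run a command right in the Toolforge command line. It executes right away, and you get the results directly in the command line or redirected into a file of your choice. In the last example, eko is not a real command, so the resulting error message ends up in my_task.err: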

$ sleep 50; beep

$ sleep 50; echo "Hello" > my_task.out

$ sleep 50; eko "Hello" > my_task.out 2> my_task.err

But larger tasks take longer and consume quite a lot of resources. And you want to go out, close the command line, and turn off the computer for a while, don't you? Let's move the task to Gridengine. Gridengine schedules the task for a time when there is enough free memory and the servers are not overloaded, and it runs the task in the background:

$ jsub -N my_task -once -quiet sleep_and_beep.sh

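The -N parameter gives your task a name in Gridengine, so you can easily pause it, look it up on the Grid status page or with a status command, or stop it when it starts to misbehave. The standard output and error log files are also created automatically using that name (here my_task.out and my_task.err). The -once parameter prevents starting a second job with the same name while one is still running, and -quiet suppresses the confirmation message when the job is submitted successfully.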

Just be aware that the internal environment of Gridengine is slightly different from the one you are used to in the Toolforge command line, so not everything works or behaves the same way.

Also, you cannot submit several chained commands in one line, such as sleep 50; beep (xargs -I has the same limitation), but you can definitely save them into a batch file (a shell script) and submit that file!
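As a minimal sketch, a batch file for the examples above could be created like this. The file name sleep_and_beep.sh comes from the jsub example; the SLEEP_SECONDS variable is our own assumption (it only lets you shorten the wait while trying the script), and because beep may not be installed on the grid, the script prints a message instead.

```shell
#!/bin/bash
# Create the batch file that jsub will run as one single command.
# Assumption: you run this from your tool's home directory.
cat > sleep_and_beep.sh <<'EOF'
#!/bin/bash
# sleep_and_beep.sh: chained commands bundled into one script.
set -euo pipefail

# Hypothetical knob: how long to wait (defaults to 50 seconds, as above).
sleep "${SLEEP_SECONDS:-50}"

# "beep" may not exist on the grid, so print a message instead.
echo "Hello"
EOF

# jsub needs to be able to execute the file.
chmod +x sleep_and_beep.sh
```

Once the file exists, the jsub command shown earlier can submit the whole script as a single task.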

That runs once, but I want it to run continuously!

If you want to run a task continuously in the background for the whole eternity and beyond, the command is very similar:

$ jstart -N my_task -quiet ~/sleep_and_beep.sh
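A continuous task is usually a script that keeps working on its own instead of exiting after one run. Here is a minimal sketch, assuming a hypothetical script named loop_task.sh; the MAX_ROUNDS and SLEEP_SECONDS variables are our own additions so you can try the loop without waiting forever.

```shell
#!/bin/bash
# Create a hypothetical long-running script suitable for jstart.
cat > loop_task.sh <<'EOF'
#!/bin/bash
set -euo pipefail

# Assumptions: MAX_ROUNDS=0 (the default) means "loop forever";
# SLEEP_SECONDS sets the pause between rounds (default 50).
max_rounds="${MAX_ROUNDS:-0}"
round=0
while true; do
    round=$((round + 1))
    # Replace this line with the real work of your task.
    echo "round $round done at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    if [ "$max_rounds" -gt 0 ] && [ "$round" -ge "$max_rounds" ]; then
        break
    fi
    sleep "${SLEEP_SECONDS:-50}"
done
EOF
chmod +x loop_task.sh
```

You would then start it with jstart -N my_task -quiet ~/loop_task.sh, leaving MAX_ROUNDS unset so it never stops by itself.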
My command makes flowers brown in the background, what shall I do?!

Once you have submitted your task, you can show information about it using these two commands:

$ job -v my_task
Job 'my_task' has been running since 2018-02-21T17:40:13 as id 12345
$ qstat -j $(job my_task)
==============================================================
job_number:                 12345
exec_file:                  job_scripts/67890
submission_time:            Wed Feb 21 17:40:12 2018
owner:                      tools.my_tool
uid:                        12312
group:                      tools.my_tool
gid:                        12412
...

Note that we use job my_task to find the task's job number, because most q-commands cannot work with task names. We could also run the commands one after another, like this:

$ job my_task
12345
** copy the number (CTRL + C) and paste it (CTRL + V) into the next command
$ qstat -j 12345

You can also search for your task on the Grid status page when you are away from the Toolforge command line.

If you feel your task is doing something wrong, you can always restart it using:

$ qmod -rj $(job my_task)

Or stop it like this:

$ jstop my_task
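Instead of copying the job number by hand, you can wrap the lookup and the commands above in a small shell function, for example in your ~/.bashrc. This is a sketch: the name task_ctl is our own, and it only assumes that job NAME prints the job number as shown above (we treat empty output as "not running").

```shell
# task_ctl: hypothetical wrapper around the commands above.
# Usage: task_ctl show|restart|stop NAME
task_ctl() {
    local action="$1" name="$2" id
    # `job NAME` prints the job number; we assume empty output
    # means no job with that name is running.
    id=$(job "$name") || true
    if [ -z "$id" ]; then
        echo "No job named '$name' seems to be running." >&2
        return 1
    fi
    case "$action" in
        show)    qstat -j "$id" ;;   # full job details
        restart) qmod -rj "$id" ;;   # reschedule the job
        stop)    jstop "$name" ;;    # jstop takes the name directly
        *)
            echo "usage: task_ctl show|restart|stop NAME" >&2
            return 2
            ;;
    esac
}
```

With it, task_ctl show my_task does the same as qstat -j $(job my_task), but refuses to call qstat when the task is not running.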
Still cool, but how can I schedule it every day?

I see, let's move to Crontab!

Crontab

You may ask why we started with Gridengine when everybody wants to run tasks regularly. It is because we'll use everything we have learned; let me explain. Crontab is a scheduler for your tasks: you maintain your own table, which runs each task at the times you specify, from every minute to once a year. You can open and edit it using:

$ crontab -e

The file already contains a small user manual with columns for your time preferences. In each column, fill in a star (*) if you don't care, fill in an exact number if you want the task to run only at that value, or divide a star to run it more often while still not caring about the exact value (*/3 in the minute column means every third minute). The columns are minute (m), hour (h), day of month (dom), month (mon), day of week (dow), and the command itself, separated by tabs (spaces work too):

m	h	dom	mon	dow	command
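To see how a Crontab line splits into these columns, here is a small sketch of a hypothetical helper, explain_cron_line, that labels each column of one line. It relies only on the shell's read splitting on tabs and spaces; the last variable receives the whole rest of the line, so the command keeps its own spaces.

```shell
# explain_cron_line: hypothetical helper that labels the five time
# columns and the command of one Crontab line.
explain_cron_line() {
    printf '%s\n' "$1" | {
        # read splits on tabs and spaces; "cmd" gets the rest of the line.
        read -r m h dom mon dow cmd
        printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
            "$m" "$h" "$dom" "$mon" "$dow"
        printf 'command=%s\n' "$cmd"
    }
}
```

For instance, a line starting with */3 would be reported as minute=*/3, that is, every third minute.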

So you can for example use:

15	*	*	*	*	jsub -N my_task -once -quiet sleep_and_beep.sh

This runs at minute 15 of every hour (00:15, 01:15, 02:15, and so on), every day, forever. Perhaps you noticed we used jsub again. Crontab already runs commands in the background, so this is just our way of being gentle to the servers and letting Gridengine find a moment when they are not overloaded to run the task.

Of course, you can tune the schedule to your needs; we think this guru page can help you with many tricky schedules.

There are only two cases where we would avoid using jsub here. The first is when the task really needs to run at that exact second (but the Gridengine delay is usually just milliseconds, so you don't really need to worry). The second is that Gridengine does not always behave the same as the Toolforge command line does, so you may want to avoid an issue you bumped into. But note that Crontab automatically adds jsub to commands that do not use it!

So if you need to avoid Gridengine (jsub), there is one possible way:

15	*	*	*	*	jlocal sleep_and_beep.sh > my_task.out

jlocal is a command that does exactly nothing by itself! It just runs your command directly, as in the Toolforge command line, instead of in Gridengine. So the task starts immediately when it is scheduled, but it is not as gentle to the servers as jsub. It is not a recommended practice, but it can come in handy sometimes. Basically, it also prevents Crontab from adding jsub automatically.

I've submitted the task, what happens next?

Everything else works as in Gridengine. You can list task info, watch the task, stop it, and you'll get log files as usual. If you want to back up your Crontab, you can use:

$ crontab -l

To show its contents, and possibly save them into a file like this:

$ crontab -l > my_backup.txt

When you need it, you just call:

$ crontab my_backup.txt

This replaces whatever was in your Crontab before with the contents of your file. It comes in handy when you want to quickly stop Crontab scheduling. Or you can alter multiple Crontabs this way!
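The same two commands, crontab -l and crontab FILE, are enough to change your Crontab without opening an editor. Below is a minimal sketch of a hypothetical helper, add_cron_entry, that saves a backup to my_backup.txt (the name used above), appends one line only if it is not there yet, and installs the result.

```shell
# add_cron_entry: hypothetical helper that adds one line to your Crontab
# without opening an editor. It relies only on `crontab -l` (print the
# table) and `crontab FILE` (replace the table with FILE).
add_cron_entry() {
    local entry="$1" tmp status
    tmp=$(mktemp) || return 1
    # Save the current table; start empty if there is none yet.
    crontab -l > "$tmp" 2>/dev/null || : > "$tmp"
    # Keep a backup, named as in the example above.
    cp "$tmp" my_backup.txt
    # Append the entry only if the exact line is not there yet.
    if ! grep -qxF -- "$entry" "$tmp"; then
        printf '%s\n' "$entry" >> "$tmp"
    fi
    crontab "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, add_cron_entry '15 * * * * jsub -N my_task -once -quiet sleep_and_beep.sh' adds the schedule shown earlier, and running it twice does not duplicate the line. Note that it overwrites my_backup.txt each time.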

See also