Help talk:Toolforge/Jobs framework
Tool maintainers email alias
In the Email notifications section it says that tool.mytool@toolforge.org is an alias for the tool maintainers. As per Help:Toolforge/Email, I think the correct address is tools.mytool@toolforge.org, but I'm not 100% sure. I tried sending an email to the tool version (from Toolforge servers) and it didn't work, but the cause may have been something else. If someone could confirm before changing, I'd appreciate it. --Diegodlh (talk) 02:16, 27 February 2022 (UTC)
- @Diegodlh: It is indeed tools with an s. I've fixed the docs, thanks! Majavah (talk!) 10:13, 27 February 2022 (UTC)
Option to specify log location
It seems logs are always put in the home directory of a tool. If you have a couple of jobs, this can quickly lead to a lot of log files in your home directory. Is there any way to specify another location that the framework should use to write log files (e.g. '~/logs')? Hay (talk) 09:14, 29 April 2022 (UTC)
- Not currently. But see phab:T301901 and phab:T304421. SD0001 (talk) 09:32, 8 August 2022 (UTC)
- phab:T304421 closed on October 28 as a duplicate of phab:T301901: Allow specifying the path for log files for jobs executed on the new toolforge Jobs framework. Wbm1058 (talk) 14:19, 12 November 2022 (UTC)
- I see that §Custom log files was added on 11 October 2022 by Arturo Borrero Gonzalez. Wbm1058 (talk) 20:20, 14 November 2022 (UTC)
Cron syntax
Is it full cron syntax or just what is stated here? I mean 0 8 ? * MON#2 for every second Monday of the month, see
binbot (talk) 08:30, 12 February 2023 (UTC)
- @Bináris The Kubernetes specification doesn't explicitly specify which cron syntax is allowed, but looking at the source code it looks like it's using the standard mode of github.com/robfig/cron/v3, which corresponds to what's listed as 'standard' on the Wikipedia page.
- The StackOverflow link you posted seems to be for the 'Quartz' syntax, which is not supported. You might find this more helpful: https://stackoverflow.com/questions/11683387/run-every-2nd-and-4th-saturday-of-the-month. Majavah (talk!) 10:14, 12 February 2023 (UTC)
@Majavah This looks sophisticated. :-) Great, thank you very much for your effort to look inside! binbot (talk) 14:00, 14 February 2023 (UTC)
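For anyone landing here with the same question: one way to get "second Monday of the month" without the Quartz-only MON#2 syntax is to schedule the job for every Monday (for example 0 8 * * 1 in standard cron syntax) and let the script itself decide whether to do anything. The following is only a minimal Python sketch of that idea; the function names is_nth_weekday and should_run_today are made up for illustration and are not part of the Jobs framework.

```python
import datetime


def is_nth_weekday(day, weekday, n):
    """Return True if `day` is the n-th occurrence of `weekday` in its month.

    `weekday` follows datetime.date.weekday(): Monday is 0, Sunday is 6.
    (Hypothetical helper, not a Toolforge API.)
    """
    if day.weekday() != weekday:
        return False
    # The n-th occurrence of a weekday always falls on
    # day-of-month 7*(n-1)+1 through 7*n.
    return (day.day - 1) // 7 + 1 == n


def should_run_today(today=None):
    """Guard for a job scheduled every Monday that should only act on the second one."""
    today = today or datetime.date.today()
    return is_nth_weekday(today, weekday=0, n=2)
```

The job's entry point would then call should_run_today() first and exit immediately when it returns False, so the other Mondays of the month cost only a very short run.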
PHP syntax
Hello. This page suggests the following syntax for PHP tasks (albeit when talking about memory allocations): "toolforge-jobs run myjob --command ./i_like_more_ram.php --image php7.4 --mem 1Gi --cpu 2". However, that doesn't work for me -- it seems like it ought to be "toolforge-jobs run myjob --command "php ./i_like_more_ram.php" --image php7.4 --mem 1Gi --cpu 2". Can someone with more understanding of the right syntax confirm? jarry1250 (talk) 12:20, 25 March 2023 (UTC)
- @Jarry1250 I've updated the documentation. The original format worked if you'd marked the php script as executable and added a shebang, but the php script.php syntax is more common and beginner-friendly. Majavah (talk!) 12:52, 25 March 2023 (UTC)
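To illustrate the two conditions mentioned above (executable bit plus shebang), here is a small Python sketch that checks whether a script could be started directly as ./script.php rather than through an interpreter. The helper name is_directly_runnable is hypothetical and only meant as a quick self-check, not as anything the Jobs framework provides.

```python
import os
import stat


def is_directly_runnable(path):
    """Return True if `path` has an executable bit set and starts with '#!'.

    Hypothetical helper: if this returns False, the script needs to be
    invoked through its interpreter (e.g. "php ./script.php").
    """
    mode = os.stat(path).st_mode
    has_exec_bit = bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    # A shebang is the two bytes '#!' at the very start of the file.
    with open(path, "rb") as fh:
        has_shebang = fh.read(2) == b"#!"
    return has_exec_bit and has_shebang
```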
"Cron" needs better docs
Ok, so the Grid Engine way for periodic jobs was to create a crontab entry (bog standard *nix method) where you run either jsub or jlocal to create a Grid Engine job (or non-job in the case of jlocal). So in an effort to finally start transitioning my tool to Kubernetes I pick one of the cron jobs and change it to call toolforge-jobs instead of jsub. Which results in some clever function changing…
28 4 * * * toolforge-jobs run jobname [opts] --command "command" --emails all
…into…
28 4 * * * /usr/bin/jsub -N cron-49 -once -quiet toolforge-jobs run jobname [opts] --command "command" --emails all
Sigh.
So, clearly, what Help:Toolforge/Jobs_framework#Creating_scheduled_jobs_(cron_jobs) is trying to communicate is that in k8s-land we're not actually supposed to use cron, in favour of a built-in cron-alike facility in either the Toolforge Jobs Framework or in k8s. This seems rather surprising to me, since *nix cron is an incredibly well-established and well known and understood facility, but I'm guessing from a k8s perspective it probably enables some better orchestration or something.
In any case, this needs some actual explanation on Help:Toolforge/Jobs framework. For example a #Cron section that explains the difference, its advantages (the reasoning, primarily, to help understanding), documents the auto-conversion of crontabs if you try it the old way, and gives a couple of examples of old and new ways to run a periodic job.
In particular, it needs to say straight up that toolforge-jobs … --schedule "timespec" is the new One True Way™ if that is the case so people like me don't have to wonder whether we're messing something up and why it's not working.
And if it's not the One True Way™ but merely the old jsub-ification automagic for crontab that hasn't been removed yet, and going forward it should be possible to use cron too, as an alternative for us dinosaurs, then that ought to be mentioned too.
Oh, and "cron" is what you manipulate with crontab. If it's not managed through crontab and executed by crond then the docs shouldn't refer to it as "cron". Use "scheduled jobs" or something as the terminology to cut down on the confusion. --Xover (talk) 08:45, 6 April 2023 (UTC)
- Oh, and the suggested docs might beneficially also refer to Help:Toolforge/Jobs framework#Loading jobs from a YAML file as a way to replicate some of the desirable properties of cron jobs. In particular, a crontab serves as both a configuration file for your jobs and an overview of what jobs are (intended to be) running. If you just interactively schedule something with toolforge-jobs … --schedule it'll be lost after a reboot or other such interrupt. Having inherited a large Toolforge tool from an inactive maintainer I can say with some empiric authority that the crontab was invaluable for figuring out what the heck was going on. --Xover (talk) 08:55, 6 April 2023 (UTC)
Job time limits?
Is it possible to set a time limit for a job? After migrating my jobs to Kubernetes I occasionally found that a job would get stuck for several days, and I had to restart it manually. This did not seem to happen on the Grid Engine. Wcam (talk) 21:34, 4 May 2023 (UTC)
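This thread does not establish whether the framework itself offers a time limit, so as a stopgap a job can enforce its own. The sketch below is plain Python using the standard library's subprocess.run with its timeout argument; the wrapper name run_with_time_limit and the choice of 124 as the "timed out" exit code (borrowed from the coreutils timeout tool) are assumptions made for illustration.

```python
import subprocess


def run_with_time_limit(command, limit_seconds):
    """Run `command` (a list of arguments) and stop it if it exceeds the limit.

    Returns the command's exit code, or 124 if the limit was hit.
    (Hypothetical wrapper, not part of the Toolforge Jobs framework.)
    """
    try:
        completed = subprocess.run(command, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return 124
    return completed.returncode
```

A job could then be started as a small wrapper script that calls, say, run_with_time_limit(["php", "./myscript.php"], 6 * 3600) and exits with the returned code. Note that only the direct child process is killed; anything that child spawned itself may need separate handling.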
ERROR: Please report this issue to the Toolforge admins
For what it's worth, posting this here on the off chance it might possibly be useful to someone. Wbm1058 (talk) 21:04, 25 June 2023 (UTC)
tools.billsbots@tools-sgebastion-10:~$ toolforge-jobs restart refreshlinks
ERROR: An internal error occured while executing this command.
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 71, in _make_request
response.raise_for_status()
File "/usr/lib/python3/dist-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/jobs/api/v1/restart/refreshlinks
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 712, in main
run_subcommand(args=args, api=api)
File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 659, in run_subcommand
op_restart(api, args.name)
File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 586, in op_restart
api.post(f"/restart/{name}")
File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 95, in post
return self._make_request("POST", url, **kwargs).json()
File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 75, in _make_request
raise self.exception_handler(e)
tjf_cli.api.TjfCliHttpError: Internal Server Error
ERROR: Please report this issue to the Toolforge admins: https://w.wiki/6Zuu
tools.billsbots@tools-sgebastion-10:~$
-- Wbm1058 (talk) 21:04, 25 June 2023 (UTC)
- Some background for the above. I have a "continuous" job running. I confirmed that it's still running with the "toolforge jobs list" command. Its log file has grown to 218,145 KB, making it too large to read via WinSCP. Previously, when this log file became too large, I forced the creation of a new log with "toolforge jobs restart refreshlinks", but now this is failing, making it impossible to kill or restart my existing job, which I suppose means it will just keep on running until it crashes your system. Wbm1058 (talk) 21:25, 25 June 2023 (UTC)
- Now it's grown to 262,000 KB.
- Still telling me to:
- ERROR: Please report this issue to the Toolforge admins: https://w.wiki/6Zuu
- Anybody home? Wbm1058 (talk) 17:47, 29 June 2023 (UTC)
- Working for me again, thanks. Wbm1058 (talk) 19:37, 30 June 2023 (UTC)
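As a side note on the ever-growing log file: a long-running job can cap its own log size instead of relying on a restart to start a fresh file. The following is a minimal sketch assuming the job is (or is wrapped by) a Python script; it uses the standard library's logging.handlers.RotatingFileHandler. The function name make_rotating_logger, the logger name, and the default sizes are illustrative assumptions.

```python
import logging
import logging.handlers
import os


def make_rotating_logger(log_path, max_bytes=10 * 1024 * 1024, backups=3,
                         name="job-log-sketch"):
    """Return a logger that writes to `log_path` and rotates it by size.

    When the file would grow past `max_bytes`, it is renamed to
    `log_path`.1 (older copies shift to .2, .3, ...) and a new file is
    started; at most `backups` old files are kept.
    (Hypothetical helper, not part of the Toolforge Jobs framework.)
    """
    os.makedirs(os.path.dirname(os.path.abspath(log_path)), exist_ok=True)
    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backups
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

With the defaults above the job would keep at most about 40 MB of logs (one live file plus three backups), and because the script picks log_path itself, the files can live in a subdirectory such as ~/logs rather than the top of the tool's home directory.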
Another error report for the Toolforge admins
Console dump:
tools-sgebastion-10:~$ toolforge jobs list
ERROR: An internal error occured while executing this command.
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 57, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/usr/lib/python3.7/socket.py", line 748, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 841, in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 301, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f0e4a9d4a58>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.svc.tools.eqiad1.wikimedia.cloud', port=30003): Max retries exceeded with url: /jobs/api/v1/list/ (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f0e4a9d4a58>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 712, in main
run_subcommand(args=args, api=api)
File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 653, in run_subcommand
op_list(api, output_format)
File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 328, in op_list
list = _list_jobs(api)
File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 324, in _list_jobs
return api.get("/list/")
File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 124, in get
return self._make_request("GET", url, **kwargs).json()
File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 103, in _make_request
response = self.session.request(method, **self.make_kwargs(url, **kwargs))
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 535, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 648, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.svc.tools.eqiad1.wikimedia.cloud', port=30003): Max retries exceeded with url: /jobs/api/v1/list/ (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f0e4a9d4a58>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
ERROR: Please report this issue to the Toolforge admins: https://w.wiki/6Zuu
tools.billsbots@tools-sgebastion-10:~$