Help talk:Toolforge/Web

portgrabber

I'm not seeing portgrabber(1) anywhere. Where is it, or what are the up-to-date instructions for running a non-PHP Web service? abartov (talk) 07:36, 14 December 2014 (UTC)

There is no documentation for this. The source can be found here; basically, you connect to /tmp/sock.portgranter, send $tool\n there, receive a port number back, connect to port 8282 on tools-webproxy, send .*\n$host:$port\n there, and leave both sockets open until your server terminates. Note that if "non-PHP" means Tomcat, there is webservice tomcat start. --Tim Landscheidt (talk) 07:53, 14 December 2014 (UTC)
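The handshake described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the socket path, proxy host, port 8282 and the two message formats come from the comment above, while the function names and the idea that the port comes back as ASCII decimal text are assumptions made for the example.

```python
import socket

# Values quoted in the comment above; everything else here is an assumption.
PORTGRANTER_SOCKET = "/tmp/sock.portgranter"
PROXY_HOST = "tools-webproxy"
PROXY_PORT = 8282


def port_request(tool: str) -> bytes:
    """Message sent to portgranter: the tool name followed by a newline."""
    return f"{tool}\n".encode("ascii")


def proxy_registration(host: str, port: int) -> bytes:
    """Message sent to the proxy: the URL pattern line, then host:port."""
    return f".*\n{host}:{port}\n".encode("ascii")


def register(tool: str, host: str):
    """Grab a port and announce it; the caller must keep both sockets open."""
    granter = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    granter.connect(PORTGRANTER_SOCKET)
    granter.sendall(port_request(tool))
    # Assumption: the reply is the port number as ASCII decimal text.
    port = int(granter.recv(64).decode("ascii").strip())

    proxy = socket.create_connection((PROXY_HOST, PROXY_PORT))
    proxy.sendall(proxy_registration(host, port))
    # Closing either socket would presumably release the port or the route.
    return port, granter, proxy
```

A server would call register() once, bind to the returned port, and hold on to both socket objects for as long as it runs.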

Thanks, Tim! This was helpful. I now see portgrabber is reachable on the grid, even if not from the tool's command line. I ran into more difficulty running my server, but it appears to be related to the environment rather than to portgrabber. abartov (talk) 21:27, 14 December 2014 (UTC)

I'm sorry, I misread your question to mean "how does portgrabber work", not "how does one use portgrabber". It's pretty easy: call it with portgrabber $tool $command [$arg1…$argn], and it will grab a port number and call $command [$arg1…$argn] $port (leaving the sockets open until $command terminates). --Tim Landscheidt (talk) 21:44, 14 December 2014 (UTC)
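Since portgrabber appends the granted port as the last argument, the wrapped command only has to read it from there. A minimal sketch of such a server, using only the Python standard library (the handler and function names are made up for illustration):

```python
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


def port_from_argv(argv):
    """portgrabber calls `$command [args...] $port`, so the port is last."""
    return int(argv[-1])


class Hello(BaseHTTPRequestHandler):
    """Trivial handler that answers every GET with a fixed body."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main(argv):
    """Listen on the port that portgrabber handed over."""
    port = port_from_argv(argv)
    HTTPServer(("", port), Hello).serve_forever()

# In a real script, finish with: main(sys.argv)
```

Assuming the file is saved as server.py (a hypothetical name), it would be started as portgrabber $tool python3 server.py.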

Can't move logs?

Per the instructions, I've tried modifying the ~/.lighttpd.conf from

server.errorlog = "$home/error.log"
server.breakagelog = "$home/error.log"

to

server.errorlog += "$home/logs/error.log"
server.breakagelog += "$home/logs/error.log"

And I just get hit with

2015-08-03 02:03:30: (log.c.166) server started 
2015-08-03 02:03:30: (log.c.118) opening errorlog '/data/project/avicbot/error.log$home/logs/error.log' failed: No such file or directory
2015-08-03 02:03:30: (server.c.1012) Opening errorlog failed. Going down.

My ~/logs directory does exist, but it seems to be trying to open '/data/project/avicbot/error.log$home/logs/error.log'. Is this expected behavior, or am I doing something wrong? #avicennasis@wikitech 02:16, 3 August 2015 (UTC)

If you mean the section "Default configuration", that is an extract of the corresponding shell script lighttpd-starter, in which $home is replaced with the tool's home directory. So in your ~/.lighttpd.conf, you need to use explicit paths (e.g., /data/project/avicbot/logs/error.log) instead.

But what you want cannot be done this way: a plain server.errorlog = fails with "Duplicate config variable in conditional 0 global: server.errorlog", and server.errorlog += appends the new path to the default one, so it does not work either.
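The odd path in the error log is just the result of that append. As a plain string illustration (Python here only mimics what lighttpd does with += on a string option; the variable names are made up):

```python
# Default set by lighttpd-starter for this tool (path taken from the log above).
default_errorlog = "/data/project/avicbot/error.log"

# What the "+=" line contributes; "$home" is not expanded in ~/.lighttpd.conf.
appended = "$home/logs/error.log"

# "+=" on a string option appends, so the two values are simply joined.
effective = default_errorlog + appended
print(effective)
```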

If you are really committed to putting logs in ~/logs, you could theoretically duplicate the whole web service setup and change the server.errorlog configuration to your liking, but for all practical purposes I would try to make peace with ~/error.log :-). --Tim Landscheidt (talk) 03:06, 3 August 2015 (UTC)

I spoke too soon: If you use a construct like:

$HTTP["url"] =~ "^/" {
        server.errorlog = "/data/project/avicbot/logs/error.log"
}

you can make lighttpd write its log there, but the grid system will create ~/error.log nonetheless. However, I would still recommend sticking to the "standard" layout. --Tim Landscheidt (talk) 03:19, 3 August 2015 (UTC)

Ah well. I'll learn to get used to it, it seems. Thank you for the reply, though. #avicennasis@wikitech 07:18, 4 August 2015 (UTC)

Redirecting http to https

Tracked in Phabricator
Task T163019

What is the best way to redirect all traffic to https? I have tried the following but it results in an infinite loop:

$HTTP["scheme"] == "http" {
    $HTTP["host"] =~ ".*" {
        url.redirect = (".*" => "https://%0$0")
    }
}

Thanks,
Sam Wilson 06:19, 13 February 2017 (UTC)

I believe this is impossible with ~/.lighttpd.conf alone. I looked at a similar question and the source code of the suggested solution mod_extforward, but after adding:

server.modules += ( "mod_extforward" )

extforward.forwarder = ( "all" => "trust")

to ~/.lighttpd.conf, $HTTP["scheme"] was still always http. So we would have to write our own lighttpd module, which we probably won't do :-).

The bigger picture of making Tool Labs https-only is tracked as phab:T102367, and it's probably best to solve this there globally. --Tim Landscheidt (talk) 10:11, 13 February 2017 (UTC)

@Tim Landscheidt: Hmm, bother! Ah well, it's a shame it can't be done in .lighttpd.conf, but as you say it's something that will be solved globally (at some point). In the meantime I'm doing this: https://github.com/wikisource/ia-upload/commit/66c62dfafb10d82797974a8fd6e394d6b3106d4a (based on code in video2commons). I might add a note to this page about how to do this.

Sam Wilson 10:29, 13 February 2017 (UTC)
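The linked commit is not reproduced here. As a rough Python analogue of an application-level redirect, the sketch below wraps a WSGI app and redirects when the request arrived over plain http. It assumes the front proxy passes the original scheme in an X-Forwarded-Proto header; that header, and the name force_https, are assumptions for this example and are not confirmed anywhere in this thread.

```python
def force_https(app):
    """Wrap a WSGI app so that plain-http requests are redirected to https."""

    def middleware(environ, start_response):
        # Assumption: the proxy reports the client-facing scheme in this header.
        if environ.get("HTTP_X_FORWARDED_PROTO", "") == "http":
            host = environ.get("HTTP_HOST", "")
            path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
            query = environ.get("QUERY_STRING", "")
            location = f"https://{host}{path}"
            if query:
                location += f"?{query}"
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        # Anything else (https or header missing) goes to the real app.
        return app(environ, start_response)

    return middleware
```

Requests without the header are passed through unchanged, which avoids the redirect loop described above if the header turns out not to be set.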

Creating a virtual environment for a Python app

python3 -m venv ~/www/python/venv

Does not seem to work, see also https://phabricator.wikimedia.org/T140103

The Phabricator task suggests instead:

virtualenv -p python3 venv

Apparently the command needed depends on which host operating system you are using. On a Toolforge bastion you need the virtualenv -p python3 venv form and will end up with a Python 3.4.0 virtual environment. Inside a Kubernetes python container you need to use the python3 -m venv ~/www/python/venv form and will end up with a Python 3.4.2 virtual environment. --BryanDavis (talk) 19:48, 5 July 2017 (UTC)
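If you want a script to pick between the two forms, one possible heuristic is to check whether the interpreter can import both venv and ensurepip. That check is an assumption of this sketch and is not taken from the Phabricator task; the function names are likewise made up.

```python
import importlib.util


def stdlib_venv_usable() -> bool:
    """True if this interpreter can import both venv and ensurepip."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("venv", "ensurepip")
    )


def venv_command(target: str, use_stdlib: bool) -> list:
    """Return one of the two command forms mentioned in this thread."""
    if use_stdlib:
        return ["python3", "-m", "venv", target]
    return ["virtualenv", "-p", "python3", target]


# Show which form this heuristic would choose on the current host.
print(" ".join(venv_command("venv", stdlib_venv_usable())))
```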

process.env.PORT not defined

Tracked in Phabricator
Task T205505 Resolved

Note that the node.js instructions fail on Toolforge, as the PORT environment variable is not found by server.js. Smith609 (talk) 06:15, 26 September 2018 (UTC)

Based on the comments on the Phabricator task, it looks like this problem was local to the tool or the result of a configuration issue. --BryanDavis (talk) 22:22, 28 September 2018 (UTC)

Is there any difference between `python shell` and `shell`?

I don't quite get the difference between these commands:

kubectl exec -it <webservice-podname> -- /bin/bash
webservice --backend=kubernetes python shell
webservice --backend=kubernetes shell

Assuming that I have a Python webservice running, will they all give access to the same container shell? Dalba (talk) 07:47, 22 February 2019 (UTC)

Great question! I'll try to break it down line by line.

kubectl exec -it <webservice-podname> -- /bin/bash: This will open an interactive session running a /bin/bash process in the main container of the given pod. The pod will be left running after the interactive session is terminated.
webservice --backend=kubernetes python shell: This will start a new Kubernetes pod using the docker-registry.tools.wmflabs.org/toollabs-python-web:latest image with /bin/bash -il as the entry point, wait for the pod to start, and then run kubectl attach --tty --stdin <pod_name> to attach an interactive console to that pod. This pod will be destroyed when the interactive session is terminated.
webservice --backend=kubernetes shell: This does all the same things as webservice --backend=kubernetes python shell, but using the docker-registry.tools.wmflabs.org/toollabs-php-web:latest (PHP 5.6) image. This is due to php5.6 being the current default type when using the Kubernetes backend.

--BryanDavis (talk) 22:47, 22 February 2019 (UTC)

Now it all makes sense. Perfect. Thank you! Dalba (talk) 05:20, 23 February 2019 (UTC)

Location of static files for uWSGI apps?

Under "Python (uWSGI)", subsection "Default uwsgi configuration", it says "Static files are located in $HOME/www/python/src/static/", but that seems not to be the case. I'm not sure what the intent was, but I got it to work by adding a symlink from $HOME/www/static -> $HOME/www/python/src/static/. RoySmith (talk) 02:05, 17 January 2020 (UTC)

Files in the $HOME/www/static directory are available to be served by https://tools-static.wmflabs.org as documented at Help:Toolforge/Web#Static_file_server. uWSGI is capable of serving static files, but I cannot find any evidence in the history of the current webservice command that the $HOME/www/python/src/static/ directory mentioned in Help:Toolforge/Web#Default_uwsgi_configuration was ever configured for static asset service. I dug through the history of the older webservice2 in the ops/puppet.git repository and did not find uWSGI configuration for static files there either. The item was added to the docs in Special:Diff/1837297 by MichaelSchoenitzer; maybe they can help explain? --BryanDavis (talk) 02:52, 17 January 2020 (UTC)

I've got the tools-static.wmflabs.org mechanism working, but I'd like to avoid that if possible. Having the static files served up under a different host name from the rest of the app complicates things (i.e. constructing urls). At the moment, I'm going with having my own uwsgi.ini:

[uwsgi]
check-static = /data/project/spi-tools-dev/www/static

but it would be nice to avoid that bit of configuration. RoySmith (talk) 03:07, 17 January 2020 (UTC)

The $HOME/www/python/uwsgi.ini settings you mention seem like the right solution. If we did add a check-static mount to the default config, I think that having it be under the $HOME/www/python/src directory would be a bad idea. Something like $HOME/www/python/static could be a reasonable setting, but honestly, dealing with static files is so application- or framework-specific that setting a global default seems unlikely to help many people. As an example, the location that MichaelSchoenitzer documented is completely correct for the Flask framework and would not require any uWSGI configuration. For the Django framework the location could be correct, but would require config inside the Django app. --BryanDavis (talk) 04:23, 17 January 2020 (UTC)

I reverted the misleading edit. --BryanDavis (talk) 04:30, 17 January 2020 (UTC)

Thanks; eliminating that cleared up a lot of confusion. I agree that the default uWSGI config should be framework agnostic. My current thought is that rather than adding my own uwsgi.ini, I'm going to do this entirely inside Django. You can get the tool name by parsing __file__ inside of settings.py, and from there it's easy to build the tools-static.wmflabs.org URL, and a trivial context processor can make that available to every template. That's a little bit of work, but not as bad as I originally thought it would be. This has the advantage of keeping all the configuration inside the source tree. RoySmith (talk) 04:43, 17 January 2020 (UTC)

A working example

I've got this all sorted out now. It turns out to be pretty simple, once you see how it's done. The gist is:

Parse the tool-name out of the path to your django settings file.
Use that to build a STATIC_URL for tools-static.wmflabs.org.
Add django.template.context_processors.static to TEMPLATES[0]['OPTIONS']['context_processors'].

You can now do something like:

<link rel="stylesheet" href="{{ STATIC_URL }}whatever.css">

in any template. No need for mucking with uwsgi.ini files. Here's a diff for reference. This works with Python 3.7 and Django 2.2. If you're using an older Python, change the f-string substitution to %-formatting or similar. RoySmith (talk) 18:25, 17 January 2020 (UTC)
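The diff itself is not reproduced on this page. A minimal sketch of the first two steps might look like the following; it assumes the settings file lives somewhere under /data/project/<tool>/ and that files in $HOME/www/static are published under https://tools-static.wmflabs.org/<tool>/. Both the URL layout and the helper names are assumptions of this example, not a copy of the linked diff.

```python
from pathlib import PurePosixPath


def tool_name_from_path(settings_file: str) -> str:
    """Extract <tool> from a path of the form /data/project/<tool>/..."""
    parts = PurePosixPath(settings_file).parts
    if len(parts) < 4 or parts[:3] != ("/", "data", "project"):
        raise ValueError(f"not under /data/project: {settings_file}")
    return parts[3]


def static_url_for(tool: str) -> str:
    """Build a STATIC_URL; Django expects the value to end with a slash."""
    return f"https://tools-static.wmflabs.org/{tool}/"


# In settings.py this would be used roughly as:
#   STATIC_URL = static_url_for(tool_name_from_path(os.path.abspath(__file__)))
example = "/data/project/spi-tools-dev/www/python/src/mysite/settings.py"
print(static_url_for(tool_name_from_path(example)))
```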

Python (uWSGI/k8s), egg files and binary dependencies

I'll leave this here in case anyone finds it helpful. I had to deal with a Python egg that requires an additional .so dependency (distributed separately) in order to link at runtime. Section "Installing numpy / scipy / things with binary dependencies" covers Python wheels, but sadly it's not so straightforward with old-style .egg files ("ImportError: _library_name_.so: cannot open shared object file: No such file or directory").

At first, I tried to tweak the LD_LIBRARY_PATH env var. It turns out k8s pods ignore anything set in the .bashrc file (excuse my lack of expertise, I'm sure this seems pretty obvious to an experienced user). Applying this workaround in venv/bin/activate didn't help either. My Python app finally managed to see this variable via os.environ after I added env = LD_LIBRARY_PATH=/path/to/dir to www/python/uwsgi.ini, but that still did not solve the ImportError.

In my case, hardcoding dlopen = /path/to/lib.so into www/python/uwsgi.ini did the trick:

[uwsgi]
dlopen = /path/to/lib.so

See uWSGI docs. Peter Bowman (talk) 00:32, 31 May 2020 (UTC)
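For comparison, a different technique with a similar effect is to preload the library from Python itself, before the module that needs it is imported. This is not the uWSGI dlopen option described above and was not tested in that setup; it is only a sketch using the standard ctypes module, with a placeholder path and a made-up function name.

```python
import ctypes


def preload(path: str) -> ctypes.CDLL:
    """Load a shared library so that later imports can resolve its symbols."""
    # RTLD_GLOBAL makes the library's symbols visible to libraries loaded later.
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)


# Would be called early, before importing the module that depends on the .so:
#   preload("/path/to/lib.so")
```

Setting LD_LIBRARY_PATH through env = probably comes too late because the dynamic linker normally reads that variable when the process starts, whereas an explicit load by full path does not depend on the search path at all.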

Why are there many third-party sites in our tool's csp-report

We are running https://scholia.toolforge.org and looking at the associated CSP report at https://csp-report.toolforge.org/search?ft=scholia&p=1. We are aware of the issue with doi.org, but we do not know why the other sites, e.g., fonts.gstatic.com and use.typekit.net, show up. We are using Bootstrap and some other third-party libraries. When I press F12 in Firefox, I do not see these domains. When I grep our assets directory, I cannot find them. Are these log entries artifacts? — Finn Årup Nielsen (fnielsen) (talk) 17:45, 30 March 2022 (UTC)

I cannot recreate requests from scholia for sites like fonts.gstatic.com either. Looking at the reports in the CSP portal and the total traffic to the tool in the last 14 days (https://toolviews.toolforge.org/api/v1/tool/scholia/daily/2022-03-15/2022-03-30), I have a hunch that you have a few users (or maybe even just one heavy user) who have browser add-ons/extensions installed which request content from common CDNs as part of their implementation. This is a common cause of false positive CSP report data. Browsers typically do not isolate traffic driven by the page's content from traffic driven by other configuration in the browser itself when applying and reporting CSP restrictions.

If your tool were actively using resources from third-party CDNs, I would expect to see a lot more than the ~1000 reported violations compared to the ~1.5M HTTP requests the tool has handled in the reporting period. -- BryanDavis (talk) 19:49, 30 March 2022 (UTC)
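To put those two rough figures side by side (both are the approximate numbers quoted above, and requests do not map one-to-one onto page views, so this is only an order-of-magnitude check):

```python
violations = 1_000      # approximate CSP reports in the period
requests = 1_500_000    # approximate HTTP requests in the same period

# Share of requests that produced a CSP violation report.
share = violations / requests
print(f"{share:.3%}")
```

That is well under a tenth of a percent, which fits a handful of users with browser add-ons far better than a resource embedded in every page.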

Thanks! I also came to think that it could be add-ons. — Finn Årup Nielsen (fnielsen) (talk) 16:34, 6 April 2022 (UTC)

Backend parameter

@Majavah: According to the latest version of the documentation (edited by you), there should be a --backend parameter / backend YAML key, which should be one of kubernetes. This doesn’t make much sense: if there’s no real choice, don’t ask people to choose. Couldn’t we just get rid of the backend parameter altogether? (Or is there a second backend? The documentation of extra_args speaks about most backends, which suggests there are so many backends that we can’t list all of them?) —Tacsipacsi (talk) 19:25, 15 March 2024 (UTC)

@Tacsipacsi Thanks for the ping. I'd like to keep the --backend mentioned here at least until this patch is deployed, since without that patch webservice will emit a warning if the parameter is missing. For extra_args, it should have been referring to webservice types and backends. Fixed in Special:Diff/2159625. Taavi (talk!) 20:54, 15 March 2024 (UTC)

@Majavah: Of course, if the script itself asks people to choose, it should be documented that way. I hope it will go away entirely eventually, but there is no hurry, and in any case, the code should be updated before the documentation.

In the extra_args docs, the word backend still appears; aren’t the extra arguments passed to the web service? —Tacsipacsi (talk) 12:23, 17 March 2024 (UTC)