HTTP proxy

From Wikitech

To allow HTTP requests to reach the outside world, we maintain a caching HTTP proxy in each datacenter. The proxies run on the install* servers and are exposed through service entries of the form webproxy.<datacenter>.wmnet.


You can set the http_proxy and https_proxy environment variables to make many command-line scripts use the site-specific proxy automatically.

The no_proxy and NO_PROXY variables are configured automatically across the infra by the profile::environment puppet module and hiera settings.

Helper commands

In your terminal, just run set_proxy. This will take care of setting up the needed environment variables during the active session.

unset_proxy will do the opposite.

Manual config

export http_proxy=http://webproxy:8080
export https_proxy=http://webproxy:8080
export no_proxy=,::1,localhost,.wmnet,,,,,,,,,,,,
export HTTP_PROXY=$http_proxy
export HTTPS_PROXY=$https_proxy
export NO_PROXY=$no_proxy
  • "no_proxy" MUST be explicitly set
    • Prevents unnecessary load on the proxies (to fetch internal resources)
    • Prevents stale data cached on the proxies
    • Prevents unnecessary dependencies
  • HTTP proxies SHOULD NOT be configured by default, only case by case where there is a need
    • Prefer setting these variables for your current session only, using the helper commands at the terminal prompt
    • Services should use Puppet to configure proxies
  • These proxies MUST NOT be used from Cloud VPS instances (enforced by ACLs)
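
The rules above favour short-lived proxy settings over global ones. As a minimal sketch of that idea, the shell function below runs a single command through a datacenter's proxy without touching the calling shell's environment. The name with_proxy is hypothetical (it is not one of the helper commands), the port 8080 and the webproxy.<datacenter>.wmnet naming are taken from this page, and the function assumes no_proxy is already set in the environment, refusing to run otherwise.

```shell
# Hypothetical helper (not the set_proxy command): run one command through a
# datacenter's proxy without changing the environment of the calling shell.
# The function body is a subshell, so its variables never leak to the caller.
with_proxy() (
    site="${1:?usage: with_proxy <datacenter> <command> [args...]}"
    shift
    # Refuse to run when no_proxy is missing, since it must be set explicitly.
    : "${no_proxy:?no_proxy must be set before using the proxy}"
    proxy="http://webproxy.${site}.wmnet:8080"
    # env hands the proxy variables to the child process only.
    env http_proxy="$proxy" https_proxy="$proxy" \
        HTTP_PROXY="$proxy" HTTPS_PROXY="$proxy" "$@"
)
```

For example, with_proxy eqiad wget <url> would fetch <url> through the eqiad proxy and leave the current session unproxied afterwards.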

Internal endpoints

It is better to use internal endpoints instead of public ones; a list of reasons is given in this comment.


Use e.g. https://mw-api-int-ro.discovery.wmnet:4446 and set the HTTP Host header to the domain of the site you want to access, e.g. curl -H "Host:" https://mw-api-int-ro.discovery.wmnet:4446
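
The Host header approach described above could be wrapped in a small shell function, sketched below. The function name internal_api_get and the CURL override are hypothetical and only for illustration; the endpoint and port are the ones given above, and the site domain and request path are supplied by the caller rather than assumed here.

```shell
# Hypothetical helper: fetch a path from the internal read-only MediaWiki API
# endpoint while presenting the target site's domain in the Host header.
# The function body is a subshell, so a missing argument cannot exit the caller.
internal_api_get() (
    site_domain="${1:?usage: internal_api_get <site-domain> <path>}"
    path="${2:?usage: internal_api_get <site-domain> <path>}"
    # CURL can be overridden (e.g. CURL=echo) to preview the arguments
    # instead of sending a request.
    ${CURL:-curl} --silent --show-error \
        --header "Host: ${site_domain}" \
        "https://mw-api-int-ro.discovery.wmnet:4446${path}"
)
```

A call would look like internal_api_get <site-domain> '/w/api.php?action=query&meta=siteinfo&format=json', where <site-domain> is the domain of the wiki you want to reach.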

MediaWiki_On_Kubernetes internal API endpoints:

For examples in Python and R refer to these notes.


See Machine Learning/LiftWing/Usage#Internal endpoints

A complete list exists at:

Example usage


If you are using curl, you can use the --proxy flag:

curl --proxy http://webproxy.eqiad.wmnet:8080


wget has no --proxy flag; set the appropriate environment variable instead.

https_proxy=http://webproxy:8080 wget

Maven proxy configuration example

You can reference the proxy in your Maven configuration file, ~/.m2/settings.xml, so that packages fetched at build time go through it.
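
As a rough sketch of what such a configuration might contain, the shell function below writes a minimal settings file with a single proxy entry in Maven's standard <proxies> format. The function name write_maven_proxy_settings, the proxy id, and the nonProxyHosts value (which mirrors the localhost and .wmnet entries of no_proxy) are assumptions for illustration. The function writes to a path you choose, so an existing ~/.m2/settings.xml is not overwritten by accident.

```shell
# Hypothetical helper: write a minimal Maven settings file that routes
# downloads through a datacenter's proxy.
write_maven_proxy_settings() (
    target="${1:?usage: write_maven_proxy_settings <file> <datacenter>}"
    site="${2:?usage: write_maven_proxy_settings <file> <datacenter>}"
    # The id and nonProxyHosts values below are illustrative assumptions.
    cat > "$target" <<EOF
<settings>
  <proxies>
    <proxy>
      <id>webproxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>webproxy.${site}.wmnet</host>
      <port>8080</port>
      <nonProxyHosts>localhost|*.wmnet</nonProxyHosts>
    </proxy>
  </proxies>
</settings>
EOF
)
```

After write_maven_proxy_settings /tmp/settings-proxy.xml eqiad, the file can be passed to a build with mvn --settings /tmp/settings-proxy.xml, or its <proxies> element can be merged into ~/.m2/settings.xml by hand.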



In addition to environment variables defined above, invoke ant with the -autoproxy argument.


If your Spark job pulls dependencies via spark.jars.packages, you can point it to a settings file that takes care of proxying automatically by mirroring through our Archiva instance:

    "spark.jars.packages": "...",  # packages to pull go here
    "spark.driver.extraJavaOptions": "-Divy.cache.dir=/tmp/ivy_spark3/cache -Divy.home=/tmp/ivy_spark3/home ",
    "spark.jars.ivySettings": "/etc/maven/ivysettings.xml"


Access log dashboard:


Future/possible improvements


See also