User:triciaburmeister/Sandbox/requestctl

From Wikitech

requestctl is a command-line tool to control the configuration that manages access and routing of web requests. Wikimedia SREs use this tool to throttle and block certain request patterns in our edge caching layer, either in the HAProxy TLS terminator or in the Varnish frontend.

Get started

Tutorials

Essential concepts

requestctl uses a custom schema that defines four types of objects:

  • pattern objects describe specific patterns of an HTTP request.
  • ipblock objects group specific IP ranges into logical groups.
  • action objects describe an action to be performed on a request that matches specific combinations of patterns and ipblocks. Actions are what is enabled on Varnish.
  • haproxy_action objects are similar to action objects, but allow a different set of actions because of the capabilities of HAProxy.

For more details of how requestctl works, read the Overview.

User guide

For a full list of commands, see the Command line reference.

Identify problematic traffic patterns

Use one or more of the following tools to find and confirm the traffic patterns you need to block:

  • Filter requests in Turnilo and Superset live data, based on which requestctl actions match them
  • Inspect the current or potential impact of requestctl actions in Turnilo/Superset
  • Check Varnish logs for matching requests

Look for an existing request pattern

To explore patterns that already exist, you can:

  • Check the git repository dumped on the puppetservers at /srv/git/conftool/auditlog; all requestctl objects there have a request- prefix.
  • Query the backend using requestctl's get command:
# All patterns
requestctl get pattern
# Request a specific pattern, like 'ua/requests'
requestctl get pattern ua/requests

To list all actions to which a pattern is applied, use the find command. For example:

$ requestctl find ua/requests
action: generic_ua_aws, expression: (pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws
haproxy_action: generic_ua_aws, expression: (pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws

Add a new pattern

If no existing request pattern matches what you need, add a new one by creating a YAML file on any puppetserver. The pattern object described by your file should capture, with good flexibility, most of the characteristics you want to match in a request.

You can declare entries in your YAML file for the following fields:

  • method: the HTTP method.
  • request_body: a regex to match against the HTTP request body. Currently unsupported in Varnish.
  • url_path: a regex to match against the path part of the URL.
  • header: the name of a header to match, using the regex in header_value.
  • header_value: the regex to match the value of header against. If left blank when header is defined, the pattern means “the header is not present”.
  • query_parameter and query_parameter_value: the name of a query parameter and a regex to match its value against. An empty value is interpreted as “any value”.

Each pattern has an associated “scope” tag, which is an arbitrary grouping of different patterns. For example, the "ua" scope applies to patterns matching specific user agents.
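
As an illustration of the fields above, the following shell sketch writes a pattern file for a hypothetical ua/example_bot pattern. The field names come from the list above; the pattern name, the file name, the User-Agent value, and the choice to list unused fields as empty strings (mirroring the output of requestctl get pattern -o yaml) are assumptions. Compare the result with an existing file under /srv/git/conftool/auditlog before relying on the exact layout.

```shell
# Hypothetical pattern: match requests whose User-Agent starts with "ExampleBot/".
# The file name and the pattern name (ua/example_bot) are placeholders.
yaml_file=ua_example_bot.yaml

cat > "$yaml_file" <<'EOF'
# Only header and header_value are set; the other fields are left empty.
method: ''
request_body: ''
url_path: ''
header: User-Agent
header_value: '^ExampleBot/'
query_parameter: ''
query_parameter_value: ''
EOF

# Show the file that would later be passed to "requestctl apply pattern".
cat "$yaml_file"
```

Because header_value is set, this pattern matches on the header's value; leaving it blank would instead mean “the header is not present”.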

Sync the pattern object to etcd

After you create the YAML file, use the apply command to sync it to etcd so you can use the pattern in an action:

puppetserver1001:~$ sudo requestctl apply pattern <scope>/<name> -f <yaml_file>

Add a new ipblock

ipblocks let you group specific IP ranges into logical groups. For example: the ipblock with scope=cloud,name=aws includes all the IP ranges used by AWS.

To add a new ipblock: on any puppetserver, create a YAML file with the entries you want to declare. ipblocks have two fields:

  • comment - a concise and precise description of the ipblock
  • cidrs - a list of IPv4 or IPv6 CIDRs

When adding a new ipblock, you should typically add it to one of the three scopes (basically: categories) that already exist:

  • abuse - this category should gather all the small groups of abusers we add manually. Keep each list reasonably small: they are implemented as ACLs in Varnish rather than as netmapper files, which are more efficient.
  • known-clients - this category should include most large client groups that are not cloud providers, for example googlebot. Some of these are updated automatically.
  • cloud - this category should only include cloud providers and all entries should be updated automatically.

Example scenario: a couple of annoying clients are causing issues and you want to create a new ipblock for them. Because it's just two IPs, and you're defining a bespoke rule based on abusive behaviour, they should go in the abuse category.
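
The scenario above could look roughly like the following sketch, which writes an ipblock file with the two documented fields. The ipblock name (abuse/example_abusers), the file name, and the addresses are placeholders; the addresses come from the ranges reserved for documentation (192.0.2.0/24 and 2001:db8::/32), not from real clients.

```shell
# Hypothetical ipblock for two abusive clients, intended for the abuse scope.
yaml_file=example_abusers.yaml

cat > "$yaml_file" <<'EOF'
comment: Two clients hammering the API (placeholder addresses)
cidrs:
  - 192.0.2.10/32
  - 2001:db8::10/128
EOF

# Show the file that would later be passed to "requestctl apply ipblock".
cat "$yaml_file"
```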

To add a new scope (category), you must contact the Traffic team, because that requires modifications to both the Varnish and HAProxy configurations.

Sync the ipblock object to etcd

After you create the YAML file, use the apply command to sync it to etcd so you can use the ipblock in an action:

puppetserver1001:~$ sudo requestctl apply ipblock <scope>/<name> -f <yaml_file>

Define an action

Action objects define what happens to the traffic matching a given pattern: should matching traffic be blocked, or rate limited? What HTTP status should be served? What message?

Actions are dumped to YAML files on the puppetservers at /srv/git/conftool/auditlog/request-actions.

Action objects are associated with a specific cluster (cache-text or cache-upload at the time of writing) and have a name. Their fields are:

  • enabled: boolean. If false, the action will not be included in VCL.
  • sites: a list of datacenters where the rule applies. If empty, the rule applies to all datacenters.
  • cache_miss_only: boolean. If false, the action also applies to cache hits. Not applicable to haproxy_actions.
  • comment: a comment describing what this action does.
  • resp_status: the HTTP status code to send as a response.
  • resp_reason: the text to send as a reason with the response.
  • do_throttle: boolean. If true, requests matching the expression are throttled; if false, they always receive resp_status.
  • throttle_requests, throttle_interval, throttle_duration: the three arguments of vsthrottle in VCL that control the rate-limiting behaviour. Not available for haproxy_actions.
  • throttle_per_ip: boolean. Makes the rate limiting per-IP rather than per-cache-server. Not available for haproxy_actions.
  • log_matching: boolean. If true, X-Requestctl records when a request matches the rule. Such an action is therefore included in the VCL even if disabled; it just does not perform any banning or rate-limiting action.
  • expression: a string describing the combination of patterns and ipblocks that should be matched. The BNF of the grammar is described in cli.Requestctl.grammar, but in short:
    • A pattern is referenced with the keyword pattern@<scope>/<name>
    • An ipblock is referenced with the keyword ipblock@<scope>/<name>
    • Patterns and ipblocks can be combined with AND, AND NOT, OR, OR NOT logic.
    • Organize statements into groups by using parentheses.

Example of a valid expression: ( pattern@ua/requests OR pattern@ua/curl ) AND ipblock@cloud/aws AND NOT pattern@site/commons
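
To tie the fields together, here is a sketch of an action file that throttles the traffic selected by an expression. The field names come from the list above and the throttle numbers echo the vsthrottle call shown in the commit example (500, 30s, 1000s). The file name, the action name it implies, the flat file layout, and expressing the interval and duration as plain seconds are assumptions.

```shell
# Hypothetical action, e.g. cache-text/example_throttle.
yaml_file=example_throttle.yaml

cat > "$yaml_file" <<'EOF'
# "enabled" is not synced by apply; use "requestctl enable" / "disable" instead.
enabled: false
sites: []                 # empty list: all datacenters
cache_miss_only: true
comment: Throttle python-requests and curl traffic from AWS
resp_status: 429
resp_reason: Please see our UA policy
do_throttle: true
throttle_requests: 500    # assumed to map to the vsthrottle request limit
throttle_interval: 30     # assumed to be in seconds
throttle_duration: 1000   # assumed to be in seconds
throttle_per_ip: false
log_matching: true
expression: '(pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws'
EOF

# Show the file that would later be passed to "requestctl apply action".
cat "$yaml_file"
```

Setting do_throttle to false in the same file would turn the rule into an unconditional 429 response for every matching request.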

Sync action objects to etcd

After you create an action object in a file, sync it to the datastore:

puppetserver1001:~$ sudo requestctl apply action <cluster>/<name> -f <yaml_file>

Enable / disable and commit action

To actually get changes to an action injected into the Varnish configuration, run enable or disable and then commit to etcd the resulting VCL snippet:

# Writes to the datastore, needs sudo
sudo requestctl enable cache-text/generic_ua_clouds && sudo requestctl commit
sudo requestctl disable cache-text/generic_ua_clouds && sudo requestctl commit

Define haproxy_actions

To act on requests before they touch the caching layer, you must inject actions in the HAProxy configuration instead of Varnish. haproxy_action objects are very similar to action objects, and share many (but not all) of the same fields. Because they target HAProxy, they allow you to perform different actions on the request. Specifically:

  • No rate-limiting is enforced at this level, as we were wary of the performance implications of adding potentially one stick-table per rule. Rate-limiting has to happen at the Varnish layer, so none of the throttle_* fields are present.
  • Because nothing is cached in HAProxy, cache_miss_only has no meaning.
  • silent_drop - HAProxy can either deny or silently drop a request. To silently drop a request, set silent_drop: true.
  • HAProxy can limit the bandwidth available for requests that match a certain pattern. This is controlled by:
    • bw_throttle (boolean)
    • bw_throttle_rate (the rate limit in bandwidth)
    • bw_duration (duration of the limit)

To define a new haproxy_action, create a YAML file on any puppetserver, run apply to sync it to etcd, then enable and commit it.
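
A minimal sketch of such a file, for a rule that silently drops matching requests, might look like this. Only silent_drop, bw_throttle and the absence of the throttle_* and cache_miss_only fields are taken from the list above; that comment, sites, resp_status, resp_reason and expression are shared with action objects is an assumption, as are the file name and the rule itself. The bandwidth fields are left out because their units are not documented here.

```shell
# Hypothetical haproxy_action, e.g. cache-text/example_silent_drop.
yaml_file=example_silent_drop.yaml

cat > "$yaml_file" <<'EOF'
comment: Silently drop python-requests and curl traffic from AWS
sites: []                 # empty list: all datacenters
resp_status: 429
resp_reason: Please see our UA policy
silent_drop: true         # drop the request instead of sending a denial
bw_throttle: false        # no bandwidth limit in this example
expression: '(pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws'
EOF

# Show the file that would later be passed to "requestctl apply haproxy_action".
cat "$yaml_file"
```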

List existing haproxy_actions

# All actions.
requestctl get haproxy_action -o yaml
# A specific action
requestctl get haproxy_action cache-text/generic_ua_clouds
# All enabled actions
requestctl get haproxy_action -o json | jq 'to_entries[] | select(.value.enabled == true)'

Sync haproxy_action objects to etcd

After you create a haproxy_action object in a file, sync it to the datastore:

puppetserver1001:~$ sudo requestctl apply haproxy_action <cluster>/<name> -f <yaml_file>

Enable / disable and commit haproxy_action

To actually get changes to an action injected into the HAProxy configuration, run enable or disable and then commit to etcd the resulting DSL snippet:

# Writes to the datastore, needs sudo
sudo requestctl enable -s haproxy cache-text/generic_ua_clouds && sudo requestctl commit
sudo requestctl disable -s haproxy cache-text/generic_ua_clouds && sudo requestctl commit


Remove an object

If you're removing a pattern / ipblock, ensure it's not referenced by any action object:

# Find all actions containing a specific pattern or ipblock - both will be searched!
requestctl find ua/foobar
# Same for haproxy_actions
requestctl find -s haproxy ua/foobar

requestctl doesn't allow you to remove a pattern/ipblock that is still referenced in an action: it terminates with exit code 1 and prints an error like pattern ua/foobar is used by the following action: cache-text/baz. Once you've ensured your object can be removed, you can run:

$ sudo requestctl delete pattern ua/foobar
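
The pre-check can be scripted along the following lines. This sketch only inspects text in the format of the find example shown earlier on this page, fed in as a sample string, so it runs without requestctl. The helper name is_referenced is hypothetical, and what find prints when nothing references the object is not documented here, so treat the "no references" branch as an assumption.

```shell
# Hypothetical helper: reads "requestctl find" output on stdin and succeeds
# if at least one action or haproxy_action line is present.
is_referenced() {
    grep -Eq '^(action|haproxy_action): '
}

# Sample output copied from the find example earlier on this page.
sample_output='action: generic_ua_aws, expression: (pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws
haproxy_action: generic_ua_aws, expression: (pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws'

if printf '%s\n' "$sample_output" | is_referenced; then
    echo "still referenced: update or remove the listed actions first"
else
    echo "no references found: the object can be deleted"
fi
```

On a puppetserver, the sample string would be replaced by the real output, for example requestctl find ua/foobar | is_referenced.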

Modify and sync any object

  • SSH to a puppetserver frontend.
  • Find the YAML file corresponding to your object under /srv/git/conftool/auditlog; make a copy.
  • Modify the copied file.
  • Run sudo requestctl apply <object-type> <tag>/<name> -f <filename>.
  • If you've modified an action object or a haproxy_action object, don't forget to inject the change into the production systems by running sudo requestctl commit.
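
The steps above can be sketched as a dry run that prints each command instead of executing it. The object name reuses cache-text/generic_ua_clouds from the earlier examples; the layout of files inside the auditlog repository (request-<type>s/<tag>/<name>.yaml) and the run helper are assumptions, so check the real path before copying.

```shell
# Placeholders: adjust the object type and name to the object you are editing.
object_type=action
object_name=cache-text/generic_ua_clouds

# Assumption: the auditlog stores each object as request-<type>s/<tag>/<name>.yaml.
src="/srv/git/conftool/auditlog/request-${object_type}s/${object_name}.yaml"
work_copy="${HOME:-.}/$(basename "$src")"

# Dry run: print each command instead of executing it.
run() { printf '+ %s\n' "$*"; }

run cp "$src" "$work_copy"          # work on a copy, never edit in place
run "${EDITOR:-vi}" "$work_copy"    # modify the copy
run sudo requestctl apply "$object_type" "$object_name" -f "$work_copy"
run sudo requestctl commit          # only needed for action / haproxy_action
```

Once the printed plan looks right, the same commands can be run by hand on the puppetserver.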

Command line reference

The ultimate source of truth for requestctl commands is the code in GitLab.

apply

Writes (or "syncs") an object to the datastore from data in a YAML file.

$ sudo requestctl apply haproxy_action cache-upload/bwlimit_google_cloud -f file.yaml

If you've modified an action object or a haproxy_action object, don't forget to also inject the change into the production systems by running sudo requestctl commit.

commit

Commits changes to actions to the compiled datastores. Interactive by default; pass -b if you want to run in batch mode.

$ requestctl enable cache-text/requests_ua_api
$ requestctl commit
--- cache-text/global.old

+++ cache-text/global.new

@@ -1,3 +1,12 @@

+
+// FILTER requests_ua_api
+// Disallow python-requests to access restbase or the action api
+// This filter is generated from data in etcd. To disable it, run the following command:
+// sudo requestctl disable 'cache-text/requests_ua_api'
+if (req.http.User-Agent ~ "^python-requests" && (req.url ~ "^/api/rest_v1/" || req.url ~ "/w/api.php") && vsthrottle.is_denied("requestctl:requests_ua_api", 500, 30s, 1000s)) {
+    set req.http.Requestctl = req.http.Requestctl + ",requests_ua_api";
+    return (synth(429, "Please see our UA policy"));
+}
+

 // FILTER enwiki_api_cloud
 // Limit access to the enwiki api from the clouds

==> Ok to commit these changes?
Type "go" to proceed or "abort" to interrupt the execution
>

Once all Varnish changes are merged, the haproxy actions will be committed as well.

dump

Dumps all objects of the given type as a directory tree under a base directory:

# This generates a tree of action objects under dump_dir
$ requestctl dump -g dump_dir action

Parameters:

  • -g, --git-repo identifies the base directory

enable / disable

Enables / disables actions. Note: the enabled field in actions is explicitly excluded from syncing with the apply command.

$ requestctl enable cache-text/foobar  # enables cache-text/foobar in varnish
$ requestctl disable -s varnish cache-text/foobar # disables the same action in Varnish.
$ requestctl enable -s haproxy cache-text/foobar # enables a similarly named haproxy_action

find

Finds which actions include a specific pattern or ipblock:

requestctl find pattern

Example:

$ requestctl find ua/requests
action: generic_ua_aws, expression: (pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws
haproxy_action: generic_ua_aws, expression: (pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws

get

Gets data from the datastore and displays it in the desired format. Can be used to fetch all objects or just one. See the sections below for example usage with different object types.

get action

# All actions.
requestctl get action -o yaml
# A specific action
requestctl get action cache-text/generic_ua_clouds
# All enabled actions
requestctl get action -o json | jq 'to_entries[] | select(.value.enabled == true)'

get haproxy_action

# All actions.
requestctl get haproxy_action -o yaml
# A specific action
requestctl get haproxy_action cache-text/generic_ua_clouds
# All enabled actions
requestctl get haproxy_action -o json | jq 'to_entries[] | select(.value.enabled == true)'

get ipblock

:~$ requestctl get ipblock -o json | jq -r 'keys[]'
abuse/blocked_nets
abuse/bot_blocked_nets
abuse/bot_posts_blocked_nets
abuse/phabricator_abusers
abuse/text_abuse_nets
cloud/akamai
cloud/aws
cloud/azure
cloud/digitalocean
cloud/gcp
cloud/linode
cloud/oci
cloud/public_cloud_nets
known-clients/googlebot

get pattern

  $ requestctl get pattern
  +------------------------+-------------------------------+
  |          name          |            pattern            |
  +------------------------+-------------------------------+
  |   cache-text/docroot   |          url:^/[\?$]          |
  | cache-text/bad_param_q |           ?q=\w{12}           |
  |   cache-text/enwiki    |    Host: en.wikipedia.org     |
  |  cache-text/restbase   |      url:^/api/rest_v1/       |
  | cache-text/action_api  |        url:/w/api.php         |
  | cache-text/requests_ua | User-Agent: python-requests.* |
  |  cache-text/wiki_page  |     url:/wiki/[^:]+(\?$)      |
  |      ua/requests       | User-Agent: ^python-requests  |
  +------------------------+-------------------------------+



  $ requestctl get pattern ua/requests -o json | jq .
  {
    "ua/requests": {
      "method": "",
      "request_body": "",
      "url_path": "",
      "header": "User-Agent",
      "header_value": "^python-requests",
      "query_parameter": "",
      "query_parameter_value": ""
    }
  }

  $ requestctl get pattern ua/requests -o yaml
  ua/requests:
    header: User-Agent
    header_value: ^python-requests
    method: ''
    query_parameter: ''
    query_parameter_value: ''
    request_body: ''
    url_path: ''

haproxycfg

Outputs the HAProxy configuration fragment generated from the haproxy_action:

$ requestctl haproxycfg cache-text/requests_ua_api

# ACLs generated for requestctl actions
acl ua_python_requests hdr_reg(User-Agent) -i "^python\-requests"
acl url_rest_api path_reg -i "^/api/rest_v1/"
acl url_action_api path_reg -i "/w/api.php"

# requestctl haproxy action cache-text/requests_ua_api
# Disallow python-requests to access restbase or the action api
# This action is generated from data in etcd. To disable it, run the following command:
# sudo requestctl disable -s haproxy 'cache-text/requests_ua_api'

http-request deny status 429 reason "Please see our UA policy" if ua_python_requests url_rest_api || url_rest_api url_action_api

log

Outputs the varnishncsa command to run on a cache host to see requests matching a given action.

$ requestctl log cache-text/requests_ua_api

You can monitor requests matching this action using the following command:
sudo varnishncsa -n frontend -g request \
  -F '"%{X-Client-IP}i" %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" "%{X-Public-Cloud}i"' \
  -q 'ReqHeader:User-Agent ~ "^python-requests" and ( ReqURL ~ "^/api/rest_v1/" or ReqURL ~ "/w/api.php" ) and  not VCL_ACL eq "MATCH wikimedia_nets"'

There is no corresponding functionality for haproxy actions.

validate

Validates objects written in a directory tree. Useful for testing new actions.

$ for what in pattern ipblock action haproxy_action; do requestctl dump -g base_dir "$what"; done
# Edit whatever action/haproxy_action you want to modify
$ edit base_dir/requestctl-actions/foo/bar.yaml
$ requestctl validate base_dir
$ sudo requestctl apply action foo/bar -f base_dir/requestctl-actions/foo/bar.yaml

It will exit with non-zero exit status if any error is present.

vcl

Outputs the VCL fragment generated from the action.

$ requestctl vcl cache-text/requests_ua_api

// FILTER requests_ua_api
// Disallow python-requests to access restbase or the action api
// This filter is generated from data in etcd. To disable it, run the following command:
// sudo requestctl disable 'cache-text/requests_ua_api'
if (req.http.User-Agent ~ "^python-requests" && (req.url ~ "^/api/rest_v1/" || req.url ~ "/w/api.php") && vsthrottle.is_denied("requestctl:requests_ua_api", 500, 30s, 1000s)) {
    return (synth(429, "Please see our UA policy"));
}

Code reference

Auditlog repository

In production, Conftool2git synchronizes requestctl objects to a git repository on the puppetservers under /srv/git/conftool/auditlog. Don't update object definitions inside that repository; instead, copy the file over to your home directory and run apply. Your changes modify data that resides in our main etcd cluster under /conftool/v1/request-{ipblock,action,pattern}s/.

Schema