Apache Traffic Server

Apache Traffic Server is a caching proxy server.

Architecture

There are three distinct processes in Traffic Server:

  1. traffic_server
  2. traffic_manager
  3. traffic_cop

traffic_server is the process responsible for dealing with user traffic: accepting connections, processing requests, and serving documents from the cache or the origin server. traffic_server is an event-driven, multi-threaded process. Threads are used to take advantage of multiple CPUs, not to handle multiple connections concurrently (e.g. by spawning a thread per connection, or by using a thread pool). Instead, an event system is used to schedule work on threads. ATS uses a state machine to handle each transaction (a single HTTP request from a client and the response Traffic Server sends to that client) and provides a system of hooks where plugins (e.g. Lua) can step in and alter the transaction. Specific timers are used at the various states of the state machine.

traffic_manager is responsible for launching, monitoring and configuring traffic_server, handling the statistics interface, cluster administration and virtual IP failover.

traffic_cop is a watchdog program monitoring the health of both traffic_manager and traffic_server. It has traditionally been the command used to start ATS. Under systemd it can probably be avoided, with traffic_manager used as the program executed to start the unit.

Configuration

The changes to the default configuration required to get a caching proxy are:

# /etc/trafficserver/remap.config
map client_url origin_server_url

The following rules map grafana and phabricator to their respective backends and define a catchall for requests that don't match either of the first two rules:

# /etc/trafficserver/remap.config
map http://grafana.wikimedia.org/ http://krypton.eqiad.wmnet/
map http://phabricator.wikimedia.org/ http://iridium.eqiad.wmnet/
map / http://deployment-mediawiki05.deployment-prep.eqiad.wmflabs/
# /etc/trafficserver/records.config

CONFIG proxy.config.http.server_ports STRING 3128 3128:ipv6
CONFIG proxy.config.admin.synthetic_port INT 8083
CONFIG proxy.config.process_manager.mgmt_port INT 8084

CONFIG proxy.config.admin.user_id STRING trafficserver
CONFIG proxy.config.http.cache.required_headers INT 1
CONFIG proxy.config.url_remap.pristine_host_hdr INT 1
CONFIG proxy.config.disable_configuration_modification INT 1

If proxy.config.http.cache.required_headers is set to 2, which is the default, the origin server is required to set an explicit lifetime, from either Expires or Cache-Control: max-age. By setting required_headers to 1, objects with Last-Modified are considered for caching too. Setting the value to 0 means that no headers are required to make documents cacheable.
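
As a quick check, the running value can be inspected, and changed at runtime, with traffic_ctl; for a permanent change, records.config remains the place to set it:

sudo traffic_ctl config get proxy.config.http.cache.required_headers
sudo traffic_ctl config set proxy.config.http.cache.required_headers 1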

Logging

Diagnostic output can be sent to standard output and error instead of the default logfiles, which is a good idea in order to take advantage of systemd's journal.

# /etc/trafficserver/records.config
CONFIG proxy.config.diags.output.status STRING O
CONFIG proxy.config.diags.output.note STRING O
CONFIG proxy.config.diags.output.warning STRING O
CONFIG proxy.config.diags.output.error STRING E
CONFIG proxy.config.diags.output.fatal STRING E
CONFIG proxy.config.diags.output.alert STRING E
CONFIG proxy.config.diags.output.emergency STRING E

Health checks

Load the `healthchecks` plugin:

# /etc/trafficserver/plugin.config
healthchecks.so /etc/trafficserver/healthchecks.conf

Define health check:

# /etc/trafficserver/healthchecks.conf
/check /etc/trafficserver/ts-alive text/plain 200 403

Response body:

# /etc/trafficserver/ts-alive
All good

With the above configuration, GET requests to `/check` will result in 200 responses from ATS with the response body defined in `/etc/trafficserver/ts-alive`.
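
For instance, assuming ATS is listening on port 3128 as configured in records.config above, the health check can be exercised with:

curl -i http://127.0.0.1:3128/check

The response should be a 200 with "All good" as the body.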

Cache inspector

To enable the cache inspector functionality, add the following remap rules:

map /cache-internal/ http://{cache-internal}
map /cache/ http://{cache}
map /stat/ http://{stat}
map /test/ http://{test}
map /hostdb/ http://{hostdb}
map /net/ http://{net}
map /http/ http://{http}
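
With the rules in place, the inspector pages can be requested through the proxy itself, e.g. on the listening port configured above (depending on the ATS version, further settings may be needed to enable the inspector UI):

curl http://127.0.0.1:3128/cache-internal/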

systemd unit

# /etc/systemd/system/trafficserver.service
[Unit]
Description=Apache Traffic Server
After=network.service systemd-networkd.service network-online.target 

[Service]
ExecStart=/usr/bin/traffic_manager --nosyslog
ExecReload=/usr/bin/traffic_ctl config reload
Restart=always
RestartSec=1

LimitNOFILE=500000
LimitMEMLOCK=90000

# PrivateTmp causes the following error:
# FATAL: unable to load remap.config
# traffic_server: using root directory '/usr'
#PrivateTmp=yes

CapabilityBoundingSet=CAP_CHOWN CAP_DAC_OVERRIDE CAP_IPC_LOCK CAP_KILL CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID
# Setting SystemCallFilter as follows seems fine at first, but then objects do not get cached. Needs further investigation.
# SystemCallFilter=~acct modify_ldt add_key adjtimex clock_adjtime delete_module fanotify_init finit_module get_mempolicy init_module io_destroy io_getevents iopl ioperm io_setup io_submit io_cancel kcmp kexec_load keyctl lookup_dcookie mbind migrate_pages mount move_pages open_by_handle_at perf_event_open pivot_root process_vm_readv process_vm_writev ptrace remap_file_pages request_key set_mempolicy swapoff swapon umount2 uselib vmsplice
# MemoryDenyWriteExecute=true

ReadOnlyDirectories=/usr
ReadOnlyDirectories=/var/lib
#
#ReadOnlyDirectories=/etc
#ReadWriteDirectories=/etc/trafficserver/internal
#ReadWriteDirectories=/etc/trafficserver/snapshots
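
After installing the unit file, reload systemd, start the service and follow its output in the journal (which is where the diagnostic output configured in the Logging section ends up):

sudo systemctl daemon-reload
sudo systemctl start trafficserver
sudo journalctl -u trafficserver -f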

Cheatsheet

Show non-default configuration values:

sudo traffic_ctl config diff

Configuration reload:

sudo traffic_ctl config reload

Check if a reload/restart is needed:

sudo traffic_ctl config status

Start in debugging mode, dumping headers

sudo traffic_server -T http_hdrs

Access metrics from the CLI:

traffic_ctl metric get proxy.process.http.cache_hit_fresh

Multiple metrics can be accessed with 'match':

traffic_ctl metric match proxy.process.ssl.*

Show storage usage:

traffic_server -C check

Lua scripting

ATS plugins can be written in Lua. As an example, this is how to choose an origin server dynamically:

# /etc/trafficserver/remap.config
map http://127.0.0.1:3128/ http://$origin_server_ip/ @plugin=/usr/lib/trafficserver/modules/tslua.so @pparam=/var/tmp/ats-set-backend.lua
reverse_map http://$origin_server_ip/ http://127.0.0.1:3128/

Choosing origin server

Selecting the appropriate origin server for a given request can be done using ATS mapping rules. The same goal can be achieved in Lua:

-- /var/tmp/ats-set-backend.lua
-- Route REST API requests to a specific origin server
function do_remap()
    local url = ts.client_request.get_url()
    if url:match("/api/rest_v1/") then
        ts.client_request.set_url_host('origin-server.eqiad.wmnet')
        ts.client_request.set_url_port(80)
        ts.client_request.set_url_scheme('http')
        return TS_LUA_REMAP_DID_REMAP
    end
end

Negative response caching

By default ATS caches negative responses such as 404, 503 and others only if the response defines a max-age via the Cache-Control header. This behavior can be changed by setting the configuration option proxy.config.http.negative_caching_enabled, which allows caching of negative responses that do NOT specify Cache-Control. If negative caching is enabled, the lifetime of negative responses without Cache-Control is defined by proxy.config.http.negative_caching_lifetime, in seconds, defaulting to 1800.
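
For example, to cache such responses for 10 minutes when the origin does not send Cache-Control, the records.config settings would be along these lines:

# /etc/trafficserver/records.config
CONFIG proxy.config.http.negative_caching_enabled INT 1
CONFIG proxy.config.http.negative_caching_lifetime INT 600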

One might however desire to cache 404 responses which do not send Cache-Control, without caching any 503 response. Given that proxy.config.http.negative_caching_enabled enables the behavior for a whole set of negative response status codes, and there is no way to specify which of those codes to cache, the goal can be achieved by setting Cache-Control in Lua only for certain status codes:

function read_response()
    local status_code = ts.server_response.get_status()
    local cache_control = ts.server_response.header['Cache-Control']

    -- Cache 404 responses without CC for 10s
    if status_code == 404 and not(cache_control) then
        ts.server_response.header['Cache-Control'] = 'max-age=10'
    end
end

function do_remap()
    ts.hook(TS_LUA_HOOK_READ_RESPONSE_HDR, read_response)
    return 0
end

Setting X-Cache-Int

As another example, the following script takes care of setting the X-Cache-Int response header:

-- /var/tmp/ats-set-x-cache-int.lua
function cache_lookup()
     local cache_status = ts.http.get_cache_lookup_status()
     ts.ctx['cstatus'] = cache_status
end

function cache_status_to_string(status)
     if status == TS_LUA_CACHE_LOOKUP_MISS then
        return "miss"
     end

     if status == TS_LUA_CACHE_LOOKUP_HIT_FRESH then
        return "hit"
     end

     if status == TS_LUA_CACHE_LOOKUP_HIT_STALE then
        return "miss"
     end

     if status == TS_LUA_CACHE_LOOKUP_SKIPPED then
        return "pass"
     end

     return "bug"
end

function gen_x_cache_int()
     local hostname = "cp4242" -- from puppet
     local cache_status = cache_status_to_string(ts.ctx['cstatus'])

     local v = ts.client_response.header['X-Cache-Int']
     local mine = hostname .. " " .. cache_status

     if (v) then
        v = v .. ", " .. mine
     else
        v = mine
     end

     ts.client_response.header['X-Cache-Int'] = v
     ts.client_response.header['X-Cache-Status'] = cache_status
end

function do_remap()
     ts.hook(TS_LUA_HOOK_CACHE_LOOKUP_COMPLETE, cache_lookup)
     ts.hook(TS_LUA_HOOK_SEND_RESPONSE_HDR, gen_x_cache_int)
     return 0
end
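
To load the script, reference it from a remap rule as in the earlier tslua example (listening port and origin placeholder as used above):

# /etc/trafficserver/remap.config
map http://127.0.0.1:3128/ http://$origin_server_ip/ @plugin=/usr/lib/trafficserver/modules/tslua.so @pparam=/var/tmp/ats-set-x-cache-int.lua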

Unit testing

The busted framework can be used to unit test Lua scripts. It can be installed as follows:

apt install luarocks
luarocks install busted
luarocks install luacov

The following unit tests cover some of the functionality implemented by ats-set-x-cache-int.lua:

-- unit_test.lua
_G.ts = { client_response = {  header = {} }, ctx = {} }

describe("Busted unit testing framework", function()
  describe("script for ATS Lua Plugin", function()

    it("test - hook", function()
      stub(ts, "hook")

      require("ats-set-x-cache-int")
      local result = do_remap()
      assert.are.equals(0, result)
    end)

    it("test - gen_x_cache_hit", function()
      stub(ts, "hook")

      require("ats-set-x-cache-int")
      local result = gen_x_cache_int()

      assert.are.equals('miss', ts.client_response.header['X-Cache-Status'])
      assert.are.equals('cp4242 miss', ts.client_response.header['X-Cache-Int'])
    end)

  end)
end)

Run the tests and generate a coverage report with:

$ busted -c unit_test.lua 
●●
2 successes / 0 failures / 0 errors / 0 pending : 0.012771 seconds

$ luacov ; cat luacov.report.out
