Apache Traffic Server

Apache Traffic Server is a caching proxy server.

Architecture

There are three distinct processes in Traffic Server:

  1. traffic_server
  2. traffic_manager
  3. traffic_cop

traffic_server is the process responsible for dealing with user traffic: accepting connections, processing requests, and serving documents from cache or from the origin server. traffic_server is an event-driven, multi-threaded process. Threads are used to take advantage of multiple CPUs, not to handle multiple connections concurrently (e.g. by spawning a thread per connection or by using a thread pool); instead, an event system schedules work on the available threads. ATS uses a state machine to handle each transaction (a single HTTP request from a client and the response Traffic Server sends back to that client) and provides a system of hooks where plugins (e.g. Lua) can step in and act on the transaction. Specific timers are used at the various states.

traffic_manager is responsible for launching, monitoring and configuring traffic_server, handling the statistics interface, cluster administration and virtual IP failover.

traffic_cop is a watchdog program monitoring the health of both traffic_manager and traffic_server. It has traditionally been the command used to start ATS. In a systemd world it can probably be avoided, with traffic_manager used as the program executed to start the unit.

Configuration

The changes to the default configuration required to get a caching proxy are:

# /etc/trafficserver/remap.config
map client_url origin_server_url

The following rules map grafana and phabricator to their respective backends and define a catchall for requests that don't match either of the first two rules:

# /etc/trafficserver/remap.config
map http://grafana.wikimedia.org/ http://krypton.eqiad.wmnet/
map http://phabricator.wikimedia.org/ http://iridium.eqiad.wmnet/
map / http://deployment-mediawiki05.deployment-prep.eqiad.wmflabs/
# /etc/trafficserver/records.config

CONFIG proxy.config.http.server_ports STRING 3128 3128:ipv6
CONFIG proxy.config.admin.synthetic_port INT 8083
CONFIG proxy.config.process_manager.mgmt_port INT 8084

CONFIG proxy.config.admin.user_id STRING trafficserver
CONFIG proxy.config.http.cache.required_headers INT 1
CONFIG proxy.config.url_remap.pristine_host_hdr INT 1
CONFIG proxy.config.disable_configuration_modification INT 1

If proxy.config.http.cache.required_headers is set to 2, which is the default, the origin server is required to set an explicit lifetime, via either Expires or Cache-Control: max-age. By setting required_headers to 1, objects with only a Last-Modified header are considered for caching too. Setting the value to 0 means that no headers are required to make documents cacheable.
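
As a hypothetical illustration (header values invented for the example), an origin response carrying only a Last-Modified validator like the following is not cached with required_headers set to 2, but is considered for caching once the value is lowered to 1, with a freshness lifetime derived heuristically from Last-Modified:

HTTP/1.1 200 OK
Content-Type: text/html
Last-Modified: Mon, 10 Feb 2020 10:00:00 GMT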

Logging

Diagnostic output can be sent to standard output and error instead of the default logfiles, which is a good idea in order to take advantage of systemd's journal.

# /etc/trafficserver/records.config
CONFIG proxy.config.diags.output.status STRING O
CONFIG proxy.config.diags.output.note STRING O
CONFIG proxy.config.diags.output.warning STRING O
CONFIG proxy.config.diags.output.error STRING E
CONFIG proxy.config.diags.output.fatal STRING E
CONFIG proxy.config.diags.output.alert STRING E
CONFIG proxy.config.diags.output.emergency STRING E
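
With diagnostics sent to stdout/stderr, the messages end up in the service's journal and can be followed with journalctl (assuming the unit is called trafficserver, as in the unit file shown later):

journalctl -u trafficserver -f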

Health checks

Load the `healthchecks` plugin:

# /etc/trafficserver/plugin.config
healthchecks.so /etc/trafficserver/healthchecks.conf

Define health check:

# /etc/trafficserver/healthchecks.conf
/check /etc/trafficserver/ts-alive text/plain 200 403

Response body:

# /etc/trafficserver/ts-alive
All good

With the above configuration, GET requests to `/check` will result in 200 responses from ATS with the response body defined in `/etc/trafficserver/ts-alive`.
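
Assuming ATS is listening on port 3128 as configured above, the health check can be exercised with curl; the response should be a 200 whose body is the content of /etc/trafficserver/ts-alive:

curl -i http://localhost:3128/check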

Cache inspector

To enable the cache inspector functionality, add the following remap rules:

map /cache-internal/ http://{cache-internal}
map /cache/ http://{cache}
map /stat/ http://{stat}
map /test/ http://{test}
map /hostdb/ http://{hostdb}
map /net/ http://{net}
map /http/ http://{http}
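
With these rules in place, the inspector pages can be requested through the proxy itself, for example (assuming the 3128 listening port configured above):

curl http://localhost:3128/cache/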

systemd unit

# /etc/systemd/system/trafficserver.service
[Unit]
Description=Apache Traffic Server
After=network.service systemd-networkd.service network-online.target 

[Service]
ExecStart=/usr/bin/traffic_manager --nosyslog
ExecReload=/usr/bin/traffic_ctl config reload
Restart=always
RestartSec=1

LimitNOFILE=500000
LimitMEMLOCK=90000

# PrivateTmp causes the following error:
# FATAL: unable to load remap.config
# traffic_server: using root directory '/usr'
#PrivateTmp=yes

CapabilityBoundingSet=CAP_CHOWN CAP_DAC_OVERRIDE CAP_IPC_LOCK CAP_KILL CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID
# Setting SystemCallFilter as follows seems fine at first, but then objects do not get cached. Needs further investigation.
# SystemCallFilter=~acct modify_ldt add_key adjtimex clock_adjtime delete_module fanotify_init finit_module get_mempolicy init_module io_destroy io_getevents iopl ioperm io_setup io_submit io_cancel kcmp kexec_load keyctl lookup_dcookie mbind migrate_pages mount move_pages open_by_handle_at perf_event_open pivot_root process_vm_readv process_vm_writev ptrace remap_file_pages request_key set_mempolicy swapoff swapon umount2 uselib vmsplice
# MemoryDenyWriteExecute=true

ReadOnlyDirectories=/usr
ReadOnlyDirectories=/var/lib
#
#ReadOnlyDirectories=/etc
#ReadWriteDirectories=/etc/trafficserver/internal
#ReadWriteDirectories=/etc/trafficserver/snapshots
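
After installing or modifying the unit file, reload systemd and enable the service:

sudo systemctl daemon-reload
sudo systemctl enable --now trafficserver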

Debugging

The XDebug plugin allows clients to check various aspects of ATS operation.

To enable the plugin, add xdebug.so to plugin.config and restart trafficserver.

Once the plugin is enabled, clients can specify various values in the X-Debug header and receive the relevant information back.

For example:

# cache hit
$ curl -v -H "X-Debug: X-Milestones" http://localhost 2>&1 | grep Milestones:
< X-Milestones: PLUGIN-TOTAL=0.000022445, PLUGIN-ACTIVE=0.000022445, CACHE-OPEN-READ-END=0.000078570, CACHE-OPEN-READ-BEGIN=0.000078570, UA-BEGIN-WRITE=0.000199094, UA-READ-HEADER-DONE=0.000000000, UA-FIRST-READ=0.000000000, UA-BEGIN=0.000000000

# cache miss
< X-Milestones: PLUGIN-TOTAL=0.000017432, PLUGIN-ACTIVE=0.000017432, DNS-LOOKUP-END=0.091413811, DNS-LOOKUP-BEGIN=0.000148548, CACHE-OPEN-WRITE-END=0.091413811, CACHE-OPEN-WRITE-BEGIN=0.091413811, CACHE-OPEN-READ-END=0.000056997, CACHE-OPEN-READ-BEGIN=0.000056997, SERVER-READ-HEADER-DONE=0.218755336, SERVER-FIRST-READ=0.218755336, SERVER-BEGIN-WRITE=0.091413811, SERVER-CONNECT-END=0.091413811, SERVER-CONNECT=0.091413811, SERVER-FIRST-CONNECT=0.091413811, UA-BEGIN-WRITE=0.218755336, UA-READ-HEADER-DONE=0.000000000, UA-FIRST-READ=0.000000000, UA-BEGIN=0.000000000
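
The cache lookup outcome can be requested in the same way with the X-Cache header, also provided by XDebug, which reports values such as miss, hit-fresh or hit-stale (example output shown for a fresh hit):

$ curl -v -H "X-Debug: X-Cache" http://localhost 2>&1 | grep X-Cache:
< X-Cache: hit-fresh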

The full list of debugging headers is available in the XDebug Plugin documentation.

Cheatsheet

Show non-default configuration values:

sudo traffic_ctl config diff

Configuration reload:

sudo traffic_ctl config reload

Check if a reload/restart is needed:

sudo traffic_ctl config status

Start in debugging mode, dumping headers:

sudo traffic_server -T http_hdrs

Access metrics from the CLI:

traffic_ctl metric get proxy.process.http.cache_hit_fresh

Multiple metrics can be accessed with 'match':

traffic_ctl metric match proxy.process.ssl.*

Show storage usage:

traffic_server -C check

Lua scripting

ATS plugins can be written in Lua. As an example, this is how to choose an origin server dynamically:

# /etc/trafficserver/remap.config
map http://127.0.0.1:3128/ http://$origin_server_ip/ @plugin=/usr/lib/trafficserver/modules/tslua.so @pparam=/var/tmp/ats-set-backend.lua
reverse_map http://$origin_server_ip/ http://127.0.0.1:3128/

Choosing origin server

Selecting the appropriate origin server for a given request can be done using ATS mapping rules. The same goal can be achieved in Lua:

-- /var/tmp/ats-set-backend.lua
function do_remap()
    local url = ts.client_request.get_url()
    if url:match("/api/rest_v1/") then
        ts.client_request.set_url_host('origin-server.eqiad.wmnet')
        ts.client_request.set_url_port(80)
        ts.client_request.set_url_scheme('http')
        return TS_LUA_REMAP_DID_REMAP
    end
end

Negative response caching

By default ATS caches negative responses such as 404, 503 and others only if the response defines a lifetime via the Cache-Control max-age directive. This behavior can be changed with the configuration option proxy.config.http.negative_caching_enabled, which allows caching of negative responses that do NOT specify Cache-Control. If negative caching is enabled, the lifetime of negative responses without Cache-Control is defined by proxy.config.http.negative_caching_lifetime, in seconds, defaulting to 1800.
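
For example, a minimal sketch enabling negative caching with a 10 minute lifetime:

# /etc/trafficserver/records.config
CONFIG proxy.config.http.negative_caching_enabled INT 1
CONFIG proxy.config.http.negative_caching_lifetime INT 600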

One might however want to cache 404 responses that do not send Cache-Control, without caching any 503 responses. Given that proxy.config.http.negative_caching_enabled enables the behavior for a whole set of negative status codes, and that ATS versions before 8.0.0 did not allow specifying the list of negative response status codes to cache, the goal can be achieved by setting Cache-Control in Lua only for certain status codes:

function read_response()
    local status_code = ts.server_response.get_status()
    local cache_control = ts.server_response.header['Cache-Control']

    -- Cache 404 responses without CC for 10s
    if status_code == 404 and not(cache_control) then
        ts.server_response.header['Cache-Control'] = 'max-age=10'
    end
end

function do_remap()
    ts.hook(TS_LUA_HOOK_READ_RESPONSE_HDR, read_response)
    return 0
end

Starting with ATS 8.0.0, the configuration option proxy.config.http.negative_caching_list allows specifying the list of negative response status codes to cache.
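
As a sketch, the following setting restricts negative caching to 404 responses only (the configured list replaces the default one):

# /etc/trafficserver/records.config
CONFIG proxy.config.http.negative_caching_list STRING 404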

Setting X-Cache-Int

As another example, the following script takes care of setting the X-Cache-Int response header:

-- /var/tmp/ats-set-x-cache-int.lua
function cache_lookup()
     local cache_status = ts.http.get_cache_lookup_status()
     ts.ctx['cstatus'] = cache_status
end

function cache_status_to_string(status)
     if status == TS_LUA_CACHE_LOOKUP_MISS then
        return "miss"
     end

     if status == TS_LUA_CACHE_LOOKUP_HIT_FRESH then
        return "hit"
     end

     if status == TS_LUA_CACHE_LOOKUP_HIT_STALE then
        return "miss"
     end

     if status == TS_LUA_CACHE_LOOKUP_SKIPPED then
        return "pass"
     end

     return "bug"
end

function gen_x_cache_int()
     local hostname = "cp4242" -- from puppet
     local cache_status = cache_status_to_string(ts.ctx['cstatus'])

     local v = ts.client_response.header['X-Cache-Int']
     local mine = hostname .. " " .. cache_status

     if (v) then
        v = v .. ", " .. mine
     else
        v = mine
     end

     ts.client_response.header['X-Cache-Int'] = v
     ts.client_response.header['X-Cache-Status'] = cache_status
end

function do_remap()
     ts.hook(TS_LUA_HOOK_CACHE_LOOKUP_COMPLETE, cache_lookup)
     ts.hook(TS_LUA_HOOK_SEND_RESPONSE_HDR, gen_x_cache_int)
     return 0
end
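
The script can be loaded with a remap rule analogous to the one shown in the Lua scripting section, assuming the same tslua.so module path:

# /etc/trafficserver/remap.config
map http://127.0.0.1:3128/ http://$origin_server_ip/ @plugin=/usr/lib/trafficserver/modules/tslua.so @pparam=/var/tmp/ats-set-x-cache-int.lua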

Unit testing

The busted framework can be used to unit-test Lua scripts. It can be installed as follows:

apt install luarocks
luarocks install busted
luarocks install luacov

The following unit tests cover some of the functionality implemented by ats-set-x-cache-int.lua:

-- unit_test.lua
_G.ts = { client_response = {  header = {} }, ctx = {} }

describe("Busted unit testing framework", function()
  describe("script for ATS Lua Plugin", function()

    it("test - hook", function()
      stub(ts, "hook")

      require("ats-set-x-cache-int")
      local result = do_remap()
      assert.are.equals(0, result)
    end)

    it("test - gen_x_cache_hit", function()
      stub(ts, "hook")

      require("ats-set-x-cache-int")
      local result = gen_x_cache_int()

      assert.are.equals('miss', ts.client_response.header['X-Cache-Status'])
      assert.are.equals('cp4242 miss', ts.client_response.header['X-Cache-Int'])
    end)

  end)
end)

Run the tests and generate a coverage report with:

$ busted -c unit_test.lua 
●●
2 successes / 0 failures / 0 errors / 0 pending : 0.012771 seconds

$ luacov ; cat luacov.report.out
