Squid logging

From Wikitech
Jump to: navigation, search
Logging packets are sent from the squids to the logging server (locke)

See also: Logs, Fundraising Analytics/Impression Stats, Analytics/Kraken/Logging_Solutions_Overview#udp2log, packet-loss.

Squid side

We use our own UDP logging patch, described here. It has been integrated into Squid 2.7. Log lines are buffered and sent to the logging server in 1450-byte packets.

Aggregation

UDP packets are sent as unicast from all squids to a multicast relay, currently running on server oxygen. The following socat command converts the unicast UDP packets into multicast:

/usr/bin/socat UDP-RECV:8420,su=nobody UDP4-DATAGRAM:233.58.59.1:8420,ip-multicast-ttl=2

These packets are multicasted, and any host on the network joining the multicast group 233.58.59.1 will receive them on port 8420.

This setup is puppetised, and a relay can be installed by including the class misc::squid-logging::multicast-relay.

Locke/Emery

Note: New log lines should only go on emery. locke is at capacity

A program called udp2log runs there. udp2log logs to an arbitrary number of destination files or pipes, specified in /etc/udp2log. The configuration file format is line based, with each line containing:

pipe <factor> <command line>

or

file <factor> <filename>

Unlike the squid pipe log, these pipes are full shell commands and may contain spaces. There is no quoting, the command goes from after the factor to the end of the line.

<factor> is an integer sampling factor, one in every <factor> packets will be sent to the designated destination.

Comments are lines starting with "#".

Reconfiguration can be done by sending a HUP signal to udp2log. The sequence of events when HUP is received are as follows:

  1. HUP is received, a flag is set
  2. Wait for the next UDP packet
  3. Load the new configuration, open all files and pipes specified
  4. If all pipes and files were successfully opened, swap in the configuration and close all previously opened pipes and files. Otherwise, close the new pipes and files and retain the old configuration.
  5. Process the received packet using the new configuration
  6. Resume the recv() loop

This means that pipe scripts should not obtain any lock on a shared resource until after the first line is received on stdin.

udp2log is designed to run as a daemon for long periods of time with only HUP reloads, not restarts. Restarts should be avoided because they cause packet loss. I haven't quite gotten around to writing the daemonize code yet, so for now it runs in a root screen. It runs as the user "logger".

udp2log is available in svn (http://svn.wikimedia.org/svnroot/mediawiki/trunk/udplog), and in the APT repository in the udplog package.

We have seen cases where udp2log is oversubscribed with filter requests and starts to drop packets. As a result a packless analyser was written by Tim and is documented here.

Inflated Stats

It is very important that any filter that is receiving traffic filters out SpecialPage requests. Our current scheme is to make urls conform to the following format

w/index.php?title=Special:Foo

Otherwise they will be counted as regular page requests and will artificially inflate our page view stats.

Log files

Log files are stored in /a/squid.

sampled-1000.log 
A 1/1000 sampled log of all requests
support-requests.log 
Full log click through stats for fund raising stats. Needs to be retained.
bannerImpressions log 
1/1 recording of all banner impressions
landingpages log 
1/1 recording for a click throughs on banners

hume doesn't have enough space to record the full length of fundraising logs. we've decided to archive our data on storage3 in /archive/udplogs

This section should be updated with details of currently collected log files and aggregation data.

Active filters

The list of currently used filters and who uses them

Locke

  • file 1000 /a/squid/sampled-1000.log
    • General Engineering, Erik Z
  • pipe 1 /a/webstats/bin/filter | log2udp -h 127.0.0.1 -p 3815
    • Domas, ErikZ
  • pipe 1 /a/squid/book-filter >> /a/squid/book-log.log
    • tfinc, pediapress.com
    • disabled to reduce load 2011-10-25
  • pipe 100 /a/squid/m-filter >> /a/squid/mobile.log
    • tfinc & the mobile team
  • pipe 10 /a/squid/india-filter >> /a/squid/india.log
    • nimish, mani
  • pipe 1 /a/squid/fundraising/lp-filter >> /a/squid/fundraising/logs/landingpages.log
    • Arthur and the fundraising team
  • pipe 1 /a/squid/fundraising/bi-filter >> /a/squid/fundraising/logs/bannerImpressions.log
    • Arthur and the fundraising team
  • pipe 10 awk -f /a/squid/vu.awk | log2udp -h 123.45.67.89 -p 9999
    • Guillaume Pierre, Vrije Universiteit
  • pipe 100 awk -f /a/squid/urjc.awk | log2udp -h 123.45.67.89 -p 10514
    • Antonio José Reinoso Peinado, Universidad Rey Juan Carlos
  • pipe 10 awk -f /a/squid/minnesota.awk | log2udp -h 123.45.67.89 -p 9999
  • pipe 1 /a/squid/edits-filter >> /a/squid/edits.log
    • ErikZ
  • pipe 1 /a/squid/5xx-filter | awk -W interactive '$9 !~ "upload.wikimedia.org|query.php"' >> /a/squid/5xx.log
  • pipe 10 /usr/local/bin/packet-loss 10 >> /a/squid/packet-loss.log
    • Ben and the Ops team

Emery

main udp2log process

  • pipe 1000 /var/log/squid/filters/latlong-writer >> /var/log/squid/sampled-1000.log
    • nimish (geolocation on IP addresses)
  • pipe 2 /var/log/squid/filters/sqstat3 2
    • ben and the ops team (ganglia)
  • pipe 10 /var/log/squid/filters/india-filter >> /var/log/squid/india.log
    • nimish, mani
  • pipe 10 /usr/local/bin/packet-loss 10 >> /var/log/squid/packet-loss.log
    • ben and the ops team
  • pipe 100 /var/log/squid/filters/api-filter >> /var/log/squid/api-usage.log
  • pipe 1 /var/log/squid/filters/mobile-offline-meta >> /var/log/squid/mobile-offline-meta.log
    • tfinc and mobile
  • pipe 1 /var/log/squid/filters/editSurveyBanners >> /var/log/squid/editSurveyBanners.log
    • mani
    • disabled to reduce load 2011-10-25

AFT udp2log process (this is mediawiki logs, not squid logs):

  • file 1 /var/log/aft/clicktracking.log
    • Aolita, Roan, & the features team