The packet-loss program sends aggregated data to the log at a fixed interval, currently every 10 minutes. First it shows totals: the number of "invalid" packets (those without a valid sequence number), the number of out-of-order packets, and the number of lost packets. These totals are averaged over the 10-minute period. Then it shows a breakdown of lost packets by Squid proxy server.
Counting is done by sampling the log stream, currently taking 1 in every 10 packets. All the figures are shown as percentages, with standard errors attached. These errors arise from the sampling, and are calculated as a Poisson standard deviation: sqrt(N), suitably scaled. The relative error is larger when there are very few packets. So a line like this:
[2010-09-14T02:00:47] knsq1.knams.wikimedia.org lost: (-25.00000 ± 31.25000)%
is normal, and indicates that of the very few packets knsq1 sent in this time period, more were received than the gap between sequence numbers predicted.
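The calculation above can be sketched in a few lines. This is a simplified illustration, not the real analyser's code; the function and parameter names are hypothetical, and the exact scaling the tool uses for the error may differ:

```python
import math

def loss_estimate(seq_gap, sampled_count, sampling_factor=10):
    """Estimate packet loss from a sampled stream.

    seq_gap: difference between last and first sequence numbers
             seen in the interval (the expected packet count).
    sampled_count: packets actually seen in the sample.
    sampling_factor: 1 in N packets are sampled.
    """
    received = sampled_count * sampling_factor    # scale sample up
    lost = seq_gap - received                     # may be negative
    loss_pct = 100.0 * lost / seq_gap
    # Poisson standard error on the sampled count is sqrt(N);
    # scale it back up to the full stream and to a percentage.
    err_pct = 100.0 * sampling_factor * math.sqrt(sampled_count) / seq_gap
    return loss_pct, err_pct
```

With few sampled packets, random fluctuation in the sample can make `received` exceed `seq_gap`, which is how the negative loss figure in the example log line arises.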
The log is configured in the udp2log config and currently points to
Link to UDP2LOG
Udp2log is single-threaded; we're still waiting for someone to rewrite it to use multiple threads. Every output target (pipe processor, file, etc.) receives the same log stream. If anything is too slow, packets are dropped on the input side, before they reach any output. So measuring packet loss at one output is sufficient to determine the packet loss at all the others.
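The fan-out described above can be sketched as follows. This is a minimal illustration (the real tool is not written like this): one loop hands each packet to every output, so all outputs see the identical stream, and loss measured at one applies to all:

```python
def fan_out(packets, outputs):
    """Hand every packet to every output in turn (single-threaded)."""
    for pkt in packets:
        for write in outputs:
            write(pkt)          # same bytes to file, pipe, etc.

# Two hypothetical outputs collecting the stream:
file_out, pipe_out = [], []
fan_out([b"seq1", b"seq2", b"seq3"], [file_out.append, pipe_out.append])
print(file_out == pipe_out)     # True: identical streams
```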
To minimise packet loss, it's important that the udp2log thread is as fast as possible and does as little processing as possible. The pipe processors can afford to be slower: thanks to the 64 KB pipe buffer, they can run in parallel on separate cores. That's why the packet-loss analyser had to run as a separate process rather than as an integrated part of udp2log.
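The pipe buffer's role can be demonstrated directly. The sketch below (an illustration only, unrelated to udp2log's actual code) shows how far a non-blocking writer can get ahead of a reader that hasn't consumed anything yet; on Linux the default pipe capacity is typically 64 KiB:

```python
import os

# Create a pipe and make the write end non-blocking, so the writer
# (standing in for the udp2log thread) never stalls on a slow reader.
r, w = os.pipe()
os.set_blocking(w, False)

written = 0
try:
    while True:
        written += os.write(w, b"x" * 4096)
except BlockingIOError:
    pass    # buffer full: at this point a real udp2log would be dropping

print(written)    # typically 65536 on Linux (the default pipe capacity)
os.close(r)
os.close(w)
```

This buffered slack is what lets each pipe processor catch up on its own core while the udp2log thread keeps reading from the socket.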