Analytics/Archive/Data/Zero webrequests

From Wikitech

The Zero requests stream holds request logs from the mobile caches that were tagged as "zero".

This page contains historical information. It may be outdated or unreliable.

This stream is owned by the Analytics Team.

Contained data

The data covers only requests served by the mobile caches. It is therefore only a part of all mobile traffic, and it excludes traffic to, for example, the api, bits, and upload subdomains. Of those requests, it includes only the ones that come from an IP associated[1] with a zero carrier and that have the zero= marker set.

Availability

stat1002.eqiad.wmnet /a/squid/archive/zero

The stream is available unsampled in Cache log format as gzipped files at /a/squid/archive/zero/zero.tsv.log-*.gz on stat1002.eqiad.wmnet.
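As a minimal sketch of working with these files, the following Python counts the lines of each gzipped file by streaming it, without decompressing to disk. The helper names `count_lines` and `count_archive_lines` are hypothetical; only the path pattern comes from this page, and nothing is assumed about the columns of the Cache log format.

```python
import glob
import gzip

# Path pattern taken from this page; adjust it when working on a copy elsewhere.
ZERO_GLOB = "/a/squid/archive/zero/zero.tsv.log-*.gz"


def count_lines(path):
    """Count the lines of one gzipped log file, streaming it in binary mode."""
    with gzip.open(path, "rb") as handle:
        return sum(1 for _ in handle)


def count_archive_lines(pattern=ZERO_GLOB):
    """Return {file path: line count} for every file matching the pattern."""
    return {path: count_lines(path) for path in sorted(glob.glob(pattern))}
```

Reading in binary mode avoids decoding errors on lines that were truncated or garbled upstream.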

The date in the file name does not mean that all logs of that day are in that file. Instead, each file contains logs from ~06:30 of the previous day until ~06:30 of the day in the file name. For example, /a/squid/archive/zero/zero.tsv.log-20130930.gz contains data from ~2013-09-29T06:30:00 until ~2013-09-30T06:30:00.
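The mapping from file name to covered time span can be sketched as follows. The function name `approximate_window` is hypothetical, and the 06:30 offset is only the approximate rotation time given above, so the returned boundaries are estimates.

```python
import re
from datetime import datetime, timedelta

# Rotation happens at roughly 06:30 according to this page; the exact time varies.
ROTATION_OFFSET = timedelta(hours=6, minutes=30)


def approximate_window(file_name):
    """Return the approximate (start, end) datetimes covered by one file."""
    match = re.search(r"zero\.tsv\.log-(\d{8})\.gz$", file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    # The window ends at ~06:30 on the date in the file name ...
    end = datetime.strptime(match.group(1), "%Y%m%d") + ROTATION_OFFSET
    # ... and starts at ~06:30 on the previous day.
    return end - timedelta(days=1), end


start, end = approximate_window("/a/squid/archive/zero/zero.tsv.log-20130930.gz")
print(start.isoformat(), end.isoformat())
# prints: 2013-09-29T06:30:00 2013-09-30T06:30:00
```

One consequence is that a full calendar day is spread over two files: the file named after that day and the file named after the following day.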

Statistics for 2013-12-01–2013-12-07
Avg. size / gzipped file 183 MiB
Avg. size / uncompressed file 543 MiB
Avg. lines / uncompressed file 2774 K
Avg. lines / second 32
Avg. requests / second 32
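
The per-second figure follows from the per-file average, because each file spans roughly one day. A quick check, reading "K" as 1000 lines (an assumption; the page does not define it). The compression ratio and bytes per line are derived here for illustration and are not stated on this page.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # each file spans roughly one day

avg_lines_per_file = 2774 * 1000  # "2774 K", reading K as 1000 (assumption)
avg_gzipped_mib = 183
avg_uncompressed_mib = 543

lines_per_second = avg_lines_per_file / SECONDS_PER_DAY
compression_ratio = avg_uncompressed_mib / avg_gzipped_mib
bytes_per_line = avg_uncompressed_mib * 1024 * 1024 / avg_lines_per_file

print(round(lines_per_second, 1))   # 32.1
print(round(compression_ratio, 2))  # 2.97
print(round(bytes_per_line))        # 205
```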

This stream gets used for:

  • ad hoc research

Events and known problems since 2013-09-01

Date from Date until Bug Details
Inherent Only covers data from mobile caches, not all mobile traffic.
Inherent The stream may suffer from packet drop on udp2log. This should be <5%.
* Lines that would be longer than ~8K are truncated at that length (no newline is added). This affects <1 line/day on average.
* “zero” markers were set not only for Wikipedia, but also for sister projects (Wiktionary, ...).
* 2013-09-26 bug 53806 Until around 2013-09-26 ~22:57, the client ip might have been garbled.
2013-09-26 2013-10-01 bug 54779 No “mf-m” markers in stream between 2013-09-26 ~22:56 and 2013-10-01 ~13:32.
2013-11-13 2013-12-16 bug 58764 <40 lines/day have been concatenated due to puppet runs unnecessarily restarting the udp2log filtering. First occurrence on 2013-11-13T17:29:22. Last occurrence on 2013-12-16T16:29:20.
2013-12-18 n/a bug 58889 Increase in zero=470-01 (Grameenphone Bangladesh) tagged traffic, due to advertising by the carrier.
2014-01-05 2014-01-06 bug 59722 Udp2log relay went down. There are no log lines between 2014-01-05T03:39:25 and 2014-01-06T17:45:10.
2014-03-21 2014-03-21 bug 62922 Sometimes zero tags are doubled like “zero=250-99;zero=250-99”. The first occurrence is on 2014-03-21T00:15:41. Last occurrence is on 2014-03-21T17:18:02.
2014-05-22 2014-06-24 bug 66833 Some zero tags did not have trailing characters stripped (like “zero=404-01b” instead of “zero=404-01”). Last occurrence is on 2014-06-24T15:29:30.
2014-07-25 ~14:00 2014-07-25 ~17:00 bug 69112 A big part of carrier 250-99 requests were not properly zero tagged, and hence are missing from this stream.
2014-07-08 19:00 2014-07-08 22:00 bug 67694 A 2014 FIFA World Cup (soccer) related traffic spike caused udp2log overload and led to up to ~10% packet loss during this period.
2014-07-09 09:00 2014-07-10 09:00 bug 68199 Traffic was rerouted from ulsfo to eqiad for the ULSFO floor move. No data was lost, but the host column may show eqiad caches for traffic that would be expected to go to ulsfo.
2014-07-13 19:00 2014-07-13 23:00 bug 67694 A 2014 FIFA World Cup (soccer) related traffic spike caused udp2log overload and led to up to ~25% packet loss during this period.
2014-07-29 01:35 2014-07-29 01:42 bug 68796 cp3013, cp3014 (half of esams) missing between 2014-07-29T01:35:45 and 2014-07-29T01:42:00 due to flapping network link (~15% of total mobile traffic around that time)
2014-07-30 ~00:54 2014-08-04 ~21:00 bug 69112 A big part of carrier 250-99 requests were not properly zero tagged, and hence are missing from this stream.
2014-08-16 ~22:43 2014-08-16 ~22:49 bug 69663 Root mount on oxygen filled up, which caused services to panic, and udp2log dropped requests during that time.
2014-08-17 ~06:26 2014-08-17 ~06:30 bug 69663 Root mount on oxygen filled up again, which caused services to panic, and udp2log dropped requests during that time.
2014-10-08 ~22:00 2014-10-08 ~24:00 bug 71879 ULSFO having connectivity issues leading to partial message loss
2014-10-20 13:06 2014-10-20 13:27 bug 72306 ULSFO connectivity issues causing packet loss between 6% and 47% for ulsfo caches.
2014-10-21 ~10:30 2014-10-21 ~11:43 bug 72355 ULSFO connectivity issues causing packet loss for ulsfo caches.
2014-11-30 ~03:50 2014-11-30 ~10:13 task T76334 No data while analytics infrastructure suffered eqiad network issues.
2015-01-13 ~22:20 2015-01-13 ~23:18 task T86973 No data due to firewall problems
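
The tag anomalies listed above (doubled tags and unstripped trailing characters) can be worked around when parsing. A minimal sketch follows, assuming carrier IDs have the shape of the examples on this page (three digits, a hyphen, two digits); IDs of any other shape would need a different pattern. The function name `zero_carriers` is hypothetical, and where the tag sits within a log line is not specified here, so the function takes the tag text directly.

```python
import re

# Assumption: carrier IDs look like "<3 digits>-<2 digits>", matching the
# examples on this page (250-99, 404-01, 470-01). Other shapes are not handled.
ZERO_TAG = re.compile(r"zero=(\d{3}-\d{2})")


def zero_carriers(field):
    """Return the distinct carrier IDs found in a tag field, in first-seen order."""
    seen = []
    for carrier in ZERO_TAG.findall(field):
        if carrier not in seen:
            seen.append(carrier)
    return seen


print(zero_carriers("zero=250-99;zero=250-99"))  # ['250-99']
print(zero_carriers("zero=404-01b"))             # ['404-01']
print(zero_carriers(""))                         # []
```

Because the pattern stops after two digits, a trailing character such as the "b" in “zero=404-01b” is dropped, and doubled tags collapse to a single entry.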

stat1002.eqiad.wmnet /a/log/webrequest/archive/zero

The stream is available unsampled in Cache log format as gzipped files at /a/log/webrequest/archive/zero/zero.tsv.log-*.gz on stat1002.eqiad.wmnet (using Kafka as the backend).

Each file covers the full day of the date in the file name.

Events and known problems since 2015-01-01

Date from Date until Bug Details

Note

  1. See the Zero namespace of the zero wiki, for example Zero:404-01 for the carrier 404-01.