Talk:Analytics/AQS/Wikistats 2/DataQuality/Vetting of mediarequest metrics

From Wikitech

Thoughts as I read through this:

  • to run queries on http_status you have to compare against strings, e.g. http_status = '200', since the field is a string
  • I couldn't see any reason why the mediarequests and webrequest counts would differ. The query that defines mediarequests is the same one used to get the webrequest counts in the first vetting, so they should match exactly. The difference is a very small percentage, so it shouldn't matter, but it's just weird. Maybe we could find out why and note the reason next to the query defining mediarequests
  • The fact that all the individual article counts match the ones from mediacounts exactly seems to hint that the even smaller difference in the totals there is because some articles are included in mediacounts but not in mediarequests. I also couldn't see a reason for this in your queries, so it would be interesting to find an example. Again, the numbers are very tiny, but maybe it's something you'll want to debug later
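A minimal sketch of the string comparison mentioned in the first point. The table and partition names here (wmf.webrequest, year/month/day) are assumptions for illustration, not taken from the original vetting queries:

```sql
-- Sketch only: table and partition names are assumptions.
SELECT COUNT(*) AS ok_requests
FROM wmf.webrequest
WHERE http_status = '200'  -- compare to the quoted string, per the note above
  AND year = 2019 AND month = 4 AND day = 1;
```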
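To find an example of a file counted in mediacounts but missing from mediarequests, an anti-join between the two datasets could surface candidates. This is only a sketch: the table names, the base_name join key, and the partition values are all assumptions, not from the original queries:

```sql
-- Hypothetical sketch: table names, join key, and dates are assumptions.
-- Files present in mediacounts but absent from mediarequests on one day.
SELECT mc.base_name
FROM wmf.mediacounts mc
LEFT JOIN wmf.mediarequest mr
  ON  mc.base_name = mr.base_name
  AND mr.year = 2019 AND mr.month = 4 AND mr.day = 1
WHERE mc.year = 2019 AND mc.month = 4 AND mc.day = 1
  AND mr.base_name IS NULL
LIMIT 10;
```

Even a handful of rows from a query like this would show whether the totals gap comes from whole files being dropped rather than undercounting.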

I think besides these two small curiosities, the numbers line up as they should. The endpoints all look great; I like that you used them to vet, getting integration testing done in the process. I see you're planning to vet top files, nice. I think the one other thing to vet would be the new dimensions: since there's no old data to compare against, it would be easier to miss a problem there, for example in the referer or agent splits.