Talk:Analytics/Data Lake/Traffic/Webrequest

From Wikitech

@Milimetric: For page names with non-Latin characters (like Arabic ones), it looks like the uri_path field can contain either Unicode values or percent-encoded values (depending, I assume, on the browser that generated the request). Is that correct? If so, it seems like that fact would be a helpful addition to the schema description :) —Neil P. Quinn-WMF (talk) 22:01, 26 January 2018 (UTC)

  • To clarify, this came from a query for requests for an Arabic page. There was a very small number of results when querying for the Unicode uri_path (perhaps five or ten in a month), but querying by page ID returned a much larger number, which closely matched the number of pageviews reported by the Pageviews Analysis tool.—Neil P. Quinn-WMF (talk) 22:07, 26 January 2018 (UTC)
  • @Milimetric: Have you had a chance to look at this? :)—Neil P. Quinn-WMF (talk) 10:57, 27 February 2018 (UTC)
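If uri_path really does mix both spellings, one way to match a page by path is to percent-decode every value before comparing. Below is a minimal Python sketch of that idea, assuming percent-encoded paths are UTF-8 and that article paths look like /wiki/<title>. The helper name normalize_uri_path and the example title are hypothetical and not part of any webrequest tooling; querying by page ID, as described above, avoids the issue entirely.

```python
from urllib.parse import quote, unquote


def normalize_uri_path(uri_path: str) -> str:
    """Percent-decode a uri_path so Unicode and encoded spellings compare equal.

    Assumption: encoded bytes are UTF-8. unquote's default errors="replace"
    substitutes U+FFFD for invalid byte sequences instead of raising, and a
    path with no %XX sequences is returned unchanged.
    """
    return unquote(uri_path, encoding="utf-8")


# Hypothetical example: the Arabic title for "Cairo", spelled both ways.
title = "القاهرة"
unicode_path = "/wiki/" + title
encoded_path = "/wiki/" + quote(title)  # "/" stays literal; Arabic becomes %XX

# The raw strings differ, so an exact match on one spelling misses the other.
print(encoded_path == unicode_path)
# After normalization, both spellings refer to the same path.
print(normalize_uri_path(encoded_path) == normalize_uri_path(unicode_path))
```

The same normalization would have to be applied to whatever path you filter on, so that a Unicode search term and an encoded log value meet in the middle.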