Fundraising/Data and flow/Stats pipeline


Data about banner displays and interactions comes mostly from two background requests that users' browsers make when visiting one of the production wikis where banners may be shown. The requests are: beacon/impression and banner history. Data from these requests passes through multiple systems and ends up in multiple data stores, in multiple formats.


The beacon/impression request is the main data point for CentralNotice activity on each pageview. It is sent on 100% of pageviews for Fundraising campaigns, and is usually sent on only 1% of pageviews for non-Fundraising campaigns.

Full data can be extracted from the Hive webrequest table. Aggregated data is stored in the Druid banner_activity_minutely datastore. On the Fundraising cluster, the pgehres bannerimpressions table also stores aggregated data, generated from a 10% sample of the requests received from browsers.
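Because of these sampling rates, counts read from a sampled store have to be scaled up before they can be compared with full data. The following is a minimal sketch of that arithmetic only. The function name and the example figures are hypothetical, and it assumes the stored counts are raw sample counts that have not already been scaled up.

```python
def scale_up(count: float, percent: float) -> float:
    """Scale a count observed in a `percent`% sample to an estimated total."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    return count * 100 / percent


# Sampling rates taken from the text above.
PGEHRES_SAMPLE_PERCENT = 10          # pgehres aggregates a 10% sample of requests
NON_FUNDRAISING_BEACON_PERCENT = 1   # usual beacon rate for non-Fundraising campaigns
FUNDRAISING_BEACON_PERCENT = 100     # beacon rate for Fundraising campaigns

# Hypothetical stored count, for illustration only.
stored = 50

# Step 1: undo the 10% storage sample to estimate requests received.
requests_received = scale_up(stored, PGEHRES_SAMPLE_PERCENT)

# Step 2: undo the beacon sampling to estimate pageviews with CentralNotice activity.
fundraising_pageviews = scale_up(requests_received, FUNDRAISING_BEACON_PERCENT)
other_pageviews = scale_up(requests_received, NON_FUNDRAISING_BEACON_PERCENT)

print(requests_received, fundraising_pageviews, other_pageviews)
```

Percentages are kept as whole numbers rather than fractions such as 0.1 so that the arithmetic stays exact for typical rates.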

CentralNotice's banner history feature is typically enabled for all Fundraising campaigns. When a user is targeted by a campaign with this feature enabled, and the user does not have the Do not track preference set in their browser, CentralNotice stores a log of CentralNotice activity in the browser's LocalStorage, with an entry for each of the user's pageviews over the course of the campaign.

For Fundraising campaigns, CentralNotice is normally configured to send summary versions of these logs back to WMF servers on 1% of pageviews targeted by the campaign. In addition, if a user clicks a Fundraising banner to donate (that is, if they click the button that will take them to the form where they enter payment details), the log is sent 100% of the time.
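The sending rule described above can be sketched as a small decision function. This is an illustration of the logic only, not CentralNotice's actual code (which runs in the browser). The function and parameter names are hypothetical, and the random source is injectable so the behaviour can be checked deterministically.

```python
import random
from typing import Callable


def should_keep_history(do_not_track: bool) -> bool:
    """No log is stored when the browser's Do not track preference is set."""
    return not do_not_track


def should_send_history(
    is_donate_click: bool,
    sample_rate: float = 0.01,  # from the text: 1% of targeted pageviews
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether the summarised log is sent on this pageview."""
    if is_donate_click:
        # A click through to the donation form always sends the log.
        return True
    # Otherwise send on a random sample of pageviews.
    return rng() < sample_rate


# Deterministic checks with a fixed "random" value.
print(should_send_history(True))                        # donate click
print(should_send_history(False, rng=lambda: 0.5))      # outside the 1% sample
print(should_send_history(False, rng=lambda: 0.005))    # inside the 1% sample
```

One consequence of this rule is that logs from donating users are over-represented relative to the 1% sample, which matters when the two groups are compared.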

Banner history logs are sent via the EventLogging system. Here is the current schema.

Impression rates

See centralnotice_analytics.

Landing pages