Fundraising/Data and flow/Stats pipeline

From Wikitech

Banners

Data about displays and interactions with banners mostly comes via two background requests made by users' browsers when users visit one of the production wikis where banners may be shown. The requests are: beacon/impression and banner history. Data from these requests passes through multiple systems and ends up in multiple data stores, in multiple formats.

beacon/impression

This is the main data point for CentralNotice activity on each pageview. This request is sent on 100% of pageviews for Fundraising campaigns, and is usually sent for only 1% of pageviews on non-Fundraising campaigns.

Full data can be extracted from the Hive webrequest table. Aggregated data is stored in the Druid banner_activity_minutely datastore. On the Fundraising cluster, the pgehres bannerimpressions table also stores aggregated data, generated from a 10% sample of the requests received from browsers.
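
Because the aggregated stores are built from sampled requests, raw counts need to be scaled before they can be compared. The following is a minimal sketch of that arithmetic, not the pipeline's actual code: the function name `estimate_total` and the example counts are hypothetical, and the rates are simply the ones quoted above.

```python
def estimate_total(sampled_count: int, sample_rate: float) -> int:
    """Scale a count observed in a sample up to an estimate of the full total."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in the range (0, 1]")
    return round(sampled_count / sample_rate)


# Rates taken from the text above; the counts are made up for illustration.
FUNDRAISING_BEACON_RATE = 1.0       # beacon sent on 100% of pageviews
NON_FUNDRAISING_BEACON_RATE = 0.01  # beacon usually sent on 1% of pageviews
PGEHRES_SAMPLE_RATE = 0.10          # bannerimpressions built from a 10% sample

# 50 rows counted in a 10% sample suggest roughly 500 requests overall.
estimated_requests = estimate_total(50, PGEHRES_SAMPLE_RATE)
print(estimated_requests)
```

The result is only an estimate; small sampled counts carry a large relative error.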

banner history

CentralNotice's banner history feature is typically enabled for all Fundraising campaigns. When a user is targeted by a campaign with this feature enabled, and the user does not have the Do Not Track preference set in their browser, CentralNotice stores a log of CentralNotice activity in the browser's LocalStorage, with an entry for each of the user's pageviews over the course of the campaign.

For Fundraising campaigns, CentralNotice is normally configured to send summary versions of these logs back to WMF servers on 1% of pageviews targeted by the campaign. In addition, if a user clicks a Fundraising banner to donate (that is, if they click the button that will take them to the form where they enter payment details) the log is sent 100% of the time.
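
The sending rule described above can be sketched as follows. This is an illustration of the logic only, not CentralNotice's client-side code: the function `should_send_history`, its parameters, and the injectable random source are all hypothetical names.

```python
import random
from typing import Callable


def should_send_history(
    clicked_donate: bool,
    sample_rate: float = 0.01,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether the summarized banner history log is sent on this pageview."""
    # A click through to the donation form always sends the log.
    if clicked_donate:
        return True
    # Otherwise the log is sent on a random sample of pageviews (1% by default).
    return rng() < sample_rate


# Usage: a donation click always sends; an ordinary pageview usually does not.
print(should_send_history(clicked_donate=True))
print(should_send_history(clicked_donate=False))
```

Passing the random source as a parameter is only a convenience that makes the sampling branch easy to test deterministically.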

Banner history logs are sent via the EventLogging system. Here is the current schema.

Impression rates

See centralnotice_analytics.

Landing pages

Older content moved from main page

Banner impressions and landing page stats are collected from the production proxies; see Fundraising_Analytics/Impression_Stats. The wmf:Thank_you page includes wmf:Template:Hide_banners, which loads Special:HideBanners from multiple domains via image src. HideBanners sets cookies for donors that tell CentralNotice's bannerController.js not to show them banners for a year or so.

utm_source

This is a tracking variable that is supposed to collect information about the transaction. Currently, it is a period-separated concatenation of three components. One interpretation of the components is: 1) banner name, 2) landing page name, and 3) payment method. We are currently in the process of standardizing this (see FR #965 and FR #673).

In theory, each component may itself be a tilde-separated sequence of values, for example a sequence of landing pages. The code that handles this is badly dysfunctional.
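
To make the format concrete, here is a minimal parsing sketch under the interpretation given above (banner, landing page, payment method). It is not the production parser: the names `UtmSource` and `parse_utm_source` and the sample value are hypothetical, and treating missing trailing components as empty is an assumption.

```python
from typing import List, NamedTuple


class UtmSource(NamedTuple):
    banner: List[str]
    landing_page: List[str]
    payment_method: List[str]


def parse_utm_source(value: str) -> UtmSource:
    """Split a utm_source value on periods, then split each component on tildes."""
    components = value.split(".")
    if len(components) > 3:
        raise ValueError(f"expected at most 3 components, got {len(components)}")
    # Assumption: missing trailing components are treated as empty.
    components += [""] * (3 - len(components))
    banner, landing_page, payment_method = (
        [item for item in component.split("~") if item]
        for component in components
    )
    return UtmSource(banner, landing_page, payment_method)


# Made-up value: one banner, two landing pages in sequence, one payment method.
parsed = parse_utm_source("B14_banner.lp_one~lp_two.cc")
print(parsed.landing_page)
```

Rejecting values with more than three components is a deliberate choice in this sketch; it surfaces malformed values instead of silently dropping data.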

utm_medium

The type of site that referred the donor: sitenotice, spontaneous, sidebar, or socialmedia.

This seems of little use at such coarse granularity.

utm_campaign

The parent campaign for the banner where this donation was initiated.

utm_key

TODO