Fundraising/Data and flow/Database schemas
This is a description of WMF Fundraising database schema and fields.
The databases are only available in the private fundraising cluster, please request access through Phabricator.
See also Fun SQL Queries.
civicrm
There are civicrm core tables, civicrm custom tables, and then our custom tables in this database
Core
Contributions civicrm_contributions
Recurring Contribution civicrm_contribution_recur
Custom core
Our custom tables
civicrm_contribution_tracking
For every person that lands on a payments.wiki page, a row is created in the contribution_tracking table. This is what we have historically always used to track landing page impressions. The record is updated with a contribution ID if it results in a successful donation.
This is the replacement for drupal.contribution_tracking. Notable changes:
- ts (timestamp) replaced with tracking_date (datetime)
- banner and landing_page are available as separate fields outside of utm_source
- Introduction of payment_method_id and payment_submethod_id - see https://phabricator.wikimedia.org/T354708#9489865 for how to join
- Addition of is_recurring field
- form_amount replaced with currency and amount
- Addition of user agent info: os, os_version, browser, browser_version
+-----------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | |
+-----------------------+------------------+------+-----+---------+-------+
| id | int(10) unsigned | NO | PRI | NULL | |
| contribution_id | int(10) unsigned | YES | MUL | NULL | |
| amount | decimal(20,3) | YES | | NULL | |
| currency | varchar(3) | YES | MUL | NULL | |
| usd_amount | decimal(20,2) | YES | | NULL | |
| is_recurring | tinyint(4) | YES | | NULL | |
| referrer | varchar(4096) | YES | | NULL | |
| utm_medium | varchar(128) | YES | MUL | NULL | |
| utm_campaign | varchar(128) | YES | MUL | NULL | |
| utm_key | varchar(128) | YES | | NULL | |
| gateway | varchar(32) | YES | | NULL | |
| appeal | varchar(64) | YES | | NULL | |
| payments_form_variant | varchar(64) | YES | | NULL | |
| banner | varchar(128) | YES | MUL | NULL | |
| landing_page | varchar(128) | YES | MUL | NULL | |
| payment_method_id | int(11) | YES | MUL | NULL | |
| payment_submethod_id | int(11) | YES | | NULL | |
| language | varchar(8) | YES | MUL | NULL | |
| country | varchar(2) | YES | MUL | NULL | |
| tracking_date | datetime | YES | MUL | NULL | |
| os | varchar(255) | YES | | NULL | |
| os_version | varchar(255) | YES | | NULL | |
| browser | varchar(255) | YES | | NULL | |
| browser_version | varchar(255) | YES | | NULL | |
| recurring_choice_id | int(11) | YES | | NULL | |
| device_type_id | int(11) | YES | | NULL | |
| banner_size_id | int(11) | YES | | NULL | |
| is_test_variant | tinyint(4) | YES | | NULL | |
| banner_variant | varchar(128) | YES | | NULL | |
| is_pay_fee | tinyint(4) | YES | | NULL | |
| mailing_identifier | varchar(32) | YES | MUL | NULL | |
| utm_source | varchar(128) | YES | | NULL | |
| banner_history_log_id | varchar(255) | YES | | NULL | |
+-----------------------+------------------+------+-----+---------+-------+
- id: Unique Contribution Tracking ID
- contribution_id: Joins to the id column in civicrm_contribution. Contributions that were not actually completed, will be NULL.
- amount':
- currency: The currency code and amount that the user had initially selected.
- usd_amount: USD Amount
- is_recurring:
- referrer: The page that got the user to our pipeline. Usually a wiki project page. Sometimes something totally different.
- utm_medium: A general indication of the group of places that this user came from (common ones are 'sitenotice', 'sidebar' or 'email')
- utm_campaign: The specific campaign that this person came from
- utm_key:
for recent-ish banners: how many times the person saw a banner (cookieCount) before they started this contributionNo longer in useA variety of "extra data" from banners and donatewiki such as screen size, whether user selected pay the fee, etc. Entries separated by ~ - gateway: Gateway, e.g paypal_ec,adyen
- appeal: e.g JimmyQuote - the appeal is the text to the left of the input boxes (on desktop-size screens)
- payments_form_variant: a 'variant' generally changes something about the input boxes or their labels
- banner: Banner
- landing_page: Landing page
- payment_method_id:
- payment_submethod_id:
- language: The user's language preferences
- country: The user's country of web origin.
- tracking_date: Date
- os: Operating System
- os_version: Operating System - Major Version
- browser: Browser
- browser_version: Browser Version
- recurring_choice_id: Denotes whether a recurring donation was the result of upsell or an organic recurring transaction
- device_type_id: The device the banner was served to (e.g Desktop or Mobile)
- banner_size_id: Large or small banner
- is_test_variant: Test, rather than a control group
- banner_variant: The name of the tested variant (if not control)
- is_pay_fee: Did the user select to pay the processing fee
- mailing_identifier: External mailing identifier
- utm_source: A string that builds when a user moves through our donation pipeline. Typically includes a banner name/email code, any landing page info, and a payment method. This is the original text but is separately broken out into banner etc. We aspire to drop this field if we can be sure the data in it is clean enough to always capture in sub-fields
- banner_history_log_id: Temporary banner history log ID to associate banner history EventLogging events.
Joining civicrm_contribution and contribution_tracking
select *
from civicrm.civicrm_contribution_tracking t
left join civicrm.civicrm_contribution c on c.id = t.contribution_id;
select *
from civicrm.civicrm_contribution c
left join civicrm.civicrm_contribution_tracking t on c.id = t.contribution_id;
wmf_contribution_extra
MariaDB [civicrm]> describe wmf_contribution_extra;
+----------------------------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| entity_id | int(10) unsigned | NO | UNI | NULL | |
| settlement_date | datetime | YES | MUL | NULL | |
| total_usd | decimal(20,2) | YES | MUL | NULL | |
| gateway_account | varchar(255) | YES | MUL | NULL | |
| gateway | varchar(255) | YES | MUL | NULL | |
| gateway_txn_id | varchar(255) | YES | MUL | NULL | |
| gateway_status_raw | varchar(255) | YES | MUL | NULL | |
| gateway_date | datetime | YES | MUL | NULL | |
| original_amount | decimal(20,2) | YES | MUL | NULL | |
| original_currency | varchar(255) | YES | MUL | NULL | |
| settlement_usd | decimal(20,2) | YES | MUL | NULL | |
| settlement_currency | varchar(255) | YES | MUL | NULL | |
| deposit_date | datetime | YES | MUL | NULL | |
| deposit_usd | decimal(20,2) | YES | MUL | NULL | |
| deposit_currency | varchar(255) | YES | MUL | NULL | |
| parent_contribution_id | int(11) | YES | MUL | NULL | |
| finance_only | tinyint(4) | YES | MUL | NULL | |
| postmark_date | datetime | YES | MUL | NULL | |
| source_name | varchar(255) | YES | MUL | NULL | |
| source_type | varchar(255) | YES | MUL | NULL | |
| source_host | varchar(255) | YES | MUL | NULL | |
| source_run_id | varchar(255) | YES | MUL | NULL | |
| source_version | varchar(255) | YES | MUL | NULL | |
| source_enqueued_time | datetime | YES | MUL | NULL | |
| no_thank_you | varchar(255) | YES | MUL | NULL | |
| backend_processor | varchar(255) | YES | MUL | NULL | |
| backend_processor_txn_id | varchar(255) | YES | MUL | NULL | |
| settlement_batch_number | varchar(255) | YES | MUL | NULL | |
| markup_fee | decimal(20,2) | YES | | NULL | |
| interchange_fee | decimal(20,2) | YES | | NULL | |
| scheme_fee | decimal(20,2) | YES | | NULL | |
| payment_orchestrator_reconciliation_id | varchar(255) | YES | MUL | NULL | |
+----------------------------------------+------------------+------+-----+---------+----------------+
drupal
fredge
pgehres
This database is named in honor of our dear friend Peter Gehres, and holds a slightly aggregated cache of banner impression statistics. It is written to by a command-line Django script called DjangoBannerStats (git repo). This cron job processes the series of logged web requests to `beacon/impression`.
One day, we hope to replace this database.
pgehres.bannerimpressions
Banner request counts, aggregated from the weblogs.
Banner impressions are now available in the `bannerimpressions` table. They are aggregated in 5-minute chunks and grouped by banner, campaign, project, language and country. The timestamp reflects the "middle" of the period (e.g. 1:00:00 to 1:05:00 is recorded as 1:02:30). The count is found in the `count` column and is corrected for any sampling.
P.S. - Raw impressions can be found in `bannerimpressions_raw`, but I would strongly recommend against querying it due to the massive number of rows. Queries on the aggregate table will be orders of magnitude faster.
-- pgehres
MariaDB [(none)]> use pgehres; MariaDB [pgehres]> describe bannerimpressions; +-------------+----------------------+------+-----+-------------------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------------+----------------------+------+-----+-------------------+----------------+ | id | int(11) unsigned | NO | PRI | NULL | auto_increment | | timestamp | timestamp | NO | MUL | CURRENT_TIMESTAMP | | | banner | varchar(255) | NO | MUL | | | | campaign | varchar(255) | NO | MUL | | | | project_id | smallint(3) unsigned | YES | MUL | NULL | | | language_id | smallint(3) unsigned | YES | MUL | NULL | | | country_id | smallint(3) unsigned | YES | MUL | NULL | | | count | mediumint(11) | YES | | 0 | | +-------------+----------------------+------+-----+-------------------+----------------+
`project_id`, `language_id` and `country_id` map to `project`, `language` and `country` tables in the same database.
One gotcha is that for performance reasons only the top 20-ish languages get a real language_id. Everything else gets a generic one (see gerrit:119740 for how to add to that list. TODO Split all languages.)
pgehres.landingpageimpression_raw
Sorry, there is no built-in aggregation. This table logs the landing page and UTM data for various URLs. We're making this data available for future landing page A/B testing, but there are no active consumers.
smashpig
silverpop
<<<<<<<<everything from mediawiki >>>>>>>>>
drupal database
Our Drupal modules add a few tables of interest.
drupal.contribution_source
Unpacked normalization of `contribution_tracking.utm_source` into its three components: banner, landing page, and payment method.
Join against `contribution_tracking`,
select * from contribution_tracking t left join contribution_source s on s.contribution_tracking_id = t.id;
(TODO: understand or fix why some rows are missing: phab:T98643)
banner - Name of the banner
drupal.banner_history_contribution_associations
Links contribution_tracking id's with banner history log id's.
drupal.exchange_rates
Current and historical foreign exchange rates.
drupal.large_donation_notification
Donation amount thresholds that trigger an email, maintained by the `large_donation` module.
drupal.wmf_campaigns_campaign
Campaign names which will trigger an email upon matching donations. Maintained by the `wmf_campaigns` module.
civicrm database
This is the database that drives civi. As such, all information about completed transactions will be in there somewhere.
Note that we add many custom fields which are managed by CiviCRM and should not be queried directly due to dynamically generated table names, e.g. `civicrm_value_1_stock_information_10`.
civicrm.address
Billing address given by the donor. You must always restrict to `civicrm_address.is_primary = 1` when querying.
civicrm.civicrm_contact
Main record for a donor. We create a new contact for every donation, and deduping happens after the fact if ever.
civicrm.civicrm_contribution
This table contains all the financial information about every donation we have received.
describe civicrm_contribution; +--------------------------+------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------------------+------------------+------+-----+---------+----------------+ | id | int(10) unsigned | NO | PRI | NULL | auto_increment | | contact_id | int(10) unsigned | NO | MUL | NULL | | | financial_type_id | int(10) unsigned | YES | MUL | NULL | | | contribution_page_id | int(10) unsigned | YES | MUL | NULL | | | payment_instrument_id | int(10) unsigned | YES | MUL | NULL | | | receive_date | datetime | YES | MUL | NULL | | | non_deductible_amount | decimal(20,2) | YES | | 0.00 | | | total_amount | decimal(20,2) | NO | MUL | NULL | | | fee_amount | decimal(20,2) | YES | | NULL | | | net_amount | decimal(20,2) | YES | | NULL | | | trxn_id | varchar(255) | YES | UNI | NULL | | | invoice_id | varchar(255) | YES | UNI | NULL | | | invoice_number | varchar(255) | YES | | NULL | | | currency | varchar(3) | YES | | NULL | | | cancel_date | datetime | YES | | NULL | | | cancel_reason | text | YES | | NULL | | | receipt_date | datetime | YES | | NULL | | | thankyou_date | datetime | YES | | NULL | | | source | varchar(255) | YES | MUL | NULL | | | amount_level | text | YES | | NULL | | | contribution_recur_id | int(10) unsigned | YES | MUL | NULL | | | is_test | tinyint(4) | YES | | 0 | | | is_pay_later | tinyint(4) | YES | | 0 | | | contribution_status_id | int(10) unsigned | YES | MUL | 1 | | | address_id | int(10) unsigned | YES | MUL | NULL | | | check_number | varchar(255) | YES | MUL | NULL | | | campaign_id | int(10) unsigned | YES | MUL | NULL | | | creditnote_id | varchar(255) | YES | MUL | NULL | | | tax_amount | decimal(20,2) | YES | | NULL | | | revenue_recognition_date | datetime | YES | | NULL | | +--------------------------+------------------+------+-----+---------+----------------+
Field Descriptions
- id - Primary key. Joins to drupal.contribution_tracking.contribution_id.
- contact_id - Joins to civicrm_contact.id
- financial_type_id - Joins to civicrm_financial_type.id. We're inconsistent about how we assign financial types.
- payment_instrument_id - Joins to civicrm_option_value where option_group_id = 10 ("payment_instrument"). This encodes the full payment method.
- receive_date - The date that the transaction was initiated on the payments system.
- total_amount - The donation amount in USD.
- trxn_id - A unique transaction identifier, not necessarily the same as the gateway's transaction ID. Usually starts with the gateway in all-caps, followed by the gateway's transaction id for this donation.
- thankyou_date - the date we sent the Thank You letter to the donor.
- source - Original currency and gross.
- contribution_recur_id - If this is a recurring payment, this will join to civicrm_contribution_recur.id, otherwise will be NULL.
- contribution_status_id - Joins to civicrm_option_value where option_group_id = 11 ("contribution_status").
- check_number - if it's a check, this should be a number.
civicrm.civicrm_email
Always restrict to civicrm_email.is_primary = 1 unless you're doing something crazy.
civicrm.wmf_contribution_extra
describe wmf_contribution_extra; +------------------------+------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------------------+------------------+------+-----+---------+----------------+ | id | int(10) unsigned | NO | PRI | NULL | auto_increment | | entity_id | int(10) unsigned | NO | UNI | NULL | | | settlement_date | datetime | YES | MUL | NULL | | | total_usd | decimal(20,2) | YES | MUL | NULL | | | gateway_account | varchar(255) | YES | MUL | NULL | | | gateway | varchar(255) | YES | MUL | NULL | | | gateway_txn_id | varchar(255) | YES | MUL | NULL | | | gateway_status_raw | varchar(255) | YES | MUL | NULL | | | gateway_date | datetime | YES | MUL | NULL | | | original_amount | decimal(20,2) | YES | MUL | NULL | | | original_currency | varchar(255) | YES | MUL | NULL | | | settlement_usd | decimal(20,2) | YES | MUL | NULL | | | settlement_currency | varchar(255) | YES | MUL | NULL | | | deposit_date | datetime | YES | MUL | NULL | | | deposit_usd | decimal(20,2) | YES | MUL | NULL | | | deposit_currency | varchar(255) | YES | MUL | NULL | | | parent_contribution_id | int(11) | YES | MUL | NULL | | | finance_only | tinyint(4) | YES | MUL | NULL | | | source_name | varchar(255) | YES | MUL | NULL | | | source_type | varchar(255) | YES | MUL | NULL | | | source_host | varchar(255) | YES | MUL | NULL | | | source_run_id | varchar(255) | YES | MUL | NULL | | | source_version | varchar(255) | YES | MUL | NULL | | | source_enqueued_time | datetime | YES | MUL | NULL | | | postmark_date | datetime | YES | | NULL | | | no_thank_you | varchar(255) | YES | MUL | NULL | | +------------------------+------------------+------+-----+---------+----------------+
The `wmf_civicrm` module adds its own schema to `civicrm_contribution`, stored in the `wmf_contribution_extra` table. Perhaps it should be in the Drupal database, but CiviCRM core doesn't know about custom tables in another database.
Fields
- id - primary key
- entity_id - Joins to civicrm_contribution.id.
- gateway - Which payment processor handled this transaction.
- gateway_account - Account name for processors with multiple accounts.
- gateway_txn_id - Order ID at the processor. Note that this is not necessarily unique, processors have funny ways of recording refunds, recurring payments and so on.
- original_amount - Gross in the native currency.
- original_currency - Native currency code.
- parent_contribution_id - Link to civicrm_contribution.id, for refunds only. This should be deprecated with Civi 4.6.
- finance_only - We're hiding this record from most reports. (TODO: document why)
- source_name - Specific system responsible for creating this donation record.
- source_type - Class of source system.
- source_host - Originating machine.
- source_version - Revision of the code that produced this record.
- source_enqueued_time - Time at which this message was first sent to the completed donation queue.
- no_thank_you - A string explaining why we aren't sending an automatic thank-you letter. Usually NULL. If there is content in this field, the `thank_you` job will not send an automatic letter.
Joining wmf_contribution_extra and civicrm_contribution
select * from civicrm_contribution c join wmf_contribution_extra e on e.entity_id = c.id;
civicrm.wmf_donor
This table stores summary data about donors' donations in the wmf_donor table (Civicrm database). The data there is generated using mysql triggers (see below) from the contributions a contact has. The data covers such things as their total giving in a calendar year, a financial year, the last donation, the first donation. It is used by analytics and partially exported to Acoustic. It is also used for searching and reporting within CiviCRM.
Triggers
Triggers in CiviCRM generate entries in log tables and in wmf_donor. [TODO: anything else?]
On developers' local sites triggers are generated when logging is enabled (and removed when logging is disabled), which is CiviCRM normal behaviour. On production our CiviCRM db user does not have permission to create triggers and hence we have a process to track the db triggers through git and for Fr-tech-ops to run that file. In order for the triggers to be output rather than run directly we have a CiviCRM setting (logging_no_trigger_permission
) which is set to TRUE on production and FALSE on developer instances.
Like other settings, it can be fetched and altered using the api:
drush @wmff cvapi setting.get return=logging_no_trigger_permission
drush @wmff cvapi Setting.create logging_no_trigger_permission=0
Triggers are rebuilt when logging is disabled & enabled or when a system flush is done:
drush @wmff cvapi System.flush triggers=1
When the above flush command is run, one of the following things will happen:
- If logging_no_trigger_permission is FALSE, the triggers in the database will be updated; or
- If logging_no_trigger_permission is TRUE, an output file will be generated in drupal/sites/default/files/civicrm/ConfigAndLog/CiviCRM.trigger*. This file should be checked in to git sites/all/modules/wmf_civicrm/scripts/triggers.mysql
The Civi UI can also be used to turn logging off and on (under Administer->System Settings->Misc). In this case, a dialogue should appear indicating where the new triggers SQL file has been saved.
This setting can also be toggled using the in-browser JS console api method (you will need to refresh the page afterwards):
CRM.api3('setting','create', {'logging': 0});
CRM.api3('setting','create', {'logging': 1});
When fields are added, new triggers are added manually on production.
Custom fields
Local dev Civi setups do not have custom fields that have been added on production via the UI. They should, however, have all the custom fields that have logic attached to them.
fredge database
fredge.payments_fraud
Summary of risk score and validation outcome for each donation attempt.
fredge.payments_fraud_breakdown
Individual components of `payments_fraud.risk_score`, join against that table like:
select * from payments_fraud f join payments_fraud_breakdown b on b.payments_fraud_id = f.id;
fredge.payments_initial
Information about donation outcome, measured when the initial donation workflow is completed.
Join to `contribution_tracking`,
select * from fredge.payments_initial i join drupal.contribution_tracking t on t.id = i.contribution_tracking_id;