Tool:Pageviews

Toolforge tools
Pageviews Analysis
Website https://pageviews.toolforge.org/
Description Suite of tools to visualize pageviews
Keywords pageviews, statistics, analytics
Author(s) MusikAnimal, Kaldari, Marcel Ruiz Forns
Maintainer(s) MusikAnimal, Kaldari
Source code GitHub
License MIT License
Issues Phabricator

Pageviews Analysis is a suite of tools to visualize pageviews data of Wikimedia Foundation wikis. The suite includes seven tools that all share the same codebase: Pageviews, Langviews, Topviews, Siteviews, Redirect Views, Userviews and Mediaviews.

See meta:Pageviews Analysis for user documentation.

Deployment

There is a deploy.sh script to make deployment easier:

$ ssh tools-dev.wmflabs.org
$ become pageviews
$ sh deploy.sh
...
From https://github.com/MusikAnimal/pageviews
 * branch            master     -> FETCH_HEAD
Fast-forward

This script pulls the new code from master, installs PHP dependencies, sets up the necessary symlinks (which effectively change the document root in lighttpd), and updates the JSDocs (which are at ~/public_html/jsdocs in the pageviews tool). You may see errors about existing symlink files; these can be ignored.

You need to deploy code changes to each application (pageviews, langviews, topviews, etc.) individually. After the changes have been deployed to all applications, a release is created on GitHub. The release timestamp makes it possible to tell which deploy may have caused a problem, based on when a bug report was filed.
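Because every application is deployed separately, it can help to print the full list of manual steps before starting. The following is a minimal sketch, not part of the repository: the APPS list is an assumption that contains only the three tool names mentioned above, and print_deploy_steps is a hypothetical helper that prints commands rather than running them.

```shell
#!/bin/sh
# Illustrative list of tool accounts; only the names mentioned in the text
# are included. Extend it with the remaining applications as needed.
APPS="pageviews langviews topviews"

# Hypothetical helper: print the manual steps for each tool account.
# It does not run anything; copy the lines into your Toolforge session.
print_deploy_steps() {
    for app in $APPS; do
        printf '%s\n' "become $app" "sh deploy.sh" "exit"
    done
}

print_deploy_steps
```

Each group of three lines corresponds to one application: switch to the tool account, run the deploy script, and leave the tool account again.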

Testing in production

If you need to test in the production environment, use the -test Toolforge accounts (pageviews-test, langviews-test, etc.). Not all apps have a test instance. To deploy code to a test instance, run sh deploy.sh <branch-name> <app-name> from the test account, where app-name is the name of the application without the -test suffix. For example, running sh deploy.sh 2018-refactor topviews as topviews-test deploys the 2018-refactor branch. The second argument "topviews" is needed because the deploy script would otherwise use "topviews-test" when building the symlinks (since it simply goes by the current working directory).
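The reason for the second argument can be illustrated with a small sketch. This is not the actual logic of deploy.sh: resolve_app_name is a hypothetical helper, and the /data/project/topviews-test path is only an illustrative working directory. It shows how falling back to the directory name yields "topviews-test" unless an explicit name is given.

```shell
#!/bin/sh
# Hypothetical helper, not taken from deploy.sh.
# $1 = optional explicit app name (the second deploy.sh argument)
# $2 = working directory to fall back on (defaults to the current one)
resolve_app_name() {
    explicit="$1"
    dir="${2:-$PWD}"
    if [ -n "$explicit" ]; then
        # An explicit name always wins.
        printf '%s\n' "$explicit"
    else
        # Otherwise go by the name of the working directory.
        basename "$dir"
    fi
}

resolve_app_name "" /data/project/topviews-test        # prints topviews-test
resolve_app_name topviews /data/project/topviews-test  # prints topviews
```

With no explicit name, the symlinks would be built for "topviews-test", which is why the test deployments pass the application name as the second argument.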

To load a test application, you must append the ?debug=true flag to the URL, e.g. https://topviews-test.toolforge.org/?debug=true. In your browser's JavaScript console, window.app refers to the instance of the application. This allows you to access all private methods and properties, such as the chart instance (app.chartObj) and the output data (app.outputData). Use console.log(Object.getOwnPropertyNames(app)) to see the full list.

Setting up a new app on Toolforge

  • become the new tool, then run:
    • git init .
    • git remote add origin https://github.com/MusikAnimal/pageviews.git
    • sh deploy.sh – This pulls in master, installs PHP dependencies, and sets up the necessary symlinks (which effectively change the document root in lighttpd).
  • Run cp config.sample.php config.php and set the values accordingly. There are comments in the file explaining what each configuration constant is for.
    • The META variables are used to communicate with the database that holds basic usage data. See below for the schema.
  • Finally, start the Kubernetes webservice with webservice --backend=kubernetes start. Future updates to the application should not require restarting the webservice.

Meta database

Basic usage tracking (how many times the applications were used, and for which projects) is stored in a private database. For each application, there are two tables: one for counting which projects used the application, and one for counting usage over time. The schema is as follows (using Langviews as an example):

CREATE TABLE langviews_projects (
    id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    project VARCHAR(255),
    count INT
);

CREATE TABLE langviews_timeline (
    id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    date DATE,
    count INT
);

In addition to usage statistics, this database is used to store known false positives within the Topviews application. This involves two tables – one for false positives on a per-project and per-page level, and another table that serves as a blacklist of what pages to ignore for a given project and platform. The schema is as follows:

CREATE TABLE topviews_false_positives (
    id INT AUTO_INCREMENT PRIMARY KEY,
    project VARCHAR(255),
    page VARCHAR(255),
    count INT,
    confirmed BOOLEAN
);

CREATE TABLE topviews_blacklist(
    id INT PRIMARY KEY AUTO_INCREMENT,
    project VARCHAR(255),
    page VARCHAR(255),
    platform VARCHAR(10)
);

See also