Data Engineering/Systems/Wikistats 2

From Wikitech
Wikistats interface

Access Wikistats

Wikistats is publicly available at https://stats.wikimedia.org

You can use it to help answer many types of questions about Wikimedia projects.

About Wikistats

See also mw:Analytics/Wikistats

Wikistats is the public statistics website of the Wikimedia Foundation (not to be confused with the Cloud VPS project also called Wikistats). Its main purpose is to add context and motivate our editor community by providing a set of metrics through which users can see the impact of their contributions to the projects they are a part of. In Wikistats 2 we are not only updating the website interface but also providing new access to all our edit data in an analytics-friendly form. Moving from static, precomputed datasets that are generated periodically to APIs that query our data lake drastically improves (and fundamentally changes) the way edit metrics are calculated, and the time and resources this takes, both for the WMF and for the community.

There are notable differences between the UI of the "old" and the "new" Wikistats. However, the main difference between the two systems is the backend. Wikistats 2's metrics are computed by extracting data from the MediaWiki databases, processing it, and re-storing it in an analytics-friendly form so that metrics can be extracted easily. All data used and served by Wikistats 2 will be public; the source of data for the system is the new database replicas on labs.

Design Process

During Q2 and Q3 of the 2016-2017 FY the Analytics team contracted designer Aislinn Grigas to produce the design of the new Wikistats web application. There were two rounds of public requests for feedback: one for the preliminary wireframes, and one for a final candidate of the design. Prototyping and implementation of the website started in Q4 2017. The designs and wireframes produced are archived on MediaWiki.

Architecture

Wikistats 2 is a client-side only single-page application. This means that it does not have a server component and can be served from anywhere: Varnish, Apache or even Amazon S3. This way Wikistats abides by the right to fork rule, as recreating it on a different server or on a local machine only requires cloning the repository and following the installation steps.

The application user interface is divided into two main sections:

  • The dashboard: where the ~12 most important metrics are shown, each with a small graph (when applicable) and some simple aggregations (sum, increase/decrease over time...). All metrics belong to one of three main areas: reading, contributing, and content.
  • The detail page: once a metric is clicked, the app transitions into a page where the full graph of that metric is shown, with the option to see the underlying data directly in a table view. The page includes UI elements to gain more insight into the data, such as the breakdown selectors (splitting the metric according to several possible criteria) and the time range and granularity selectors.
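The simple aggregations mentioned for the dashboard can be sketched as follows. This is only an illustration: summarize is a hypothetical helper, not a function from the Wikistats codebase, and the sample values are made up.

```javascript
// Hypothetical helper: summarizes a time series the way a dashboard
// widget might, with a total and the relative change over the period.
function summarize(values) {
    if (values.length === 0) {
        return { total: 0, changePercent: null };
    }
    const total = values.reduce((sum, v) => sum + v, 0);
    const first = values[0];
    const last = values[values.length - 1];
    // Relative change is undefined when the series starts at zero.
    const changePercent = first === 0 ? null : ((last - first) / first) * 100;
    return { total, changePercent };
}

const monthlyEdits = [200, 220, 250]; // made-up sample values
const summary = summarize(monthlyEdits);
console.log(summary);
```

Returning null for the change when the series is empty or starts at zero lets the UI decide how to display "not applicable" instead of showing a misleading number.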

The backend layer for Wikistats 2 is in alpha and consists of a set of API endpoints in the Analytics Query Service.
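To give an idea of what calling such an endpoint looks like from the client side, the sketch below builds a request URL for an aggregate edits metric. The base URL, the route layout and the default parameter values are assumptions modeled on the public Wikimedia REST API and should be checked against the current API documentation. buildEditsUrl is a hypothetical helper, not part of Wikistats.

```javascript
// Assumed base URL of the public metrics API; verify before relying on it.
const AQS_BASE = 'https://wikimedia.org/api/rest_v1/metrics';

// Hypothetical helper: builds the URL for an aggregate edits request.
// The route layout (project/editor type/page type/granularity/start/end)
// is an assumption, not taken from the text above.
function buildEditsUrl({
    project,
    editorType = 'all-editor-types',
    pageType = 'all-page-types',
    granularity = 'monthly',
    start,
    end,
}) {
    const parts = [project, editorType, pageType, granularity, start, end];
    if (parts.some(p => !p)) {
        throw new Error('project, start and end are required');
    }
    return [AQS_BASE, 'edits', 'aggregate', ...parts.map(encodeURIComponent)].join('/');
}

const url = buildEditsUrl({
    project: 'en.wikipedia.org',
    start: '20200101',
    end: '20200401',
});
console.log(url);
```

The resulting URL could then be passed to fetch or any other HTTP client.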

Local install for development

Cloning the project

The minimum requirements to install the Wikistats UI are Node.js (with the npm package manager) and Git. The project is hosted in a Git/Gerrit repository.

git clone https://gerrit.wikimedia.org/r/analytics/wikistats2
cd wikistats2
npm install

Third-party UI elements

Wikistats uses many components from the Semantic UI library, which requires a special initialization with gulp when installing the project:

npm install -g gulp
git clone https://github.com/Semantic-Org/Semantic-UI/ semantic
cd semantic
gulp build

Generating the bundle

Finally, you need to generate the JavaScript bundle that contains the Wikistats project, its dependencies and the stylesheets. Assuming you want a development environment, you should run:

npm run dev

This command sets up a watcher that rebuilds the bundle each time a project file changes. The development environment does not minify the bundle, so the code is readable within the browser developer tools. The static site is generated in ./dist-dev within your wikistats repository directory. To see the built site you need a simple HTTP server, such as Python's SimpleHTTPServer (on Python 3, the equivalent module is http.server):

python -m SimpleHTTPServer 5000

The application should now be available at localhost:5000

Technology

Vue.js

Wikistats uses Vue.js as its web framework. All the components that make up the application's structure are stored in the src/components directory, the most important ones being App.vue, Dashboard.vue and Detail.vue. We recommend installing the Vue Developer Tools for your web browser to get a clear picture of what each component is doing and what data it is handling.

Vue 3 brings performance wins and better reactivity, so a possible migration is worth evaluating. Version 3 is more modular and allows better tree-shaking when building, and it has a better static-analysis compiler. We could get some of these benefits with a simple upgrade, but a true migration would mean rewriting our components to use the composition API and rethinking our bundling. The main benefit would be performance, and that's not a major concern right now.

I would say we have more to gain from a general clean-up of state and routing, making sure all updates to state are consistently done in the same way through Vuex. This would be a maintenance win: the majority of our time on Wikistats in the past year has been spent puzzling out problems with state. Other priorities that rank above a migration to the composition API include increased security by moving to a vetted npm package repository, the data exploration UI that Fran proposed, and general improvements to look and feel. Wikimedia at large is starting out with Vue 2 on its proof-of-concept projects, so I propose we align with them. Sharing conventions and best practices seems like time better spent than upgrading. For others reading this, see the Vue docs on the composition API and on migration.

State management and data flow

We try to avoid passing properties down the Vue component hierarchy more than two levels. If a property is important enough that it should be passed across the whole application, we prefer it to be accessed via a state manager. We use Vuex as our state manager, which is declared in src/store.
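The idea behind a state manager can be illustrated with a small, dependency-free sketch. It is not Vuex itself and does not reproduce the actual Wikistats store. createStore, the state shape and the setProject mutation are hypothetical names chosen for the example. The point is only that state changes through named mutations, so every update follows the same path.

```javascript
// Dependency-free sketch of a Vuex-style store: state is only changed
// through named mutations, so every update follows the same path.
function createStore({ state, mutations }) {
    const subscribers = [];
    return {
        get state() { return state; },
        commit(type, payload) {
            if (!mutations[type]) {
                throw new Error(`unknown mutation: ${type}`);
            }
            mutations[type](state, payload);
            subscribers.forEach(fn => fn(type, state));
        },
        subscribe(fn) { subscribers.push(fn); },
    };
}

// Hypothetical state shape; the real store is declared in src/store.
const store = createStore({
    state: { project: 'all-projects', granularity: 'monthly' },
    mutations: {
        setProject(state, project) { state.project = project; },
    },
});

const log = [];
store.subscribe(type => log.push(type));
store.commit('setProject', 'en.wikipedia.org');
console.log(store.state.project, log);
```

Any component can read store.state or call store.commit directly, which removes the need to thread the value through intermediate components as a property.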

Metric models

The metric data coming from the APIs is converted into a DimensionalData (src/model/DimensionalData.js) object, which uses Crossfilter.js as its local storage. The DimensionalData API allows the application to easily filter, break down and aggregate the data being explored.
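The three operations named above (filter, break down, aggregate) can be sketched with a plain array-backed class. TinyDimensionalData is a hypothetical stand-in: the real DimensionalData class is backed by Crossfilter, and its method names and signatures may differ. The sample rows are made up.

```javascript
// Hypothetical, array-backed stand-in for a dimensional data object.
// The real DimensionalData class is backed by Crossfilter and its
// method names may differ.
class TinyDimensionalData {
    constructor(rows) {
        this.rows = rows;
    }
    // Keep only rows whose `dimension` equals `value`.
    filter(dimension, value) {
        return new TinyDimensionalData(this.rows.filter(r => r[dimension] === value));
    }
    // Sum `measure` per distinct value of `dimension`.
    breakdown(dimension, measure) {
        const totals = {};
        for (const row of this.rows) {
            totals[row[dimension]] = (totals[row[dimension]] || 0) + row[measure];
        }
        return totals;
    }
    // Sum `measure` over all rows.
    total(measure) {
        return this.rows.reduce((sum, r) => sum + r[measure], 0);
    }
}

// Made-up sample rows.
const data = new TinyDimensionalData([
    { month: '2020-01', editorType: 'user', edits: 10 },
    { month: '2020-01', editorType: 'anonymous', edits: 4 },
    { month: '2020-02', editorType: 'user', edits: 12 },
]);

const byEditorType = data.breakdown('editorType', 'edits');
const januaryTotal = data.filter('month', '2020-01').total('edits');
console.log(byEditorType, januaryTotal);
```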

Visualization

We use d3 version 4+ for our visualizations. With this release of d3, it's possible to include only the code that we use instead of the whole library. This will allow us to bundle and optimize Wikistats for mobile devices.

Localization

Wikistats 2 is localized via Translatewiki and date/time/number formatting libraries. As translation to a new language is completed, we can include that language variant in the build by adding it to the src/languages.json file. When building, we print out the translation percentage of languages that are not yet included. If you're deploying Wikistats and see new completed translations, check the Translatewiki link and add the language if everything looks good.

Tests

Tests are located in the test directory. We use Jasmine as our testing library and Karma as the test runner. Running the following:

npm test

will initialize a Karma watcher that runs the webpack bundler each time a test changes and evaluates the whole test suite, printing out any failures in the console. Be aware that, by default, npm test uses Google Chrome as the testing browser. If you're using a different browser or environment, you should change it in karma.conf.js

Additionally, there are smoke tests to be performed with each significant change to the codebase, which are described in Analytics/Wikistats 2/Smoke testing.

Contributing and Deployment

Git repo

Wikistats 2's code is hosted in its Gerrit repository and mirrored on GitHub as a read-only repo. Read the Wikitech page for information on how to contribute to the projects using Gerrit.

When you've created a gerrit-compliant git commit (with a Change-Id appended), you can open a new code review by running:

git push origin HEAD:refs/for/master

CRs

Smoke tests

Main article: Analytics/Wikistats 2/Smoke Testing

Adding languages

When a language on the Wikistats Translatewiki page has a translation coverage of 75% or more, we consider it to be ready for production. However, newly translated languages need a manual step to be included in production. Here's a step-by-step guide on how to add those languages:

  • Go to Wikistats translatewiki and sort the languages by coverage.
  • Open the src/languages.json file in the Wikistats repo.
  • Manually get the languages that have a coverage higher than 75%, but are not in the languages.json file.
  • For each one of them add a code snippet in languages.json (in alphabetical order if possible) like:
 "ko": {
   "numbroCode": "ko-KR",
   "englishName": "Korean",
   "nativeName": "한국어"
 }
  • The key should match the short language code specified in translatewiki's "Language" column (before the colon).
  • The numbroCode should match the most appropriate language code in https://numbrojs.com/languages.html. If numbro does not support the language, use "en".
  • A reliable place for native names of languages that would be consistent with what is used in the rest of the Wikimedia software is the language-data library.
  • Finally, run a build to make npm collect all the specified languages from translatewiki.
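The selection and insertion steps above can be sketched in code. This is only an illustration of the manual process: languagesToAdd and addLanguage are hypothetical helpers, and the coverage figures are made up and would in practice be read off the Translatewiki page by hand.

```javascript
// Hypothetical helper: given coverage figures copied from Translatewiki
// and the current contents of languages.json, list the language codes
// that are ready but not yet included.
const COVERAGE_THRESHOLD = 75; // percent, as described above

function languagesToAdd(coverageByCode, languages) {
    return Object.keys(coverageByCode)
        .filter(code => coverageByCode[code] >= COVERAGE_THRESHOLD)
        .filter(code => !(code in languages))
        .sort();
}

// Returns a copy of `languages` with `entry` added under `code`,
// with keys in alphabetical order.
function addLanguage(languages, code, entry) {
    const merged = { ...languages, [code]: entry };
    return Object.fromEntries(
        Object.entries(merged).sort(([a], [b]) => a.localeCompare(b))
    );
}

// Made-up coverage numbers for illustration.
const coverage = { de: 98, ko: 81, sv: 40 };
const current = {
    de: { numbroCode: 'de-DE', englishName: 'German', nativeName: 'Deutsch' },
};

const missing = languagesToAdd(coverage, current);
const updated = addLanguage(current, 'ko', {
    numbroCode: 'ko-KR',
    englishName: 'Korean',
    nativeName: '한국어',
});
console.log(missing, Object.keys(updated));
```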

Deployment

Testing in beta

Even though it's not required for every change to the UI, it is recommended to put the site bundle generated with your Gerrit change on our canary website. It may be helpful for the team or other people to test your feature in different browsers or on different devices, especially if they don't have the Wikistats dev environment set up on their machines.

Because we want to be able to easily debug the site in the canary, we push the development bundle of the UI. In your Wikistats directory:

rm -rf dist-dev
npm run dev

When the bundle is generated, quit the process and copy the contents of the dist-dev directory to the canary machine (this assumes you have access to Labs):

scp -r ./dist-dev/* @dashiki-staging-02.eqiad.wmflabs:/srv/static/wikistats-canary.wmflabs.org/<<some-folder-name>>/

Once the copy is complete, the site should be available at https://wikistats-canary.wmflabs.org/<<some-folder-name>>

Releasing a new version to production

Code is released to production by building the dist/ folder locally and committing it to version control. In the future the build step could happen in a CI server.

You'll need jq and docker. It is technically possible to build it directly on your own computer, but building in a docker container is more likely to be consistent.

First, bump up the version of Wikistats in the first lines of package.json:

{
    "name": "wikistats",
    "version": "x.y.z",

Then build and run a docker container tagged by that version:

export WIKISTATS_VERSION=$(jq -r .version package.json)
docker build . -t wikistats2:$WIKISTATS_VERSION
docker run --volume $(pwd)/dist:/app/dist wikistats2:$WIKISTATS_VERSION

The docker container itself is just for writing to the dist/ folder; we don't publish the docker images.

Before deploying, make sure to do some smoke tests in as many browsers as possible with the bundle you just created. Navigate from the dashboard through to a metric and change the breakdowns a bit.

  • Run npm run server and open localhost:5000/dist to browse your locally built wikistats app.
  • You can use browser tools to emulate a mobile device, or even better, access your laptop over the local network, such as http://192.168.50.13:5000/dist
  • Browse around, open the dev tools, look for errors. Note that the current month of data may give a 404; this is expected.

When you're happy with your changes, submit a new patch for the master branch:

git add package.json package-lock.json dist
git commit -m "Release $WIKISTATS_VERSION"
git tag $WIKISTATS_VERSION
git push origin HEAD:refs/for/master

Once you merge the change in Gerrit, it will be synced by the next Puppet run to the machine from which the UI is served.

We used to release from the release branch but we changed that process on 2018-08-29. See: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/455892/3/modules/statistics/manifests/sites/stats.pp.

Give it half an hour for the job to run and you should be able to see your changes live at https://stats.wikimedia.org/v2/

Supported browsers

Wikistats 2 uses ES6 syntax and thus needs transpiling libraries to ensure that browsers that do not implement ES6 can still display the app. It also uses babel-polyfills to avoid having to write browser-dependent code and ensure all targeted browsers can display the site.

The following browsers are supported by Wikistats 2 and have been tested with positive results:

  • Last 2 versions of Google Chrome
  • Last 2 versions of Google Chrome Mobile
  • Last 2 versions of Safari
  • Last 2 versions of Mobile Safari
  • Last 2 versions of Mozilla Firefox
  • IE 10, IE 11 and Microsoft Edge