Wikistats is the public statistics website of the Wikimedia Foundation (not to be confused with the Cloud VPS project also called Wikistats). Its main purpose is to add context and motivate our editor community by providing a set of metrics through which users can see the impact of their contributions to the projects they are a part of. In Wikistats 2 we are not only updating the website interface but also providing new access to all our edit data in an analytics-friendly form. Moving from static, precomputed datasets generated periodically to APIs that query our data lake drastically improves (and fundamentally changes) the time and resources it takes to calculate edit metrics, both for the WMF and for the community.
There are notable differences between the UI of the "old" and "new" Wikistats (see the prototype). However, the main difference between the two systems is the backend. Wikistats 2 computes metrics by extracting data from MediaWiki databases, processing it and re-storing it in an analytics-friendly form so that metrics can be extracted easily. All data used and served by Wikistats 2 will be public; the system's data source is the new database replicas on Labs.
During Q2 and Q3 of the 2016-2017 FY the Analytics team contracted designer Aislinn Grigas to produce the design of the new Wikistats web application. There were two rounds of public requests for feedback, one for the preliminary wireframes and one for a final candidate of the design. Prototyping and implementation of the website started in Q4 2017. The designs and wireframes produced are archived on MediaWiki.
Wikistats 2 is a client-side-only single-page application. This means that it has no server component and can be served from anywhere: Varnish, Apache or even Amazon S3. This way Wikistats abides by the right-to-fork rule, as recreating it on a different server or on a local machine only requires cloning the repository and following the installation steps.
The application user interface is divided into two main sections:
- The dashboard: where the ~12 most important metrics are shown with a small graph (when applicable) and some simple aggregations (sum, increase/decrease over time...). All metrics belong to one of three main areas: reading, contributing, and content.
- The detail page: once a metric is clicked, the app transitions into a page where the full graph of that metric is shown, with the option to see the underlying data directly in a table view. The page includes UI elements to gain more insight into the data, such as the breakdown selectors (splitting the metric according to several possible criteria) and the time range and granularity selectors.
The backend layer for Wikistats 2 is in alpha and consists of a set of services run by AQS:
Local install for development
Cloning the project
The minimum requirements to install the Wikistats UI are Node.js (with the npm package manager) and Git. The project is hosted in a Gerrit repository:
git clone https://gerrit.wikimedia.org/r/analytics/wikistats2
Third-party UI elements
Wikistats uses many components from the Semantic UI library, which requires a special initialization with gulp when installing the project:
npm install -g gulp
cd semantic
gulp build
Generating the bundle
npm run dev
This command sets up a watcher that rebuilds the bundle each time a project file changes. The development build does not minify the bundle, so that the code stays readable within the browser developer tools. It generates the static site in ./dist-dev within your wikistats repository directory. To see the built site you need a simple HTTP server such as Python's SimpleHTTPServer (available in Python 2; Python 3 ships the equivalent http.server module), run from that directory:
python -m SimpleHTTPServer 5000
The application should now be working in
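If Python is not at hand, any static file server will do, since the site has no server component. The following is a minimal sketch using only Node.js built-in modules. The function names (contentTypeFor, resolveFile, createStaticServer), the MIME map and the port are illustrative assumptions, not part of the Wikistats repository.

```javascript
const http = require("http");
const fs = require("fs");
const path = require("path");

// Small extension-to-MIME map; extend it if the bundle ships other asset types.
const CONTENT_TYPES = {
  ".html": "text/html",
  ".js": "application/javascript",
  ".css": "text/css",
  ".json": "application/json",
  ".svg": "image/svg+xml",
  ".png": "image/png",
};

function contentTypeFor(filePath) {
  const extension = path.extname(filePath).toLowerCase();
  return CONTENT_TYPES[extension] || "application/octet-stream";
}

// Map a request URL to a file under `root`. Returns null when the
// resolved path would fall outside the root directory.
function resolveFile(root, requestUrl) {
  const pathname = decodeURIComponent(requestUrl.split("?")[0]);
  const relative = pathname.endsWith("/") ? pathname + "index.html" : pathname;
  const absoluteRoot = path.resolve(root);
  const resolved = path.join(absoluteRoot, relative);
  const insideRoot = resolved.startsWith(absoluteRoot + path.sep);
  return insideRoot ? resolved : null;
}

// Build (but do not start) an HTTP server that serves files from `root`.
function createStaticServer(root) {
  return http.createServer((req, res) => {
    let file = null;
    try {
      file = resolveFile(root, req.url);
    } catch (err) {
      file = null; // malformed percent-encoding in the URL
    }
    if (!file) {
      res.writeHead(400);
      res.end("Bad request");
      return;
    }
    fs.readFile(file, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, { "Content-Type": contentTypeFor(file) });
      res.end(data);
    });
  });
}

// To serve the development bundle on port 5000, uncomment:
// createStaticServer("./dist-dev").listen(5000);
```

The server is only constructed here; calling listen starts it. This is meant for local inspection of the bundle, not as a production setup.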
Wikistats uses Vue.js as its web framework. All the components that make up the application's structure are stored in the src/components directory, the most important ones being App.vue, Dashboard.vue and Detail.vue. We recommend installing the Vue Developer Tools for your web browser to get a clear picture of what each component is doing and what data it is handling.
State management and data flow
We try to avoid passing properties more than two levels down the Vue component hierarchy. If a property is important enough that it should be available across the whole application, we prefer it to be accessed via a state manager. We use Vuex as our state manager, which is declared in
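To illustrate the idea of reading shared properties from a central store rather than passing them down through components, here is a rough, hand-rolled sketch in plain JavaScript. It is not the Vuex API and not the Wikistats store: createStore, the state fields (project, granularity) and the setProject mutation are hypothetical names chosen for the example. It only mirrors the general pattern of committing named mutations against a single state object.

```javascript
// Minimal stand-in for a central state manager (NOT Vuex itself).
function createStore({ state, mutations }) {
  const subscribers = [];
  return {
    state,
    // Apply a named mutation and notify every subscriber afterwards.
    commit(type, payload) {
      if (!mutations[type]) {
        throw new Error(`Unknown mutation: ${type}`);
      }
      mutations[type](state, payload);
      subscribers.forEach((callback) => callback(type, state));
    },
    // Register a callback that runs after each committed mutation.
    subscribe(callback) {
      subscribers.push(callback);
    },
  };
}

// Hypothetical application-wide properties.
const store = createStore({
  state: { project: "all-projects", granularity: "monthly" },
  mutations: {
    setProject(state, project) {
      state.project = project;
    },
  },
});

// Any component, however deep, can read store.state or commit a change.
const seenMutations = [];
store.subscribe((type) => seenMutations.push(type));
store.commit("setProject", "en.wikipedia.org");
```

A deeply nested component reads store.state.project directly instead of receiving it through several layers of props, which is the motivation described above.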
The metric data coming from the APIs is converted into a DimensionalData (src/model/DimensionalData.js) object, which uses Crossfilter.js as its local storage. The DimensionalData API allows the application to filter, break down and aggregate the data being explored.
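The three operations mentioned above (filter, break down, aggregate) can be sketched in plain JavaScript as follows. This does not use Crossfilter.js and does not reproduce the actual DimensionalData interface; the row shape, the sample values and the function names are assumptions made for illustration only.

```javascript
// Hypothetical rows: one value per month and editor type.
const rows = [
  { month: "2018-01", editorType: "user", edits: 120 },
  { month: "2018-01", editorType: "anonymous", edits: 30 },
  { month: "2018-02", editorType: "user", edits: 150 },
  { month: "2018-02", editorType: "anonymous", edits: 45 },
];

// Filter: keep only the rows whose dimension value is in the allowed list.
function filterBy(data, dimension, allowedValues) {
  return data.filter((row) => allowedValues.includes(row[dimension]));
}

// Break down: sum a measure separately for each value of a dimension.
function breakdown(data, dimension, measure) {
  const totals = {};
  for (const row of data) {
    const key = row[dimension];
    totals[key] = (totals[key] || 0) + row[measure];
  }
  return totals;
}

// Aggregate: sum a measure over all rows.
function total(data, measure) {
  return data.reduce((sum, row) => sum + row[measure], 0);
}

const february = filterBy(rows, "month", ["2018-02"]);
const editsByEditorType = breakdown(rows, "editorType", "edits");
const allEdits = total(rows, "edits");
```

With the sample rows, editsByEditorType holds one total per editor type and allEdits is the overall sum. A library such as Crossfilter maintains indexes so that these operations stay fast on larger datasets, which this naive version does not attempt.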
We use d3 version 4+ for our visualizations. With this release of d3, it's possible to include only the code that we use instead of the whole library. This will allow us to bundle and optimize Wikistats for mobile devices.
We have localization planned for Wikistats 2 in the next fiscal year (July 2018 to July 2019). Thus far we only localize units on graphs to the language of your browser.
Some guidelines as to localization:
- Vue has the helpful concept of filters. Their scope is application-wide and they can be easily used in templates. When possible, formatting of numbers and dates should be delegated to filters. See: https://vuejs.org/v2/guide/filters.html
- Templates should not deal directly with the language of the site or with utilities that are language-aware like numeral.js or minute.js.
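As a sketch of the first guideline, the formatting logic can live in a plain function that is then registered as a filter, so templates never touch locale handling themselves. The example below uses the standard Intl.NumberFormat API rather than numeral.js; the names localizedNumber and browserLocale are hypothetical and not taken from the Wikistats code.

```javascript
// Format a number according to the conventions of the given locale.
function localizedNumber(value, locale) {
  return new Intl.NumberFormat(locale).format(value);
}

// Pick the browser language when available, falling back to en-US
// (for example when running outside a browser).
function browserLocale() {
  const hasNavigator = typeof navigator !== "undefined" && navigator.language;
  return hasNavigator ? navigator.language : "en-US";
}

// In a Vue 2 application the function could be registered once, globally:
//   Vue.filter("localizedNumber", (value) => localizedNumber(value, browserLocale()));
// and then used in any template as:
//   {{ metricTotal | localizedNumber }}

const formatted = localizedNumber(1234567, "en-US"); // "1,234,567"
```

Keeping the locale decision inside the filter means a later change of localization strategy only touches one place.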
Tests are located in the test directory. We use Jasmine as our testing library and Karma as the test runner. Running the following:
will initialize a Karma watcher that runs the webpack bundler each time a test changes and evaluates the whole test suite, printing any failures to the console. Beware that, by default, npm test uses Google Chrome as the testing browser. If you're using a different browser or environment you should change it in
Additionally, there are smoke tests to be performed with each significant change to the codebase, which are described in Analytics/Wikistats 2/Smoke testing.
Contributing and Deployment
Wikistats 2 used Phabricator's repository management software, Differential, as its code hosting solution until the end of Q1 2017. As of October 2017, like most Wikimedia projects, Wikistats 2's code is hosted in its Gerrit repository and mirrored on GitHub as a read-only repo. Read the Wikitech page for information on how to contribute to the projects using Gerrit.
When you've created a gerrit-compliant git commit (with a change-id appended), you can open a new code review by running:
git push origin HEAD:refs/for/master
- If packages were added: check bundle size and webpack bundles. Do those need updating?
- Zoom in and out
- Change to mobile mode
- Hover, check tooltips and mouse pointers
- Resize the window
- Check CPU/memory
- Try Firefox, Chrome, Safari and, if possible, Opera Mini
Testing in beta
Even though it's not required for every change to the UI, we recommend putting the site bundle generated from your Gerrit change on our canary website. It can help the team or other people test your feature in different browsers or devices, especially if they don't have the Wikistats dev environment set up on their machines.
Because we want to be able to easily debug the site in the canary, we push the development bundle of the UI. In your Wikistats directory:
rm -rf dist-dev
npm run dev
When the bundle is generated, quit the process and copy the contents of the dist-dev directory to the canary machine (this assumes you have access to Labs):
scp -r ./dist-dev/* @dashiki-staging-01.eqiad.wmflabs:/srv/static/wikistats-canary.wmflabs.org/<<some-folder-name>>/
Once the copy is complete, the site should be available at https://wikistats-canary.wmflabs.org/
Releasing a new version to production
Deploying is done by pushing the latest stable app bundle to the release branch. To generate the bundle in the dist folder, run:
npm install
npm run build
This will generate a production bundle with all the JS and CSS minified and transpiled for compatibility with most browser versions. Before deploying, make sure to run some smoke tests in as many browsers as possible with the bundle you just created. For example, navigate from the dashboard to a metric's detail page and change the breakdowns a bit.
To deploy, bump up the semantic version of Wikistats in package.json, and then submit a new patch for the master branch:
# Stage the version bump together with the files in the dist directory
git add package.json dist
git commit -m "Release x.y.z"
git tag x.y.z
git push origin HEAD:refs/for/master
Once you merge the change in Gerrit, it will be synced to the machine from which the UI is served by the next Puppet run. We used to release from the release branch, but we changed that process on 2018-08-29. See: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/455892/3/modules/statistics/manifests/sites/stats.pp Give the job about half an hour to run and you should be able to see your changes live at https://stats.wikimedia.org/v2/
Wikistats 2.0 uses ES6 syntax and thus needs transpiling libraries to ensure that browsers that do not implement ES6 can still display the app. It also uses babel-polyfills to avoid having to write browser-dependent code and to ensure all targeted browsers can display the site. The following browsers are supported by Wikistats 2.0 and have been tested with positive results:
- Last 2 versions of Google Chrome
- Last 2 versions of Google Chrome Mobile
- Last 2 versions of Safari
- Last 2 versions of Mobile Safari
- Last 2 versions of Mozilla Firefox
- IE 10, IE 11 and Microsoft Edge
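A support list like the one above is commonly expressed as browserslist queries that a Babel preset consumes when deciding what to transpile. The following is only a sketch of how such a configuration might look. The preset name (@babel/preset-env), the file layout and the "last 2 Edge versions" query are assumptions; the actual Wikistats build may use a different Babel version or different settings.

```javascript
// Browserslist queries mirroring the supported-browser list.
// The Edge query is an assumption: the list does not state which
// Edge versions are covered.
const supportedBrowsers = [
  "last 2 Chrome versions",
  "last 2 ChromeAndroid versions",
  "last 2 Safari versions",
  "last 2 iOS versions",
  "last 2 Firefox versions",
  "ie 10",
  "ie 11",
  "last 2 Edge versions",
];

// Hypothetical Babel configuration (for example in babel.config.js).
const babelConfig = {
  presets: [
    ["@babel/preset-env", { targets: { browsers: supportedBrowsers } }],
  ],
};

module.exports = babelConfig;
```

Keeping the queries in one array makes it easy to compare the build targets against the documented support list when either changes.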
For Analytics' team members
We talked about the overall objective of the project this quarter: "establish technical viability of our workflow of data". It has two main steps: bootstrapping of data and updates. Bootstrapping will be done from dumps or the database. Aaron mentioned that the db will be best as we might get better quality data. Updates are to come from the event stream. Event Stream Schemas: https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema
We talked about having the ability to change what is published on the event stream from mw to whatever is needed; we got input from Aaron on the schemas that eventBus uses. Regarding scaling: loading of bootstrapping data is really a one-off. Can we move data from the altiscale cluster to our cluster? Can we get a db slave just for this? (Sounds like this last one is easy to do.)
To reduce our iteration cost: can we calculate metrics for just one project to start? We are focusing on data workflow rather than data precision.
Action Items: Joseph to look at event stream schemas and to verify if we have enough data to do a metric vertical (Pages Created?)