Data Engineering/Systems/Dashiki

From Wikitech

Dashiki is a dashboarding tool built by the Wikimedia Analytics team. It lets users declare dashboards using configuration pages on a wiki; the name comes from "Dashboards configured on wikis". It is lightweight and performant, and requires no setup other than a static webserver. Dashiki is not a data analysis tool: it helps you think about information architecture first and plots second. Configuration pages follow schemas that are not obvious and that MediaWiki does not enforce on save; to help with this, we're building configuration documentation.

See, for example, how you can browse pageview counts for all our projects on this Dashiki installation: Pageviews for all projects. If you are looking to set up a dashboard for a WMF project, the dashboard tutorial explains this in more detail.

Quickstart

  1. Download Dashiki:
    git clone https://gerrit.wikimedia.org/r/analytics/dashiki
  2. Follow the README instructions to get a local webserver to serve the files.
  3. You're all set: you should now see some data on http://localhost:<your port>/dist/

Background

Dashiki's history and technical stack:

Intro

Dashiki has multiple "layouts", which are configured into dashboards via wiki pages.

Here are two examples of Dashiki dashboard layouts:

By "layouts" we mean a certain way to combine metrics and "parameters" like wikis, dates, etc. to present some data. The layouts we support right now are:

  1. metrics-by-project (for example, Vital Signs; good for wiki-centric projects)
  2. compare (for example, the comparison of VisualEditor to wikitext data; good for things like A/B testing)

FAQ

Where is the code?

You can browse Dashiki's code on GitHub. We use Gerrit Code Review to manage changes. Setting up Dashiki is really simple, and you should do that before reading further.

Where is code deployed?

Dashiki instances are listed in the deployment configuration.

How do I put dashiki out of service?

Setting outofservice to "true" in this configuration page makes a banner appear in Dashiki instances noting that an outage is going on: https://meta.wikimedia.org/wiki/Dashiki:OutOfService

Deployment

Dashiki has been deployed with Fabric since this change: https://gerrit.wikimedia.org/r/#/c/259437/

How to get your data available via http

As long as you're sure the files are OK to share publicly, you can put them into /srv/published-datasets on one of the stats machines, and they will show up on http://analytics.wikimedia.org/datasets/ after the next rsync.

It's up to you to set up the cron job that copies them there and to organize a sensible folder hierarchy. The rsync runs hourly. You probably want to use Analytics/Reportupdater for this.
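As a rough illustration of such a cron job, here is a minimal Python sketch that writes a date-sorted TSV report under /srv/published-datasets. Only that directory comes from this page; the `<metric>/<submetric>/<name>.tsv` layout, the leading `date` column, and the function name `write_report` are assumptions chosen to mirror the report files described further down.

```python
import csv
import os
import tempfile
from pathlib import Path

# Files placed here are rsynced hourly to analytics.wikimedia.org/datasets/.
PUBLISHED_ROOT = Path("/srv/published-datasets")


def write_report(rows, columns, metric, submetric, name, base=PUBLISHED_ROOT):
    """Write (date, value, ...) rows as a date-sorted TSV and return its path.

    The <base>/<metric>/<submetric>/<name>.tsv layout is hypothetical.
    """
    folder = Path(base) / metric / submetric
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{name}.tsv"
    # Write to a temporary file in the same folder, then rename it over the
    # target, so <name>.tsv itself is never observed half-written.
    fd, tmp = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["date", *columns])
        for row in sorted(rows, key=lambda r: r[0]):
            writer.writerow(row)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600 files; make it world-readable
    os.replace(tmp, target)
    return target
```

For example, `write_report([("2024-01-01", 5)], ["uploads"], "my-metric", "uploads", "enwiki")` would create /srv/published-datasets/my-metric/uploads/enwiki.tsv. The explicit chmod matters because a file only readable by its owner may not be readable by the sync job.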

If you want to store your data in a database, you can use the staging database on analytics store: if you have access to analytics store, you should be able to create tables in that database.

Can I implement a new layout?

You can implement new layouts, but the main idea is that we don't want just plot after plot: we want information architecture around a set of plots that lets users infer what data is available.

What about limn?

The analytics team is not actively developing limn anymore.

Technical Documentation: Understanding Dashiki - First Steps

(Author contact: Helen, at hjiang(at)wikimedia.org)

This section is a work in progress and will continue to be edited and expanded.

Preface

You should already have npm installed if you have used JavaScript. Some key dependencies you need to run Dashiki are gulp, karma, and bower. If you use other JS visualization libraries, such as d3 or dygraphs, make sure those are installed as well. Dashiki has lightweight access to the major JS visualization libraries, but this doesn't mean that we can slack off on due diligence :) Much of this will be covered in the first MWE.

Overview

Dashiki is a client-side dashboard builder by the Analytics team, which means that you can serve it with whichever webserver you fancy. It also has a component system and clear patterns, which let you use pretty much any visualization library and data source. The tests are run with Karma. The default port for viewing dashboards locally is 5000.

Lingo aside, let's take a look at the organization and basic use of Dashiki in /src, and then dig into its more detailed usage and testing in /test.

Under /src, Dashiki has six major parts:

  • /src/app
  • /src/components
  • /src/css
  • /src/fonts
  • /src/layouts
  • /src/lib

Because /src/css and /src/fonts are only tangentially related to our purpose and are mostly boilerplate, we focus on the remaining four directories. This introductory guide walks through each of them, first at a high level and then in more detail.

/src/layouts

Dashiki has several built-in layouts, and you can build your own custom layouts. The layouts in /src/layouts are:

  • Compare
    • Used in A/B test scenarios. Data can either be combined or compared side by side, and tabs, time ranges, and different visualization styles can be used.
  • Metrics-by-project
    • A navigator to easily find and visualize configured metrics for any WMF-hosted wiki project.
  • Tabs
    • Displays graphs organized into tabs.

/src/components

View components are where people pass parameters to and interact with the dashboard, and there are many different moving pieces in this module. However, because Knockout separates the domain data from the view components, it is in fact fairly easy to plug in your favorite visualization libraries and data sources.

To give the full picture of what /src/components provides, here is a detailed breakdown, which we will explore further. The component names are mostly self-explanatory, so I only annotate them where needed.

  • A-b-compare
    • Note: used to compare A/B test results. Often used in conjunction with /src/layouts/compare and /src/components/visualizers.
    • A-b-compare
    • Compare-stacked-bars
    • Compare-sunbursts
    • Compare-timeseries
  • Annotation-list
    • Note: as the name suggests, this is used to add annotations to graphs.
    • Annotation-list
    • Binding
  • Breakdown-toggle
  • Button-group
  • Compare-layout
  • Dropdown
  • Metric-selector
  • Metrics-by-project-layout
  • Out-of-service
  • Project-selector
  • Table-layout
  • Visualizers
    • Dygraphs-timeseries
    • Filter-timeseries
    • Hierarchy
    • Nvd3-timeseries
    • Rickshaw-timeseries
    • Stacked-bars
    • Sunburst
    • Table-timeseries
    • Vega-timeseries
    • Visualizer
    • Wikimetrics
    • Note: dygraphs, nvd3, vega, and rickshaw are names of JavaScript (JS) visualization libraries.

/src/app

We have already mentioned that Dashiki uses Knockout to separate data from view components. View components were covered in the preceding /src/components section; now we switch gears to look at the data and data sources, and at how they interact and are converted.

  • Apis
    • Annotation-api
    • Api-finder
    • Aqs-api
    • Config-api
      • Note: default setting and plain-state URL handling are both based on the config api.
    • Dataset-api
    • Wikimetrics
  • Data converters
    • Note: these are mainly used to transform data from the various sources into the formats Dashiki works with.
    • Annotations-data
    • Aqs-api-response
    • Factory
    • Hierarchy-data
    • Separated-values
    • Simple-separated-values
    • Timeseries-data
      • This defines the key class at the heart of Dashiki.  Data is parsed into this format and visualizers all must understand how to read and represent it.  This format carries label, color, and pattern information for each column.  Instances of TimeseriesData can be merged together so that you may combine separate datasets (for example, to compare them).  This is where control over colors and patterns is useful, so combined datasets can still be distinguished visually.
    • wikimetrics-timeseries
  • ko-extensions
    • Common-viewmodels
      • Copy-params
      • Single-select
    • Async-observations
      • Note: used for asynchronous data sync and observations
    • Datepicker-binding
    • Global-bindings
  • Utils
    • Note: each module groups several related helper functions. For example, "arrays" has two sorting functions and one filter function, "datetime" has formatting and timespan functions, etc.
    • Arrays
      • Functions: sortByName, sortByNameIgnoreCase, filler
    • Colors
      • Functions: category10 (a lightweight version of the d3.js color scale, so you don't have to import the whole d3 library to use it)
    • Datetime
      • Functions: formatDate (formats dates as YYYY-MM-DD), timespan
    • Elements
      • Functions: getBounds
    • Numbers
      • Functions: numberFormatter
    • Strings
      • Functions: parserFromSample
  • Config.js
    • Note: static configuration object. Looks for config files written by the build.
  • Require.config.js
    • Note: it looks for the different JS libraries, semantic elements, configurations, bindings, utils, APIs, view models, and converters in the global scope.
  • Sitematrix
    • Note: fetches the sitematrix and parses it. It holds an application-scoped cache once it is initialized.
  • Startup.js
    • Note: everything in here is in the global scope.
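The TimeseriesData idea (labeled, colored, patterned columns that can be merged so combined datasets stay visually distinct) and the category10 helper fit together naturally. Below is a minimal Python sketch of both, not Dashiki's JavaScript API: the class name `Timeseries`, its fields, and the merge rule (outer join on date, missing values as `None`) are assumptions for illustration. The palette is d3's category10 list.

```python
from dataclasses import dataclass, field

# d3's category10 palette; a lightweight stand-in for importing all of d3.
CATEGORY10 = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
    "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
]


def category10():
    """Ordinal scale: the same key always gets the same color, cycling after ten."""
    assigned = {}

    def color(key):
        if key not in assigned:
            assigned[key] = CATEGORY10[len(assigned) % len(CATEGORY10)]
        return assigned[key]

    return color


@dataclass
class Timeseries:
    """Hypothetical stand-in: per-column label, color and pattern, rows keyed by date."""
    labels: list
    colors: list
    patterns: list
    rows: dict = field(default_factory=dict)  # date -> list of column values

    def merge(self, other):
        """Outer-join two series on date; each column keeps its own styling."""
        width_self, width_other = len(self.labels), len(other.labels)
        dates = sorted(set(self.rows) | set(other.rows))
        rows = {
            d: self.rows.get(d, [None] * width_self)
            + other.rows.get(d, [None] * width_other)
            for d in dates
        }
        return Timeseries(
            self.labels + other.labels,
            self.colors + other.colors,
            self.patterns + other.patterns,
            rows,
        )
```

For example, merging a VisualEditor series colored with `color("VisualEditor")` and a wikitext series colored with `color("Wikitext")` yields rows aligned on the union of their dates, where each column still carries its own color and pattern. That is exactly why the format stores styling per column rather than per chart.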

/src/lib

This is the Knockout-related part, handling errors, state management, logging, etc. It is not as important for pure visualization purposes, but keep it in mind, because this is still visualization built with JS.

  • Knockout-extensions
    • Knockout-table.js
    • Note: this is the table binding plugin for Knockout. It works with require.js; Dan wrapped it in a custom module so that it is a proper define.
  • Ajax-wrapper.js
    • Note: Can handle custom headers, and handle errors across all requests.
  • Logger.js
    • Note: because Dashiki is client-side only, the logger only logs client-side errors. This is a static function that is available site-wide.
  • Polyfills.js
  • State-manager.js
    • Note: mutates the URL and translates it into application state. If the URL has no state, it falls back to the config API's default settings.
  • Window.js
    • Note: exposes window as a standalone module, which is usually useful in testing.
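The state-manager behavior (URL to application state, falling back to the config API's defaults) can be sketched in Python as below. The hash format `#projects=enwiki,dewiki/metrics=Pageviews` is an assumption modelled on metrics-by-project URLs, and `defaults` stands in for whatever the config API returns, such as the defaultProjects and defaultMetrics of a dashboard config page. Both function names are hypothetical.

```python
from urllib.parse import urlsplit


def state_from_url(url, defaults):
    """Parse '#key=a,b/key2=c' into a state dict; fall back to defaults.

    Keys missing from the URL (or an empty fragment) keep the default value.
    """
    state = {key: list(value) for key, value in defaults.items()}
    fragment = urlsplit(url).fragment
    for part in filter(None, fragment.split("/")):
        key, sep, value = part.partition("=")
        if sep and key:
            state[key] = [v for v in value.split(",") if v]
    return state


def url_from_state(base, state):
    """Serialize state back into the hash so the current view is shareable."""
    parts = [f"{key}={','.join(values)}" for key, values in state.items()]
    return f"{base}#{'/'.join(parts)}"
```

Round-tripping through both functions is the property that matters: a URL built from the current state should restore the same state when it is opened again.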

Test your dashboard config locally

Setup

Note: in addition to installing gulp, you also need to install bower globally, then install bower locally in your directory.

Clone the dashiki repository:

git clone ssh://<user>@gerrit.wikimedia.org:29418/analytics/dashiki

Install the npm dependencies:

npm install

When prompted with "It looks like you have a semantic.json file already", select "Yes, extend my current settings". Then install gulp and build semantic:

sudo npm install -g gulp
cd semantic
gulp build

Example building your own dashboard locally for testing with gulp:

Go to the root where you have cloned dashiki and start a static webserver (on Python 2, use python -m SimpleHTTPServer 5000 instead):

python3 -m http.server 5000

You now have a webserver on port 5000, and going to http://localhost:5000 should display a page.

Build dashiki now with some test configuration:

gulp --layout metrics-by-project --config Dashiki:VitalSigns

Browse to http://localhost:5000/dist/metrics-by-project-VitalSigns

Alternatively, browse to http://localhost:5000 and navigate to the dashboard from the /dist directory.
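Judging only from the single example above, the build output appears to be named `<layout>-<config page without the Dashiki: prefix>`. A tiny Python helper under that assumption (both function names are hypothetical) saves retyping the command and URL when you try several configs:

```python
def gulp_command(layout, config_page):
    """Arguments for building one dashboard, as in the gulp example above."""
    return ["gulp", "--layout", layout, "--config", config_page]


def dist_url(layout, config_page, port=5000):
    """Local URL of the built dashboard, assuming <layout>-<PageName> naming."""
    prefix = "Dashiki:"
    name = config_page[len(prefix):] if config_page.startswith(prefix) else config_page
    return f"http://localhost:{port}/dist/{layout}-{name}"
```

You could pass `gulp_command(...)` to `subprocess.run(..., check=True)` from the repository root and then open `dist_url(...)` in a browser. If a build lands in a differently named folder, trust what you see in /dist over this naming guess.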

On-wiki configuration

Dashiki dashboards are configured via wiki pages. Here I will walk through a couple of minimal working examples (MWEs) to illustrate how Dashiki configuration works.

Configuring the metrics-by-project layout

The metrics-by-project layout uses two configuration pages that you will want to edit: Dashiki:CategorizedMetrics and the dashboard's own config page.

Dashiki:CategorizedMetrics looks like this:

{
	"categorizedMetrics": [
		{
			"name": "categoryName1",
			"metrics": [...]
		},
		{
			"name": "categoryName2",
			"metrics": [...]
		},
		...
	]
}

ATTENTION: If you introduce a syntax error in this page, all metrics-by-project dashboards will stop working, so be careful ;). Also, if the metric that you want to display is already in Dashiki:CategorizedMetrics, you don't need to change this page at all! Otherwise, find a category that suits your metric or create a new one, then add a new metric config object to its "metrics" array. The config code depends on the data API that you want to use:

With AQS API

The AQS API lets you grab data from AQS and display it in a graph. If you plan to pull data from that API, then your metric config should look like:

{
    "definition": "https://meta.wikimedia.org/wiki/Research:Unique_Devices",
    "name": "MonthlyUniqueDevices",
    "displayName": "Monthly Unique Devices",
    "api": "aqsApi",
    "breakdown": {
        "columns": ["All", "Desktop site", "Mobile site"]
    },
    "granularity": "monthly",
    "annotations": {
        "host": "meta.wikimedia.org",
        "pageName": "Dashiki:MonthlyUniqueDevicesAnnotations"
    }
}

So let’s take it step by step:

  • definition: link to a wiki page that defines the metric in plain English
  • name: this is the field that tells Dashiki which data to show in the graph. It must contain a name defined in https://github.com/wikimedia/analytics-dashiki/blob/master/src/app/config.js#L53
  • displayName: just the friendly name to show in the UI
  • api: in this case we would use "aqsApi". If you omit it, it defaults to "wikimetrics", and you can set it to "datasets" as well: https://github.com/wikimedia/analytics-dashiki/blob/master/src/app/apis/api-finder.js#L25.
  • breakdown: if you want to show the metric breakdown selector, specify it here. Note that the API must already support the chosen breakdown.
  • granularity: specify the time granularity (hourly, daily, monthly). Also check that the endpoint that you're querying has the chosen granularity.
  • annotations: link to a wiki page that defines annotations for this metric, for example https://meta.wikimedia.org/wiki/Dashiki:MonthlyUniqueDevicesAnnotations. See the "Configuring annotations" section below for an example.
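Before pasting a new metric object into Dashiki:CategorizedMetrics, it can help to check it against the rules listed above. This Python sketch applies the documented default (`api` falls back to "wikimetrics") and rejects an unknown `api` or `granularity` value. The function name and the choice of "name" and "definition" as required keys are assumptions; Dashiki itself does not run this check.

```python
KNOWN_APIS = {"aqsApi", "wikimetrics", "datasets"}
KNOWN_GRANULARITIES = {"hourly", "daily", "monthly"}


def normalize_metric(metric):
    """Return a copy of a metric config with defaults applied, or raise ValueError."""
    metric = dict(metric)  # never mutate the caller's object
    metric.setdefault("api", "wikimetrics")  # documented default
    if metric["api"] not in KNOWN_APIS:
        raise ValueError(f"unknown api {metric['api']!r}")
    if "granularity" in metric and metric["granularity"] not in KNOWN_GRANULARITIES:
        raise ValueError(f"unknown granularity {metric['granularity']!r}")
    for key in ("name", "definition"):  # assumed to be required
        if key not in metric:
            raise ValueError(f"missing {key!r}")
    return metric
```

Remember that a valid granularity here only means Dashiki recognizes it; the endpoint you query must still offer that granularity.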
With datasets API

Let's look at a datasets example. These are report files stored in https://analytics.wikimedia.org/datasets/periodic/reports/metrics. All of these report files follow the same format conventions and are usually generated by reportupdater. You can configure a dashboard to grab data from them with a metric config like:

{
    "definition": "https://meta.wikimedia.org/wiki/Analytics/Metrics/Uploads",
    "name": "Uploads",
    "metric": "multimedia-health",
    "submetric": "uploads",
    "api": "datasets",
    "annotations": {
        "host": "meta.wikimedia.org",
        "pageName": "Dashiki:MultimediaHealthUploads"
    }
}

In this case the properties are as follows (the ones left out behave the same as in the previous example):

  • "name": just the friendly name to show in the UI
  • "metric": the top-level folder to look for data in on the datasets server
  • "submetric": the subfolder to look for data in, inside the "metric" folder

So, with the datasets API, you have to specify "metric" and "submetric" properties, which are used to look up your data at a convention-based path: https://analytics.wikimedia.org/datasets/periodic/reports/metrics/{metric}/{submetric}.
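The convention-based lookup fits in a few lines. This Python sketch only builds the folder URL from the pattern above (the function name is hypothetical); which file inside that folder Dashiki then requests is not described here, so no file name is appended.

```python
from urllib.parse import quote

DATASETS_BASE = "https://analytics.wikimedia.org/datasets/periodic/reports/metrics"


def datasets_folder(metric_config):
    """Folder URL for a datasets-API metric: {base}/{metric}/{submetric}."""
    if metric_config.get("api") != "datasets":
        raise ValueError("not a datasets-API metric")
    metric = quote(metric_config["metric"])
    submetric = quote(metric_config["submetric"])
    return f"{DATASETS_BASE}/{metric}/{submetric}"
```

Opening that URL in a browser is a quick way to confirm that your report files actually landed where the dashboard will look for them.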

The dashboard's own config page

In addition to Dashiki:CategorizedMetrics, you also need to create or edit a config page that tells Dashiki which metrics to show. Name it Config:Dashiki:{YourDashboardName}, without spaces or underscores and with the first letter of each word capitalized, as in ExampleDashboardName. The contents of the page should be:

{
    "defaultProjects": [
        "enwiki",
        "eswiki",
        ...
        "zhwiki"
    ],
    "defaultMetrics": [
        "Pageviews"
    ],
    "metrics": [
        "Pageviews",
        "UniqueDevices",
        "MonthlyUniqueDevices"
    ]
}

  • defaultProjects: the projects you want the dashboard to show at first load
  • defaultMetrics: the metrics you want the dashboard to open at first load
  • metrics: the metrics you want the dashboard to offer for the user to choose from

Of course, all of these metrics must be defined in the Dashiki:CategorizedMetrics page described above.
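Because a syntax error in Dashiki:CategorizedMetrics breaks every metrics-by-project dashboard, a quick local check before saving can pay off. This hypothetical Python helper takes the raw text of both pages, reports JSON syntax errors, and flags dashboard metrics that are missing from CategorizedMetrics. Matching on each metric object's "name" follows the examples above; also requiring every defaultMetrics entry to appear in metrics is an extra assumption of this sketch. Placeholders such as "..." must be replaced with real entries before the pages parse.

```python
import json


def check_dashboard(categorized_text, dashboard_text):
    """Return a list of problems; an empty list means the pages look consistent.

    Both arguments are raw page contents, so JSON syntax errors surface here
    instead of breaking every metrics-by-project dashboard after saving.
    """
    try:
        categorized = json.loads(categorized_text)
        dashboard = json.loads(dashboard_text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    known = {
        metric.get("name")
        for category in categorized.get("categorizedMetrics", [])
        for metric in category.get("metrics", [])
    }
    offered = dashboard.get("metrics", [])
    problems = [f"{m!r} is not in Dashiki:CategorizedMetrics" for m in offered if m not in known]
    # Assumption of this sketch: defaults should be a subset of the offered metrics.
    problems += [
        f"default metric {m!r} is not listed in 'metrics'"
        for m in dashboard.get("defaultMetrics", [])
        if m not in offered
    ]
    return problems
```

Paste the current page text and your edited text in, and only save when the returned list is empty.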


Configuring the tabs layout

In this layout you only need to create or edit one page, which gives Dashiki all the information it needs to render your dashboard. Name it Config:Dashiki:{YourDashboardName}, without spaces or underscores and with the first letter of each word capitalized, as in ExampleDashboardName. The contents of the page should be:

{
    "title": "Dashboard Title",
    "subtitle": "Dashboard Subtitle",
    "desc": "Dashboard description...",
    "tabs": [
        {
            "title": "Tab title 1",
            "dataRange": {
                "startDate": "2015-06-01"
            },
            "graphs": [
                {
                    "title": "Graph title A",
                    "type": "dygraphs-timeseries",
                    "path": "path/to/your/report_file.tsv",
                    "format": "kmb|percent| any other format supported by numeral.js. example '0,0'",
                    "doNotParse": "column that shouldn't be parsed by Dashiki"
                },
                {
                    "title": "Graph title B",
                    "type": "hierarchy",
                    "path": "path/to/your/report_file.tsv",
                    "pivot": {
                        "dimension": "column-to-pivot-on",
                        "metric": "column-to-roll-up-as-a-sum-when-pivoting"
                    }
                },
                ...
            ]
        },
        {
            "title": "Tab title 2",
            "dataRange": {
                "startDate": "2015-06-01"
            },
            "graphs": [
                ...
            ]
        }
    ]
}

Most fields are self-explanatory, except perhaps:

  • startDate: the first data point to display
  • type: the type of visualization. If the data to display is a timeline, you'll want either "dygraphs-timeseries" (line chart) or "table-timeseries" (interactive table). If the data is suitably formatted, the "hierarchy" type draws an interactive sunburst chart.
  • path: the path to the TSV report file, relative to: https://analytics.wikimedia.org/datasets/periodic/reports/metrics/
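The pivot settings of the "hierarchy" graph describe a roll-up: rows are spread out by the values of the `dimension` column and the `metric` column is summed as a result. The Python sketch below shows one reading of that, a classic pivot table over a TSV report; it is an illustration under that assumption, not Dashiki's implementation, and how Dashiki turns the result into sunburst rings is not shown. The function name and the column names in the test are made up.

```python
import csv
import io
from collections import defaultdict


def pivot_tsv(tsv_text, dimension, metric):
    """Pivot a TSV report: one row per combination of the remaining columns,
    one column per distinct `dimension` value, summing `metric` in each cell."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    keys = [c for c in reader.fieldnames if c not in (dimension, metric)]
    table = defaultdict(lambda: defaultdict(float))
    for row in reader:
        row_key = tuple(row[c] for c in keys)
        table[row_key][row[dimension]] += float(row[metric])
    return {k: dict(v) for k, v in table.items()}
```

Summing duplicates is what makes the roll-up safe for reports that contain several rows per dimension value, which is common when a report has more columns than the chart uses.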

Configuring annotations

All Dashiki line charts support annotations. To add annotations to a graph in your dashboard, you need to create/edit an additional config page. Its name should be Dashiki:{YourMetricName}Annotations. It should look like:

[
    {
        "start": "2016-09-13",
        "end":   "2016-09-13",
        "note":  "Whatever you want to annotate to this date"
    },
    {
        "start": "2017-01-05",
        "end":   "2017-02-07",
        "note":  "Whatever you want to annotate to this date range"
    },
    ...
]

Then you'll need to add a link to that page in your dashboard or metric config:

  • metrics-by-project layout: add the link to Dashiki:CategorizedMetrics.
  • tabs layout: add the link to your dashboard's dedicated config page.

In both cases, add a subsection like this inside the metric or graph JSON object:

{
	...
	"annotations": {
	   "host": "meta.wikimedia.org",
	   "pageName": "Dashiki:YourMetricNameAnnotations"
	},
	...
}
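To sanity-check an annotations page before saving it, the Python sketch below parses the JSON list, validates the dates, and looks up the notes that apply to a given day. Treating [start, end] as inclusive follows the single-day example above, where start and end are equal; that reading and both function names are assumptions, not Dashiki's code.

```python
import json
from datetime import date


def load_annotations(page_text):
    """Parse an annotations page into (start, end, note) tuples, checking dates."""
    annotations = []
    for entry in json.loads(page_text):
        start = date.fromisoformat(entry["start"])  # raises ValueError if malformed
        end = date.fromisoformat(entry["end"])
        if end < start:
            raise ValueError(f"end before start in {entry!r}")
        annotations.append((start, end, entry["note"]))
    return annotations


def notes_for(annotations, day):
    """All notes whose inclusive [start, end] range contains `day`."""
    return [note for start, end, note in annotations if start <= day <= end]
```

A malformed date, a reversed range, or leftover "..." placeholders all raise an error here instead of silently producing a chart without annotations.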