User:Milimetric/Notebook/Visualization

From Wikitech

On Visualization

This is just my perspective on the data visualization work I've been involved with while working at WMF, so far. As a timeline, it looks like this:

  • 2003: a basic page counter inspires Erik Zachte to write Wikistats.
  • 2012: two people try to create a visualization framework that allows you to shape data into images and tell stories. That project, Limn, turns out to be too ambitious for our resources and is sunset.
  • 2014: Promising to deliver all data as canned queries executed on arbitrary groups of users, Wikimetrics is ultimately too complicated to administer and suffers from poor product ownership.
  • 2015: Dashiki is a lightweight way to get dashboards up from standard, convention-following reports run by Report Updater. Lack of continued resourcing sees it fall prey to bit-rot.
  • 2015: The Graph Extension is launched and promises to allow users to visualize any data with the full power of D3. It is ultimately ignored and falls into disrepair.
  • 2018: MediaWiki History is built to parallel and improve on the dataset Wikistats computes every month from the dumps. It uses the databases as a source and is theoretically able to deliver much faster results. We don't start capitalizing on this theoretical possibility until 2024.
  • 2018: Wikistats 2 is launched, following a good round of community consultation with Wikistats 1 users. However, we fail to maintain a good relationship with the community and to be as responsive to their needs as Erik Zachte was. Ultimately, Wikistats 2 work is deprioritized and it begins to bit-rot.

In general I look at this and see a lack of vision and long-term commitment to solving problems. New folks come in, decide that a problem is of high importance, work on it, leave, and the solutions fall into disrepair. A little more detail on this for each technology mentioned above.

Wikistats (the original)

This was a brilliant piece of work. The article linked above shows how the community and Erik worked together to build this way of looking at themselves and their work. It looked yellow and outdated in 2018, but it was fundamental to community health. The fact that we allowed that connection between community folks interested in data and the WMF to fade away is one of my biggest regrets. We tried as a team to maintain it but learned that little can be done without organizational support, let alone in direct opposition to leadership decisions.

Limn

The notion that two people could write something as ambitious as Limn was as thrilling as it was wrong. Limn is pride in a nutshell. There was an easter egg in the code that would pop up this quote from Rainer Maria Rilke in the UI (which you won't find because it's originally in German):

Winning does not tempt that man.
This is how he grows: by being defeated, decisively,
by constantly greater beings.

From a technology and project management perspective Limn made some simple mistakes. But from a philosophical perspective, it was always meant to, so... success? Sometimes we shoot ourselves in the foot.

Wikimetrics

We set out to solve the problem of delivering answers to data questions that both staff and volunteers had. We did a bad job of product design and research, and believed different folks who told us that it was important to compute such answers for groups of anywhere from a few to a few million users. And we needed to get answers across any and all of our 900+ MediaWiki database replicas. This was a significant technical challenge that cost us approximately 9 years of engineer time (an average of three developers over 3 years). I found out at Wikimania one year that Wikimetrics had been abandoned in favor of a tool that took someone a couple of weeks to build, because Wikimetrics was unnecessarily clumsy and complicated to use for the use cases they had. It turned out those use cases were what the vast majority of stakeholders needed from Wikimetrics, and the use cases we were trying to solve were only helping around 10 people. We abandoned the project.

Dashiki and Report Updater

These two tools took weeks to write and minimal effort to maintain, and they provided tons of reports and dashboards to all kinds of users. I look at them as a success in minimalism that was largely misunderstood by the organization. I attribute that to poor cross-team communication, a problem on which we've seen much improvement thanks to Will and Leila's amazing work with Effective and Responsible Communication.

Graph Extension

MediaWiki has trouble with dynamic derivative content. As in, you write a graph definition in wikitext and the output of that changes as data dependencies change. Because of this fundamental limitation, some ugly hacks had to be put in place to make the Graph Extension work. It's actually admirable that these hacks were possible and executed at all. But over time this bugged people, and with the extension receiving zero support it resulted in lots of headaches. We examine all of this in detail in this Graphoid RFC. Making visualization easy for everyone to use was a dream that Limn started with, and a worthwhile one. I think a true collaboration across the org could make this happen, but we need to think about all the implications up front so we don't hack our way through again.

MediaWiki History

This dataset is great; we just need to keep going with the dumps 2.0 effort to get better at speeding up this kind of pipeline. When data is fast enough and easy enough for the community to use in their daily work, this dataset will have achieved its potential, which is to provide the kind of value Erik Zachte provided with his hard work on Wikistats 1.

Wikistats 2

This is really just a simple front-end on top of MediaWiki History and AQS. It does a good job of wrangling dimensional data and it strikes a good balance of simplicity and features with its visualizations. Good design went into it by Ash Grigas and it shouldn't be ignored.