This is an essay on why I think we need production-like environments.
The Princess Bride
This is a fairy-tale. A user from Finland reports a problem on the Finnish Wikipedia, which seems to have changed behaviour overnight. The user opens a ticket in Phabricator about this.
Princess Buttercup assigns the ticket to herself, and starts looking into it. She can reproduce the problem on the actual site. It seems like a weird bug, and her hunch is that it's due to a problematic interaction between the user's locale, the specific version of MariaDB, and the specific version of PHP7 in use, or some subset of the three.
The princess sets up a private Wikimedia production instance, which has all the relevant components (MediaWiki, the extensions in production, a database, and a few other bits and bobs), at the same versions, and attempts to reproduce the problem. 'Lo and behold, she can!
Given an easy way to reproduce the problem, Buttercup quickly experiments by trying different versions of PHP7 (no help), MariaDB (no help), and glibc (boom!). The culprit seems to be the way a new glibc version handles locales. Downgrading to the previous version of glibc, the problem goes away. Upgrading glibc brings it back.
Her Highness digs in a little deeper, and zeroes in on the actual change. The new glibc version changes the collation order for the user's locale, and this causes a problem: the MediaWiki extension code is written to assume the old collation order.
Buttercup does some quick experimentation and comes up with a fix to the extension. She deploys it to her testing environment, and proves that it works with the new glibc version. She pushes the changes to Gerrit, and it is quickly reviewed by another MediaWiki developer, and then automatically deployed to production. Time from ticket being opened until fix is in production: a few hours.
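The class of bug Buttercup found can be illustrated with a small, entirely hypothetical sketch (the key functions below are invented for illustration, not real glibc or MediaWiki behaviour): code that bakes in one collation order silently produces different results when the collation rules underneath it change.

```python
# Hypothetical collation keys: stand-ins for "old glibc" and "new glibc"
# behaviour for a Finnish-like locale. Not real glibc code.

def old_collation_key(s: str) -> str:
    # Old rules: treat "ä" like "a" for sorting purposes.
    return s.lower().replace("ä", "a")

def new_collation_key(s: str) -> str:
    # New rules: "ä" sorts after "z", as in Finnish alphabetical order.
    return s.lower().replace("ä", "z~")

titles = ["äiti", "auto", "banaani"]

# Code written against the old rules assumes this order...
assert sorted(titles, key=old_collation_key) == ["äiti", "auto", "banaani"]

# ...but after the "glibc upgrade" the same data sorts differently.
assert sorted(titles, key=new_collation_key) == ["auto", "banaani", "äiti"]
```

An extension that, say, binary-searches a list sorted under the old rules will quietly misbehave under the new ones. This is exactly the kind of cross-component interaction that a production-like environment makes cheap to reproduce and bisect.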
What are they?
A production-like environment is a host, container, or set of hosts or containers that mimics some or all of Wikimedia's production environment for running our various sites, services, and infrastructure, for some specific purpose.
Everything in a production-like environment is similar enough to actual production that if something works in the production-like environment, it almost certainly works in actual production. The opposite is also true: if something fails in the production-like environment, it will likely fail in actual production too.
"Works" here means that someone using (to test or for real) does not see any significant difference with real production. This includes automated test suites.
There are a number of reasons why different people might want to set up a production-like environment. As a result, the environment they set up may be different, and may even be implemented in entirely different ways. Some examples of use cases follow. This is not intended to be an exhaustive list, but please suggest missing use cases.
The following personas are involved in the use cases. (They're not used very explicitly, sorry.)
- MediaWiki developer
- MediaWiki extension developer (FIXME examples)
- MediaWiki service developer (Parsoid, Citoid, ...)
- Infrastructure service maintainer (Gerrit, Phabricator, ...)
- SRE team member
- RelEng team member
- Manual exploratory tester
- Product manager
Use case A: MediaWiki local development
MediaWiki developers and extension developers need a production-like environment to write, test, and debug their code. This would be the "local development" environment. Having a new, fresh environment for each development task makes development and debugging simpler. Having it locally makes the innermost loop of development (the edit-build-run loop) nice and tight, which is a pre-condition for speedy development.
Use case B: MediaWiki services development
MediaWiki service developers need their own environment. We're moving towards deploying these services in containers running under Docker, orchestrated by Kubernetes, so the production-like environment should probably also use Docker, with the same images as production. However, the services may need access to the database used by the MediaWiki instance they serve, and perhaps to other MediaWiki services or external services. Sufficiently good test doubles may be substituted for the real databases or services, but they need to be there if the service needs them.
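As a sketch of what a "sufficiently good test double" can mean in practice, the snippet below stands up a tiny in-process HTTP stub for an external service a MediaWiki service might depend on. The endpoint and canned response are invented for illustration; this is not any real MediaWiki service API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class StubLookupService(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an external lookup service (e.g. something
    Citoid-like). Returns one canned record, enough for the code under test."""

    def do_GET(self):
        body = json.dumps({"title": "Example Domain"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

# Port 0 asks the OS for any free port, so tests don't collide.
server = HTTPServer(("127.0.0.1", 0), StubLookupService)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The service under test would be pointed at this URL instead of the real thing.
url = f"http://127.0.0.1:{server.server_port}/lookup"
with urlopen(url) as resp:
    data = json.load(resp)
print(data["title"])  # -> Example Domain
server.shutdown()
```

The point is that the double behaves like the real service as far as the code under test can tell, which is the same "close enough that the difference doesn't matter" standard a production-like environment aims for overall.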
Use case C: scap and other deployments
SWAT and train deployments use the scap tool, and probably other automated deployment tooling. Development and configuration of that tooling, and further automation of the deployment process, would benefit from a safe production-like environment to deploy to. For example, database schema changes are awkward to test if the only place to try them is the actual production servers.
Such an environment would also be useful for people who want to learn how to do deployments: they can set it up and safely do any number of deployments without affecting anyone else.
This would also be useful for development of the deployment pipeline itself: pipeline changes could be tested safely.
Use case D: demo/showcase/manual testing
It would help to be able to deploy specific versions to a production-like environment, for demos and showcases of upcoming features, to do manual exploratory testing, or for developers to do end-to-end testing of functional changes. It would need to be very easy ("push-button") to deploy a desired set of changes to a production-like environment.
This should deploy either any component version that's been deployed to production previously in recent history (say, in the past two weeks), or a version modified locally by a developer, even before the change has been merged. Possibly even before the change is pushed to Gerrit.
Use case E: CI testing
For CI to do end-to-end testing of the entire set of software running in production, it helps to be able to set up a production-like instance, deploy only the modifications being considered for merging, and merge only if the ephemeral instance passes automated tests. If the environment CI tests against isn't sufficiently production-like, it is likely to miss problems that will crop up when the change is deployed to production.
Use case F: debugging
Sometimes we end up with problems in production. It helps to debug them if we can reproduce the problem in a separate, isolated production-like environment.
This is not always possible: for example, if the problem is related to the volume of traffic in production, this may be impossible to reproduce. However, not all problems are like that, and many problems can, in fact, be reproduced in an environment sufficiently similar to production.
A fix can then be deployed to the reproduction environment to check that it actually fixes the problem.
Use case G: documentation
WMF does not seem to have good documentation of how production is set up. If it does, I've entirely failed to find it.
Having an automated way to set up a production-like environment would be a kind of documentation. Best of all, if real production and (some) production-like environments get set up using the same automated tools and recipes (puppet manifests, etc), then the documentation is up to date and correct.
Use case H: improving production
Does production work if you upgrade the database engine? If you replace the HTTP cache software with another? Can production handle a specific level of simulated traffic load? These kinds of questions are difficult to answer if they have to be answered by changing actual production. Having a separate production-like environment allows much more freedom to experiment boldly and safely.
Use case I: infrastructure
Production doesn't just run Wikipedia and other wikis. We have supporting services, such as Gerrit, Phabricator, CI, Mailman, etc. A solution for production-like environments could, and probably should, cover these as well.
WMF has had a number of discussions about setting up a "staging" environment, with many different and sometimes conflicting opinions on what "staging" means. I don't want to re-hash past discussions, so I bypass them and avoid the term "staging" in favour of "production-like environment".
Exactly what should a production-like environment consist of? It depends on the use case. As a lower bound, a full production-like environment should run the same software components, at the same versions, connected in the same ways, as actual production. This means the same versions of MediaWiki, extensions, services, database engines, and so on.
If a production-like environment only covers part of actual production, such as a single stateless MediaWiki micro-service that doesn't require anything outside its own container (Blubberoid, for example), then the other components are not needed in that environment. But whatever is included needs to be as similar as possible to what is in production, modulo any changes being developed, tested, or demonstrated.
If we ran all of Wikimedia's production on one server, things would be easy. A production-like environment would then just be a second server with all the same software, except possibly fewer CPUs, less RAM, and less disk space. However, reality is somewhat more complex.
The biggest and most obvious difference between actual production and production-like environments is likely to be capacity. Actual production needs enough CPU, RAM, disk space, network bandwidth, and other hardware resources, to handle real traffic. The various Wikipedias get a lot of traffic.
A production-like environment only needs enough capacity for its use case. Our actual production runs on hundreds of bare metal servers, to have the capacity to handle all the traffic. A production-like environment for testing a new version of the Blubberoid service might only need a tiny fraction of a full server, just one pod on Kubernetes or Minikube.
Let me emphasize that it is crucial that actual production and production-like environments are built using the same scripts and other automation. In fact, ideally production would be rebuilt regularly from scratch (from empty bare metal servers, with wiped hard drives), and all production-like environments as well.
One size does not fit all here. We will need to judiciously design an environment for each use case separately, deciding how much capacity to give it and which components to include. However, it is important to keep each such environment as close as possible to the corresponding part of the actual production environment.
In more concrete terms, and admitting I'm a little hazy as to what running MediaWiki actually requires, a first stab for use case A might be:
- a host (container or VM) running Debian
- same version of Debian, PHP, and SQL database engine as in production
- MediaWiki and the same set of extensions as in production installed, from git
- any services needed by MediaWiki or its extensions
- any additional software needed by SRE, such as puppet, monitoring
This is very similar to "mediawiki-vagrant", and to the Docker-based work currently going on for "local dev". The container or VM should be built the same way production deploys Debian, PHP, MediaWiki, its extensions, the database engine, and any other included components.
The above uses a single host. This may not be sufficient for everything, and it might be a good idea to generalise it to be N hosts.
Call for action
Do you think what I'm trying to argue for here is a good goal? I'm not asking if it's feasible, only whether it's a good goal. Obviously I think it's a good goal, and I hope I'm not alone in that.
I understand that it's a big goal, and achieving it will require a lot of work, and may take a lot of time. I have three things to say about that.
First, while the end goal is big, there are smaller goals and milestones on the way that are useful in and of themselves, even if the big goal is never reached. I claim and predict that achieving a modest sub-goal is beneficial, and could be revolutionary.
Second, if you don't aim high, you'll just end up shooting off your own leg. To build and maintain momentum and morale, it is good to have goals that are easy to achieve quickly, but it is also good to dream big. The road to revolution is travelled by achieving small wins one after another.
Third, sometimes it's best to ignore naysayers. I was part of the genesis of Linux. For the first couple of years, many, many people told us that creating a complete operating system was a silly and impossible goal. Linux would never run on anything but a specific model of the PC. Linux would never run on anything but a PC. Linux would never run on laptops. Linux would never support networking. Linux would never be used by serious businesses. Linux would never have a desktop environment, unless it supported every graphics card on the market. Linux would never have a web browser.
All of those things would be impossible to achieve, because they were beyond what even a brilliant hobbyist could ever do on their own. That was true, but it wasn't a single hobbyist doing everything, after the first few weeks or months. First there was a second contributor, then a handful, then a few handfuls, and eventually, thousands and tens of thousands.
I'm writing this on a laptop running Linux, with a graphical desktop environment, in a world where Linux rules the server, supercomputer, mobile phone, and IoT markets.
Never underestimate the productivity of an Internet full of enthusiastic hackers.
I want to be able to set up a small copy of Wikimedia's production to develop, test, and demonstrate changes to software used by Wikimedia. I want it to be easy, and quick. I can't do it all myself. Who's with me?