Analytics/Archive/AQS -RESTBase

From Wikitech

To serve Pageview data via an API, we need to store the data somewhere after it's processed in Hadoop. This page details our options for data stores and our decision.

I (Milimetric) still have to fill this in, but basically:

  • We looked at PostgreSQL, to store denormalized data in a data-warehouse-style schema
  • We looked at Cassandra, and at the way RESTBase integrates with it and creates keyspaces for us
  • We looked at Druid and the more advanced analytics it enables

The choice came down to meeting the immediate need with the lowest-cost solution. The basic endpoints that we needed to work on this quarter, as part of project code name {slug}, were very simple: simple enough that we could pre-aggregate the data in Hadoop and load it into Cassandra with relative ease. So, for now, we are going to use RESTBase with its built-in Cassandra support. We decided that more complicated queries would never be pre-aggregated, and that we would deploy Druid when we need to implement something like that. At that point, we would revisit this page.
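
To illustrate why simple endpoints fit a key-value store, here is a minimal Python sketch of pre-aggregation followed by a single-key lookup. It is not the actual Hadoop job or the RESTBase/Cassandra schema: the row layout (project, article, day, views), the function names `pre_aggregate` and `lookup`, and the sample data are all assumptions made up for illustration, and a plain dictionary stands in for the data store.

```python
from collections import defaultdict


def pre_aggregate(hits):
    """Sum raw view counts into one row per (project, article, day).

    Hypothetical stand-in for the batch step: every answer the API
    will need is computed ahead of time.
    """
    totals = defaultdict(int)
    for project, article, day, views in hits:
        totals[(project, article, day)] += views
    return dict(totals)


def lookup(table, project, article, day):
    """Answer a request with one key read; nothing is aggregated at query time."""
    return table.get((project, article, day), 0)


# Made-up sample rows, not real pageview data.
hits = [
    ("en.wikipedia", "Cat", "2015-10-01", 3),
    ("en.wikipedia", "Cat", "2015-10-01", 2),
    ("en.wikipedia", "Dog", "2015-10-01", 4),
    ("de.wikipedia", "Katze", "2015-10-02", 1),
]

# The dictionary plays the role of the key-value store.
table = pre_aggregate(hits)
print(lookup(table, "en.wikipedia", "Cat", "2015-10-01"))
```

Because every request maps to exactly one precomputed key, the store only has to do point reads. A query that slices the data along dimensions nobody precomputed would not fit this pattern, which is the case the text reserves for Druid.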