Maps/v2/Architecture

From Wikitech

Maps V2 - Architecture

Intro

In the v2 iteration of maps, one of our major concerns was to break the maps service down into more atomic, decoupled components. On top of that, we are trying to improve the OpenStreetMap data import pipeline with more modern tooling and to improve the way we query the spatial DB (PostGIS) for tiles. As an overall improvement, we are trying to reuse tech that is already available at the org and to replace old components that are deployed specifically for the maps stack. This is an attempt to improve maintainability from an SRE point of view. Finally, we are trying to improve observability by introducing SLOs and better metrics.

Notable changes

  • Introduced a new tool for importing OSM data: imposm3
  • Introduced a new server to handle vector tile requests: tegola
  • Replaced tilerator and its queuing mechanism with functionality from:
    • tegola (map tile caching/invalidation)
    • kafka as an event bus
    • k8s primitives like cronjobs
  • Stopped using mapnik for generating vector tiles; mapnik is now only used for rendering raster tiles
  • Infrastructure upgrade: Debian Buster, PostGIS 3.1, and PostgreSQL 11

Components

Postgres/PostGIS

PostGIS is a Postgres extension that adds spatial capabilities. This is the core store for all the spatial information. Besides the actual data from OpenStreetMap, we also store logic at the DB level, which includes:

Imposm

Imposm is our OpenStreetMap importing tool. It runs as a daemon and, based on a mapping, decides which OpenStreetMap features are imported and into which table.
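The role of the mapping can be illustrated with a minimal Python sketch. This is not imposm's actual mapping format or code: the `MAPPING` structure, the table names, and the `tables_for` helper are all hypothetical and only show the idea of routing a feature to tables based on its tags.

```python
# Hypothetical, simplified mapping: table name -> OSM tag key -> accepted values.
# The real imposm mapping file has a richer schema; this only illustrates the idea.
MAPPING = {
    "roads": {"highway": {"motorway", "primary", "residential"}},
    "water": {"natural": {"water"}, "waterway": {"river", "canal"}},
}


def tables_for(tags):
    """Return the sorted list of tables whose rules match a feature's tags."""
    matched = []
    for table, rules in MAPPING.items():
        # A table matches if any of its tag keys carries an accepted value.
        if any(tags.get(key) in values for key, values in rules.items()):
            matched.append(table)
    return sorted(matched)
```

A feature whose tags match no rule is simply not imported, which is how the mapping limits what ends up in the database.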

Kartotherian

Kartotherian is the public-facing service for https://maps.wikimedia.org and all maps-related requests. It is a Node.js-based service built on service-runner. The major endpoints that it exposes are:

  • Vector tile endpoint
    • Heavily compressed, protocol buffer based representation of spatial information for a specific map tile
  • Raster tile endpoint
    • Rendered version of a vector tile based on a specific styling
  • Map snapshots
    • Raster image of a selection of the map
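To make "a specific map tile" concrete, the sketch below converts a longitude/latitude pair into tile coordinates. It assumes the common XYZ ("slippy map") tiling scheme on Web Mercator; the function name is hypothetical and this is not kartotherian code.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return (zoom, x, y) of the XYZ tile containing a WGS84 point.

    Assumes the standard Web Mercator tiling scheme, where zoom level z
    splits the world into 2**z by 2**z tiles and y grows southwards.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the east/south edge stay inside the grid.
    return zoom, min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

Under this scheme, a tile is fully identified by its zoom level and its x/y position in the grid for that zoom level.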

At its core, kartotherian is heavily based on mapnik (specifically the Node.js bindings), a library for processing and visualizing spatial data. In order for mapnik to know where to consume spatial data from, what kind of data it needs, and how to render it, it uses the following configurations:

Finally, one of the important functions that kartotherian takes care of is localization. Depending on the configuration, kartotherian can localize the text content of a tile by unpacking the vector tile, localizing the labels that need it, and re-packing the tile.
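The unpack, localize, re-pack flow can be sketched as follows. This is an illustration only: it works on an already decoded tile represented as plain Python dictionaries, and the `name_<lang>` property convention and the `localize_tile` function are assumptions, not kartotherian's actual data model or API.

```python
import copy


def localize_tile(tile, lang, label_key="name"):
    """Return a copy of a decoded tile with labels replaced by translations.

    `tile` maps layer names to lists of features; each feature is a dict
    with a "properties" dict. A translation is looked up under the
    hypothetical key "name_<lang>"; features without one keep their label.
    """
    localized = copy.deepcopy(tile)  # leave the cached original untouched
    for features in localized.values():
        for feature in features:
            props = feature["properties"]
            translated = props.get(f"name_{lang}")
            if translated:
                props[label_key] = translated
    return localized
```

In a real service the input would first be decoded from the protocol buffer representation and the result encoded again before being returned to the client.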

Tegola

Tegola is a vector tile server written in Go. Given a description of which data backends exist and how to query them, it combines different spatial queries into vector map tiles and serves them through a REST API. Internally we are using its PostGIS MVT capabilities. The idea behind this is that PostGIS, following a specific spec, can be very efficient at computing the actual vector tile itself, rather than being just a backend for consuming spatial data.
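As a rough illustration of letting PostGIS compute the tile, the sketch below builds a query around the PostGIS functions `ST_AsMVT`, `ST_AsMVTGeom`, and `ST_TileEnvelope` (available from PostGIS 3.0). The helper name, the table and column names, and the placeholder style are assumptions; this is not the query tegola or our configuration actually uses.

```python
def build_mvt_query(table, geom_column="geom"):
    """Build a parameterized SQL query that returns one MVT layer for a tile.

    Assumes geometries are stored in Web Mercator (SRID 3857), which is the
    SRID that ST_TileEnvelope returns. The placeholders %(z)s, %(x)s, %(y)s
    and %(layer)s are meant to be bound by the database driver.
    """
    # Identifiers cannot be bound as parameters, so validate them instead.
    for name in (table, geom_column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        "SELECT ST_AsMVT(q, %(layer)s) FROM ("
        f" SELECT ST_AsMVTGeom({geom_column},"
        " ST_TileEnvelope(%(z)s, %(x)s, %(y)s)) AS geom"
        f" FROM {table}"
        f" WHERE {geom_column} && ST_TileEnvelope(%(z)s, %(x)s, %(y)s)"
        ") AS q"
    )
```

With a query of this shape the database clips and encodes the geometries, so the server only has to forward the resulting bytes.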

Besides acting as a vector tile server, tegola provides the following functionality:

  • Caching
  • CLI tooling
    • Utils for cache seeding (tile pregeneration)
    • Utils for cache purging (tile invalidation)
  • Prometheus metrics

EventPlatform

WMF has extensive infrastructure for producing, validating, and consuming events. In our case it was a good fit as an event bus for sending cache invalidation events on each OpenStreetMap import. Specifically:

Kubernetes

Kubernetes is responsible for hosting the tegola server as well as the automation around handling expired tiles. In our case, a Kubernetes cronjob will:

  • Run on a specific interval (e.g. once a day)
  • Fetch tile expiration events
  • Batch them and populate a list to be used as input for tegola
  • Run a tegola command to handle stale tiles in our cache (Swift)
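The batching step above can be sketched in Python. The event payload shape (`{"tile": "z/x/y"}`) and the function name are hypothetical; the real events follow whatever schema is registered for them, and the resulting list would be handed to tegola's cache tooling.

```python
import json


def batch_expired_tiles(raw_events):
    """Turn raw JSON expiration events into a deduplicated tile list.

    Each event is assumed (hypothetically) to look like {"tile": "z/x/y"}.
    Returns "z/x/y" strings sorted by zoom, then x, then y.
    """
    tiles = set()
    for raw in raw_events:
        event = json.loads(raw)
        z, x, y = (int(part) for part in event["tile"].split("/"))
        tiles.add((z, x, y))  # the set drops tiles expired more than once
    return [f"{z}/{x}/{y}" for z, x, y in sorted(tiles)]
```

Deduplicating before invoking tegola matters because the same tile can be expired by several imports within one interval.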

Diagram