Maps/ClearTables/Loading

From Wikitech
< Maps‎ | ClearTables(Redirected from Maps/Loading)
Jump to navigation Jump to search

OpenStreetMap data is loaded with aid of the script process-osm-data.sh

Access required

Access to the osmupdater and postgres accounts is required

Resetting

Don't follow this section without understanding what data you're going to wipe

When repeatedly testing, it's useful to reset everything. This can be done with

  1. Disable cron jobs through puppet
  2. Remove downloaded files with sudo -u osmupdater rm -rf /srv/osm_replication/cleartables.json /srv/osm_replication/comments.sql /srv/osm_replication/data/ /srv/osm_replication/planet/
  3. Stop tilerator and kartotherian and remove any connections with sudo -u postgres psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'ct'"
  4. Drop the database and recreate it with the right extensions with sudo -u postgres dropdb ct && sudo -u postgres createdb --encoding utf8 ct && sudo -u postgres psql -Xq -d ct -c 'CREATE EXTENSION postgis; CREATE EXTENSION hstore; CREATE EXTENSION unaccent; CREATE EXTENSION fuzzystrmatch;'
    This must be done in one command to avoid a period where monitoring might find strange things.

Downloading

  1. Find the most recent planet file on https://planet.openstreetmap.org/pbf/. Don't use -latest, as this doesn't leave a record of what file was used, and the file can change at any moment. For this example, the file is https://planet.openstreetmap.org/pbf/planet-180409.osm.pbf is used, but it will be different when you read it.
  2. Find the appropriate state.txt file for the planet. The easiest way to do this is
    1. Start with osmium-tool's fileinfo command, using curl -x webproxy.eqiad.wmnet:8080 -sL https://planet.openstreetmap.org/pbf/planet-180409.osm.pbf | osmium fileinfo -g header.option.osmosis_replication_timestamp -F pbf -. This only reads the header of the file, so it doesn't need to download the full file. It will show you a timestamp like 2018-04-09T02:00:01Z.
    2. Browse https://planet.openstreetmap.org/replication/minute/ to find a state.txt file two hours before the timestamp. The files are in a NNN/NNN/NNN structure, and higher numbers are more recent, so it should be towards the top of the list. In this case, the file is https://planet.openstreetmap.org/replication/minute/002/918/934.state.txt. The two hours allows a buffer for mistakes.
    3. Double-check the state.txt file by looking at it. It contains a timestamp parameter, e.g. timestamp=2018-04-08T23\:00\:01Z. Double-check that this is before the date on the planet. The most common problem here is getting the wrong state file.
  3. Run process-osm-data setup data_url state_url replication_url with a proxy set and as osmupdater, e.g. sudo -u osmupdater https_proxy=http://webproxy.eqiad.wmnet:8080 process-osm-data setup https://planet.openstreetmap.org/pbf/planet-180409.osm.pbf https://planet.openstreetmap.org/replication/minute/002/918/934.state.txt https://planet.openstreetmap.org/replication/minute. This may take some time, as it has to download about 40gb. If using an extract, double-check all URLs are HTTPS.

Database checks

  1. Stop existing updates by disabling puppet cron jobs
  2. Check that the postgis, hstore, unaccent, and fuzzystrmatch extensions are installed with sudo -u osmupdater psql -d ct -c '\dx'. If you dropped the database earlier, see the reset section. If the DB was set up by operations, talk to them, because they're supposed to automatically be present, and something went wrong. The plpgsql extension should automatically be present.
  3. Make sure the ct database is empty with sudo -u osmupdater psql -Xq -d ct -c '\d'. The only relations should be geography_columns, geometry_columns, raster_columns, raster_overviews, and spatial_ref_sys.

Database setup

  1. Create the loading schema with sudo -u postgres psql -d ct -c 'CREATE SCHEMA IF NOT EXISTS loading; GRANT ALL ON SCHEMA loading TO osmupdater;'
  2. Grant default permissions to the rendering users with sudo -u osmupdater psql -d ct -c 'ALTER DEFAULT PRIVILEGES IN SCHEMA public, loading GRANT SELECT ON TABLES TO kartotherian, tilerator, tileratorui;'

Update borders

Updating borders also updates the local copy of the planet on disk and makes it quicker to catch the database up to current data later, so it gets done first.

  1. sudo -u osmupdater JAVACMD_OPTIONS="-Dhttp.proxyHost=webproxy.codfw.wmnet -Dhttp.proxyPort=8080 -Dhttps.proxyHost=webproxy.codfw.wmnet -Dhttps.proxyPort=8080" process-osm-data planet-update
  2. Run it a second time. This isn't necessary if the planet file is very recent, but otherwise it makes sure you have the latest data.

Note: it's expected to see a couple of warnings/errors at this point since the data import isn't yet completed:

Geometry error on way 501247797: need at least two points for linestring (way_id=501247797)

NOTICE: table "osmborder_lines" does not exist, skipping

Importing OSM data

  • This should be done in `tmux`, `screen`, or another method to keep the process running
  1. Boost the number of max connections with sudo -u postgres psql -c "ALTER SYSTEM SET max_connections = 400;" && sudo service postgresql restart
  2. Import the database with sudo -u osmupdater process-osm-data import. This will take a day or two.
  3. Reset the number of max connections with sudo -u postgres psql -c "ALTER SYSTEM RESET max_connections;" && sudo service postgresql restart

Update database

Updating the database isn't strictly necessary, but it confirms that everything is working

  1. sudo -u osmupdater JAVACMD_OPTIONS="-Dhttp.proxyHost=webproxy.codfw.wmnet -Dhttp.proxyPort=8080 -Dhttps.proxyHost=webproxy.codfw.wmnet -Dhttps.proxyPort=8080" process-osm-data database-update

Load static data

  1. sudo -u osmupdater https_proxy=http://webproxy.eqiad.wmnet:8080 http_proxy=http://webproxy.eqiad.wmnet:8080 process-osm-data static-update

Database checks

This section is optional The data is now loaded, but it's worth checking the database to make sure an error wasn't missed.

  1. Check the osm2pgsql rendering tables are loaded with sudo -u osmupdater psql -d ct -c '\d+'. The exact output changes depending on how much data is loaded and with style changes, but it should look something like
 public | addresses_point           | table | pnorman | 3838 MB | Addresses mapped as points
 public | addresses_polygon         | table | pnorman | 8865 MB | Addresses mapped as polygons
 public | admin_area                | table | pnorman | 4850 MB | Administrative boundaries as areas
 public | admin_line                | table | pnorman | 4897 MB | Administrative boundaries as lines
 public | aeroway_area              | table | pnorman | 3624 kB | Airport-related area features
 public | aeroway_line              | table | pnorman | 36 MB   | Airport-related linear features
 public | airport_point             | table | pnorman | 3056 kB | Airports mapped as points
 public | airport_polygon           | table | pnorman | 10 MB   | Airports mapped as polygons
...
  1. Check the slim tables have loaded with sudo -u osmupdater psql -d ct -c '\dt+ planet_osm_(ways|rels)' && sudo -u osmupdater du -h /srv/osm_replication/planet/nodes.bin.
                          List of relations
 Schema |      Name       | Type  |   Owner    |  Size  | Description
--------+-----------------+-------+------------+--------+-------------
 public | planet_osm_rels | table | osmupdater | 15 MB  |
 public | planet_osm_ways | table | osmupdater | 778 MB |
(2 rows)

42G     /srv/osm_replication/planet/nodes.bin
  1. Check the border table is loaded with sudo -u osmupdater psql -d ct -c '\dt osmborder_lines'.
               List of relations
 Schema |      Name       | Type  |   Owner
--------+-----------------+-------+------------
 public | osmborder_lines | table | osmupdater
(1 row)
  1. Check the static data metadata table with sudo -u osmupdater psql -d ct -c 'SELECT * FROM external_data;'.
           name            |         last_modified
---------------------------+-------------------------------
 ne_lake                   | Fri, 27 Oct 2017 17:58:49 GMT
 ocean_polygons            | Tue, 27 Mar 2018 07:32:33 GMT
 simplified_ocean_polygons | Tue, 27 Mar 2018 07:31:10 GMT
 ne_ocean                  | Fri, 27 Oct 2017 17:58:50 GMT
 ne_places                 | Fri, 27 Oct 2017 17:57:33 GMT
(5 rows)

Note: ne tables may have a last_modified over a year ago, and the other tables may be a few weeks old depending on OSM coastline issues. You can see the last update dates at openstreetmapdata.com

  1. Check rendering read access with

sudo -u tilerator psql -d ct -c 'SELECT 1 FROM addresses_point LIMIT 0; SELECT 1 FROM ocean_polygons LIMIT 0; SELECT 1 FROM osmborder_lines LIMIT 0;'

sudo -u kartotherian psql -d ct -c 'SELECT 1 FROM addresses_point LIMIT 0; SELECT 1 FROM ocean_polygons LIMIT 0; SELECT 1 FROM osmborder_lines LIMIT 0;'

This should report no access errors

 ?column?
----------
(0 rows)