Incidents/2018-11-06 maps
document status: final
Summary
On 6 November 2018, Tilerator failed on maps100[1-3]. Tilerator is a non-public service that generates vector tiles (data blobs) from the OSM database and stores them in Cassandra. Icinga first reported the failure around 00:18 UTC.
Timeline
This is a step-by-step outline of what happened to cause the incident and how it was remedied.
00:18 UTC: Icinga reported that the Tilerator port (6534) was refusing connections on all three hosts:
PROBLEM - tilerator on maps1003 is CRITICAL: connect to address 10.64.32.117 and port 6534: Connection refused
1:19 AM PROBLEM - tilerator on maps1002 is CRITICAL: connect to address 10.64.16.42 and port 6534: Connection refused
1:19 AM PROBLEM - tilerator on maps1001 is CRITICAL: connect to address 10.64.0.79 and port 6534: Connection refused
07:25 UTC: The Tilerator service was restarted on maps100[1-3].
07:26 UTC: The Tilerator service came back up.
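The Icinga messages in the timeline come from a TCP connect check against the Tilerator port. As a rough illustration only (this is not the actual Icinga check plugin; the function name `tcp_port_open` and the timeout value are assumptions), an equivalent manual check could be sketched in Python, using the addresses and port quoted in the alerts:

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # or a timeout) when the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Addresses and port as they appear in the Icinga alerts above.
    hosts = {
        "maps1001": "10.64.0.79",
        "maps1002": "10.64.16.42",
        "maps1003": "10.64.32.117",
    }
    for name, address in hosts.items():
        state = "OK" if tcp_port_open(address, 6534) else "CRITICAL"
        print(f"tilerator on {name} is {state}")
```

Running the script from a host with network access to the maps cluster would print one status line per server; a refused connection, as seen during the incident, shows up as CRITICAL.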
Conclusions
- A non-paging alert is needed for problems that arise and persist. In this incident the service stayed down from 00:18 UTC until 07:26 UTC.
- Similar Tilerator problems, caused by lock contention, have occurred: https://phabricator.wikimedia.org/T204047
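The first conclusion asks for an alert that fires when a problem persists rather than on a single failed check. The idea can be sketched as follows; this is a hypothetical illustration, not Icinga configuration, and the class name `PersistenceTracker` and the threshold of three checks are assumptions chosen for the example:

```python
from dataclasses import dataclass


@dataclass
class PersistenceTracker:
    """Signal once when a check has failed `threshold` times in a row."""

    threshold: int          # consecutive failures required before notifying
    failures: int = 0       # current length of the failure streak
    notified: bool = False  # whether this streak has already been reported

    def record(self, ok: bool) -> bool:
        """Record one check result; return True only when a notification is due."""
        if ok:
            # Recovery resets the streak so a later outage notifies again.
            self.failures = 0
            self.notified = False
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.notified:
            self.notified = True
            return True
        return False


if __name__ == "__main__":
    tracker = PersistenceTracker(threshold=3)
    for result in [False, False, False, False, True, False]:
        if tracker.record(result):
            print("non-paging alert: tilerator failure is persisting")
```

With a threshold of three, the notification is emitted on the third consecutive failure and not repeated until the service has recovered and failed again.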
Links to relevant documentation
Maps Runbook: Maps/RunBook
Actionables
NOTE: Please add the #wikimedia-incident Phabricator project to these follow-up tasks and move them to the "follow-up/actionable" column.
- Create Runbook for Maps - Maps/RunBook