Data Engineering/Systems/Cluster/Geotagging

From Wikitech

Geotagging functions in Hadoop are provided by jars available at hdfs:///wmf/refinery/current/artifacts.


refinery-core.jar exposes two functions:

Function Name                 Data Returned
getCountryCode(String ip)     country code
getGeocodedData(String IP)    <map> containing geocoding information:
  • continent
  • country_code
  • country
  • subdivision
  • city
  • postal_code
  • latitude
  • longitude
  • timezone
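
As a rough illustration of how a caller might consume the map described above, the following sketch reads fields by the documented key names. It does not call refinery-core: `stubGeocodedData` is a hypothetical stand-in that returns placeholder values, and the `Map<String, Object>` value type is an assumption, since this page does not state the real return type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GeocodedDataSketch {

    // Key names as listed on this page for the map returned by getGeocodedData.
    static final String[] KEYS = {
        "continent", "country_code", "country", "subdivision", "city",
        "postal_code", "latitude", "longitude", "timezone"
    };

    // Hypothetical stand-in for the real lookup (NOT the refinery-core API):
    // returns a map holding every documented key with a placeholder value.
    static Map<String, Object> stubGeocodedData(String ip) {
        Map<String, Object> data = new LinkedHashMap<>();
        for (String key : KEYS) {
            data.put(key, "(placeholder)");
        }
        return data;
    }

    // Read one field as a string, using a fallback when the key is absent or null.
    static String field(Map<String, Object> data, String key, String fallback) {
        Object value = data.get(key);
        return value == null ? fallback : value.toString();
    }

    public static void main(String[] args) {
        // 192.0.2.1 is an address reserved for documentation (RFC 5737).
        Map<String, Object> data = stubGeocodedData("192.0.2.1");
        System.out.println("country_code: " + field(data, "country_code", "--"));
        System.out.println("city:         " + field(data, "city", "--"));
    }
}
```

Reading through a helper with a fallback keeps callers from failing when a lookup has no value for a given key; whether the real function omits keys or fills them with a default is not stated here.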


This library provides wrapper functions usable as Hive UDFs:

Hive UDF Wrapped Function


These functions use a copy of the MaxMind database that is updated weekly and downloaded to the folder /usr/share/GeoIP on every node of the cluster.
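
When a lookup fails on a particular node, one quick check is whether the database folder is present and populated there. The sketch below is a minimal, hypothetical helper (not part of refinery) that lists whatever files exist in a directory; it makes no assumption about the database file names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class GeoIpDirCheck {

    // Return the names of the entries in dir, or an empty list if dir is not a directory.
    static List<String> listFiles(String dir) {
        List<String> names = new ArrayList<>();
        Path path = Paths.get(dir);
        if (!Files.isDirectory(path)) {
            return names;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(path)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        } catch (IOException e) {
            // Surface read errors (e.g. permissions) without a checked exception.
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = listFiles("/usr/share/GeoIP");
        if (names.isEmpty()) {
            System.out.println("No database files found in /usr/share/GeoIP");
        }
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```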