Data Platform/Data Lake/Events
Accessing Event Data
As of September 2020, you have a choice of three engines that can run SQL queries against the Data Lake: Presto, Hive, and Spark. If you're not sure which to choose, Hive is good to start with. All three engines can be used from the Analytics clients.
- WMF maintains 2 separate git schema repositories:
- Schemas and schema status for the old EventLogging system are documented at Research:Schemas
- event vs. event_sanitized
- The event database stores original (unsanitized) events within a 90-day retention period, while the event_sanitized database is an archive of sanitized events kept beyond the 90-day retention period.
- Sanitized event data is processed per WMF’s Privacy Policy and Data Retention Guidelines.
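The retention split described above can be sketched as a small helper that picks which database to query for a given day. This is only an illustration: the function name `database_for` is hypothetical, and it assumes the 90-day window is counted in whole days back from the current date, with the boundary day still held in `event`.

```python
from datetime import date, timedelta

# Retention window for unsanitized events, taken from the text above.
RETENTION_DAYS = 90


def database_for(event_day: date, today: date) -> str:
    """Return the database expected to hold events from `event_day`.

    Hypothetical helper: assumes the retention window is counted in whole
    days back from `today`, with the boundary day still in `event`.
    """
    if event_day > today:
        raise ValueError("event_day is in the future")
    age = today - event_day
    if age <= timedelta(days=RETENTION_DAYS):
        return "event"
    return "event_sanitized"


if __name__ == "__main__":
    today = date(2020, 9, 30)
    print(database_for(date(2020, 9, 1), today))   # event
    print(database_for(date(2020, 1, 15), today))  # event_sanitized
```

A query spanning both sides of the cutoff would need to read from both databases, and because sanitized data may omit or transform fields, the two result sets are not guaranteed to have the same shape.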
- The