User:Milimetric/Zarchive/DataResearch/VisualEditor

From Wikitech

Lessons learned:

  • We can't count by time bucket, because some editing sessions will always span more than one bucket.
  • Joining on editingSessionId has to be done carefully; sanity checks are recommended.
  • saveIntent can show up more than once in the same editing session.
  • [THIS IS A BUG] The same editingSessionId can be shared by two completely separate editing sessions. It's hard to know for sure, but assuming the clientIp is unique per "real" session, we can work around the problem by using (clientIp, editingSessionId) as the key.
  • Working with MySQL is a huge pain in the @$$. I think moving the data to PostgreSQL would speed up analysis considerably.
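
The points above about keys and sanity checks can be sketched in a few lines of Python. This is only an illustration, not the analysis code that was actually used. It assumes each event is a dict with clientIp, editingSessionId, an "action" field (where saveIntent shows up) and an ISO-8601 "timestamp" string. The last two field names, the helper names, and the sample data are all made up for the example. Assigning a session to the day of its first event is just one possible convention for dealing with sessions that span buckets.

```python
from collections import defaultdict


def group_sessions(events):
    """Group events by the composite key (clientIp, editingSessionId)."""
    sessions = defaultdict(list)
    for e in events:
        sessions[(e["clientIp"], e["editingSessionId"])].append(e)
    # Order each session's events by time (ISO-8601 strings sort correctly).
    for evs in sessions.values():
        evs.sort(key=lambda e: e["timestamp"])
    return dict(sessions)


def shared_session_ids(sessions):
    """Sanity check: editingSessionId values seen with more than one clientIp."""
    ips_by_id = defaultdict(set)
    for client_ip, session_id in sessions:
        ips_by_id[session_id].add(client_ip)
    return {sid for sid, ips in ips_by_id.items() if len(ips) > 1}


def sessions_with_save_intent(sessions):
    """Count sessions (not events) that contain at least one saveIntent."""
    return sum(
        1
        for evs in sessions.values()
        if any(e["action"] == "saveIntent" for e in evs)
    )


def sessions_by_start_day(sessions):
    """Count each session once, under the day of its first event (assumed convention)."""
    counts = defaultdict(int)
    for evs in sessions.values():
        counts[evs[0]["timestamp"][:10]] += 1
    return dict(counts)


# Hypothetical sample data: one session crosses midnight and has two
# saveIntent events; a second, unrelated session reuses the same id.
events = [
    {"clientIp": "10.0.0.1", "editingSessionId": "abc", "action": "init",
     "timestamp": "2014-06-01T23:58:00"},
    {"clientIp": "10.0.0.1", "editingSessionId": "abc", "action": "saveIntent",
     "timestamp": "2014-06-02T00:03:00"},
    {"clientIp": "10.0.0.1", "editingSessionId": "abc", "action": "saveIntent",
     "timestamp": "2014-06-02T00:04:00"},
    {"clientIp": "10.0.0.2", "editingSessionId": "abc", "action": "init",
     "timestamp": "2014-06-02T09:00:00"},
]

sessions = group_sessions(events)
print(len(sessions))                        # 2 sessions, not 1
print(shared_session_ids(sessions))         # {'abc'}
print(sessions_with_save_intent(sessions))  # 1, although saveIntent fired twice
print(sessions_by_start_day(sessions))      # {'2014-06-01': 1, '2014-06-02': 1}
```

Grouping on editingSessionId alone would merge the two sessions into one, and counting saveIntent events rather than sessions would double-count the first session.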