
User:To delete/ORES

From Wikitech
(Redirected from Analytics/Data Lake/ORES)
This page is currently a draft.
Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.

These tables contain ORES (Objective Revision Evaluation Service) scores for MediaWiki revisions and pages.

Datasets

Jobs

Data is transformed and imported into these tables in several steps.

  • Import recent ORES revision scores.
  • Backfill old revisions so that we have a complete set of scores.
  • Join scores with historified context.
  • Monthly "current" dumps using the most recent available model versions.
  • Monthly "historical" dumps which include all available scores, from any model version.
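The difference between the two monthly dumps can be illustrated with a minimal sketch. It assumes a hypothetical score row with `wiki`, `rev_id`, `model`, `model_version` and `probability` fields, and assumes that model versions are dotted numeric strings such as "0.4.0". Neither assumption is taken from the real table schema. The "historical" dump keeps every score, and the "current" dump keeps only the score with the newest model version available for each revision and model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Score:
    # Hypothetical row layout for illustration; the real table schema may differ.
    wiki: str
    rev_id: int
    model: str          # e.g. "damaging"
    model_version: str  # assumed to be a dotted numeric string, e.g. "0.4.0"
    probability: float


def version_key(version: str) -> tuple[int, ...]:
    """Parse a dotted version so that "0.10.0" sorts after "0.9.0"."""
    return tuple(int(part) for part in version.split("."))


def historical_dump(scores: Iterable[Score]) -> list[Score]:
    """Keep all available scores, from any model version."""
    return sorted(
        scores,
        key=lambda s: (s.wiki, s.rev_id, s.model, version_key(s.model_version)),
    )


def current_dump(scores: Iterable[Score]) -> list[Score]:
    """Keep one score per (wiki, rev_id, model): the newest model version available."""
    best: dict[tuple[str, int, str], Score] = {}
    for s in scores:
        key = (s.wiki, s.rev_id, s.model)
        if key not in best or version_key(s.model_version) > version_key(best[key].model_version):
            best[key] = s
    return sorted(best.values(), key=lambda s: (s.wiki, s.rev_id, s.model))
```

For example, if revision 1 has scores from versions "0.4.0" and "0.5.0" but revision 2 was only ever scored with "0.4.0", this selection keeps "0.5.0" for revision 1 and "0.4.0" for revision 2. The "current" dump can therefore still mix model versions.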

Open questions and concerns

Mixed model_versions: Once a newer model version has been deployed, we can no longer calculate scores with an older version, which is a problem for backfilling. As a workaround, we plan to backfill using an arbitrary model version that is current at the time of backfilling. For the same reason, even the "current" dump will contain heterogeneous scores from different model versions, and clients will have to take this into account. In the future, we might be able to run older models using Spark and backfill completely.
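One way a client might take mixed model versions into account is to group scores by model and model version before aggregating them, instead of pooling all probabilities together. Scores produced by different model versions may not be directly comparable. The sketch below is only an illustration. It assumes a hypothetical row layout with `rev_id`, `model`, `model_version` and `probability` fields, and the helper names are made up for this example.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Score(NamedTuple):
    # Hypothetical row layout for illustration only.
    rev_id: int
    model: str
    model_version: str
    probability: float


def group_by_model_version(scores: Iterable[Score]) -> dict:
    """Partition scores by (model, model_version)."""
    groups = defaultdict(list)
    for s in scores:
        groups[(s.model, s.model_version)].append(s)
    return dict(groups)


def has_mixed_versions(scores: Iterable[Score]) -> bool:
    """Return True if any single model appears with more than one version."""
    versions = defaultdict(set)
    for s in scores:
        versions[s.model].add(s.model_version)
    return any(len(v) > 1 for v in versions.values())


def mean_probability_per_version(scores: Iterable[Score]) -> dict:
    """Average probabilities separately for each (model, model_version) group."""
    return {
        key: sum(s.probability for s in group) / len(group)
        for key, group in group_by_model_version(scores).items()
    }
```

A client could call `has_mixed_versions` on a dump before computing any aggregate. If it returns True, the client can report per-version results with `mean_probability_per_version` rather than a single pooled figure.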