This page is currently a draft.
More information and discussion about changes to this draft can be found on the talk page.
These tables contain ORES scores for MediaWiki revisions and pages.
- ores.revision_score – Normalized table of ORES revision scores, partitioned by model version.
- ores.revision_score_public – Scores with public context like editor and revision metadata.
Data is transformed and imported into these tables in several steps.
- Import recent ORES revision scores.
- Backfill old revisions so that we have a complete set of scores.
- Join scores with historified context.
- Produce monthly "current" dumps using the most recent available model versions.
- Produce monthly "historical" dumps which include all available scores, from any model version.
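The difference between the "current" and "historical" dumps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual pipeline: the `Score` record and its field names are hypothetical, model versions are assumed to be dotted numeric strings such as `0.5.1`, and "most recent available" is read as "the newest version for which a given revision has a score".

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Score(NamedTuple):
    # Hypothetical record layout; the real table columns may differ.
    rev_id: int
    model: str
    model_version: str
    probability: float


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '0.5.1' into (0, 5, 1) for comparison.

    Assumes purely numeric, dot-separated components.
    """
    return tuple(int(part) for part in version.split("."))


def current_scores(scores: Iterable[Score]) -> List[Score]:
    """Keep, per (rev_id, model), the score from the newest model version."""
    newest: Dict[Tuple[int, str], Score] = {}
    for score in scores:
        key = (score.rev_id, score.model)
        kept = newest.get(key)
        if kept is None or version_key(score.model_version) > version_key(kept.model_version):
            newest[key] = score
    return sorted(newest.values(), key=lambda s: (s.rev_id, s.model))


def historical_scores(scores: Iterable[Score]) -> List[Score]:
    """Keep every available score, regardless of model version."""
    return sorted(
        scores,
        key=lambda s: (s.rev_id, s.model, version_key(s.model_version)),
    )
```

Note that comparing parsed tuples rather than raw strings matters here: as strings, `"0.10.0"` would sort before `"0.5.0"`.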
Open questions and concerns
Mixed model_versions: We can't calculate scores with an old model version once a newer one has been deployed, which makes backfilling problematic. Our current workaround is to backfill using whichever model version happens to be deployed at the time. As a result, even the "current" dump file will include heterogeneous scores from different model versions, and clients will have to take this into account. In the future, we might be able to run older models using Spark and backfill completely.
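One way a client might cope with mixed model versions is to group rows by version before analysing them, or to restrict the analysis to a single version. The following is a minimal sketch, assuming each row is a `(rev_id, model_version, score)` tuple; that layout and the function names are hypothetical and do not reflect the actual dump format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each row is (rev_id, model_version, score); this layout is hypothetical.
Row = Tuple[int, str, float]


def group_by_model_version(rows: Iterable[Row]) -> Dict[str, List[Row]]:
    """Bucket rows by model_version so each bucket is internally comparable."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)


def only_version(rows: Iterable[Row], model_version: str) -> List[Row]:
    """Drop every row that was not scored by the given model version."""
    return [row for row in rows if row[1] == model_version]
```

Grouping keeps all of the data but makes the version boundary explicit, whereas filtering gives a homogeneous sample at the cost of discarding revisions scored by other versions.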