User:Milimetric/Notebook/Commons Impact Metrics API

From Wikitech

Notes on working on Commons Impact Metrics

Tech Debt Incurred

  • All AQS repositories should be migrated to GitLab with proper CI. Instead, we're leaving the existing repositories in Gerrit and starting a new one for Commons Impact Analytics in GitLab, with only basic CI.
  • A scaffold for AQS repositories and endpoints within those repositories should exist. Without this, AQS is too manual and requires too much time to extend. We are skipping this for now, but it should be part of the project's success criteria (perhaps a time-to-functional-endpoint metric).
  • A fast category-tree algorithm that computes paths from all categories would be a much easier starting point than the limited allow-list version we're looking at now. We postponed optimizing the existing GraphX Pregel algorithm, porting it to GraphFrames Pregel, or writing a new algorithm.
  • The mediawiki_history_load_dag should be split into separate pipelines that each wait for their sqoops as they become available, instead of blocking all loads on all sqoops. A note with details was added, but implementation was postponed.
  • We wanted to properly isolate the HQL scripts in a GitLab repository, as we decided to do with computation artifacts, but we fell back to putting the HQL in refinery.
  • We wanted to adopt the Cassandra HTTP Gateway maintained by Data Persistence. This would reduce the testing and data layer on our side to simple unit tests and perhaps only a logic layer. This was postponed to ensure we launched the APIs on time.
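
For the category-tree item above, a minimal sketch of what "paths from a set of starting categories" could look like, assuming the category graph fits in memory as a plain dictionary. This is not the GraphX Pregel job; the function name, the `children` mapping, and the `max_depth` parameter are hypothetical and only illustrate the idea. The category graph is not guaranteed to be acyclic, so the sketch skips any category already on the current path.

```python
from collections import deque


def category_paths(children, roots, max_depth=5):
    """Return every path from each root category down to its descendants.

    children:  dict mapping a category name to a list of direct subcategories
               (hypothetical in-memory representation of the category graph).
    roots:     iterable of starting categories, e.g. an allow-list.
    max_depth: maximum number of edges to follow from a root.
    """
    paths = []
    queue = deque((root,) for root in roots)
    while queue:
        path = queue.popleft()
        paths.append(path)
        # Stop expanding once the path has max_depth edges.
        if len(path) - 1 >= max_depth:
            continue
        for child in children.get(path[-1], []):
            # Skip categories already on this path so cycles terminate.
            if child in path:
                continue
            queue.append(path + (child,))
    return paths


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C", "A"], "C": []}
    for p in category_paths(graph, ["A"]):
        print(" > ".join(p))
```

Passing every category as a root instead of an allow-list gives the "all categories" variant, at the cost of enumerating many more paths; a distributed implementation would be needed at Commons scale.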

CI

Trying to copy the Jenkins CI setup (like this) to the new GitLab repo.