Search Platform/Weekly Updates/2023-02-24
Summary
We finally have a successful data reload for WDQS. It still needs to be copied over to all servers. The process remains fragile and problematic: it took 2.5 months to complete a task that should take a couple of weeks at most.
What we've accomplished
Spark 3 migration
- RDF (Java / Scala) jobs are running on Spark 3. This was one of the more complex jobs to migrate - https://phabricator.wikimedia.org/T327381
- Migrated drop_old_data_daily from Airflow 1 to 2. This needed some additional fixes to date handling.
Search Update Pipeline
- Removed our use of a custom Swift plugin in Flink, now relying on the standard S3 interface.
- Preliminary work on tracking update lag to create a formal SLO - https://phabricator.wikimedia.org/T320408
Operations / SRE
- Documentation for Elasticsearch has been updated. This year's datacenter switch should be fully transparent, with no manual intervention needed - https://phabricator.wikimedia.org/T330417
- Initial data reload on WDQS completed, copy to all servers in progress - https://phabricator.wikimedia.org/T323096
- k8s upgrade went well, with our Flink application reacting as expected.
Misc
- Ongoing work to align the Search Platform team documentation with the ERC Team Interface template - https://office.wikimedia.org/wiki/ERC/Search
- Communication on the WDQS data reload issues went out (https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/thread/7QTJBRU2T3J22SNV4TGBRML4QNBGCEOU/). This prompted some discussion, mostly about modularity and loose coupling.