Search Platform/Weekly Updates/2023-07-28
Summary
The team is focused on the Search Update Pipeline and on improving the multilingual zero-results rate.
A new weekly sync meeting around the deployment of the Search Update Pipeline to Flink on k8s is helping unblock various questions and decisions (Thanks Luke!). Service Ops, Data Platform Engineering, Search Platform and Data Platform SRE are involved.
The integration tests for CirrusSearch are flaky (they have been for a long time) and are causing delays in getting work merged. We might need to invest more time in reworking them instead of fixing individual errors as they come (and come again).
The Q&A session around the "unpacking analyzer" work was well received.
What we've accomplished
Search Update Pipeline
- Flink Zookeeper cluster is up, still needs monitoring/alerts https://phabricator.wikimedia.org/T341792
- Monitor CirrusSearch update lag: we now have metrics and a dashboard tracking it: https://grafana.wikimedia.org/d/8xDerelVz/search-update-lag-slo - https://phabricator.wikimedia.org/T320408
- Add support for redirects in CirrusSearch - https://phabricator.wikimedia.org/T325315
Improve multilingual zero-results rate
- Delays due to instability of integration tests for CirrusSearch
- Working on the acronym/WBH write-up. While working through my explanation, I realized I missed an edge case for the acronym_fixer that affects Brahmic scripts (Bengali, Hindi, Khmer, Thai, etc.). I need to do a few more checks and then I'll have another small patch up, which will delay reindexing a little, alas. https://phabricator.wikimedia.org/T170625
Operations
- Failures of CirrusSearch dumps for wikidatawiki, zhwiki, and enwiki, related to problematic error handling in the HTTP connection - https://phabricator.wikimedia.org/T341058
- Reimage wdqs20[13-22] servers to Bullseye - https://phabricator.wikimedia.org/T328325 / https://phabricator.wikimedia.org/T331300
- Undeleted Wikidata items do not reappear in WDQS: Wikidata was not sending revision_id for undelete events; this is now fixed in EventBus - https://phabricator.wikimedia.org/T341905
- Investigate WDQS categories update failures on Bullseye hosts - https://phabricator.wikimedia.org/T342060