Search Platform/Weekly Updates/2023-11-03

Summary

Search Update pipeline: we're resolving deployment issues one by one and making progress, but we don't yet know what all the issues will be (https://phabricator.wikimedia.org/T347075)

Improve multilingual zero-results rate: Performance improvements are done (waiting to be deployed). Work continues on fixing tokens that are erroneously split by ICU.

WDQS graph splitting: work is picking up speed. We have a functional development environment and a first version of code that splits the graph. Collaboration with Scholia and with WMDE is picking up speed as well.

What we've accomplished

Improve multilingual zero-results rate

  • Refactoring of slow analysis components has been merged (still needs to be deployed and reindexed). In the process, we now have better ways to catch performance issues earlier during analysis work - https://phabricator.wikimedia.org/T346051
  • Work has started on ICU token repair (ICU erroneously splits tokens based on script, so words that contain multiple scripts are broken up). Good results for a first iteration of the implementation - https://phabricator.wikimedia.org/T332337
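
To illustrate the token-splitting problem described above, here is a minimal Python sketch. It is not the actual implementation tracked in T332337. The helper names (rough_script, split_on_script, repair) are hypothetical, and the script of a character is crudely approximated by the first word of its Unicode name, which is an assumption made only for this example (real ICU uses the Unicode Script property and has more detailed rules).

```python
import unicodedata


def rough_script(ch):
    """Approximate a character's script by the first word of its Unicode name.

    This is a stand-in for the real Unicode Script property (assumption).
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def split_on_script(word):
    """Split a word wherever the approximate script changes.

    Returns (start, end, text) triples, mimicking the unwanted splitting.
    """
    pieces = []
    start = 0
    for i in range(1, len(word) + 1):
        if i == len(word) or rough_script(word[i]) != rough_script(word[i - 1]):
            pieces.append((start, i, word[start:i]))
            start = i
    return pieces


def repair(pieces):
    """Rejoin pieces whose offsets touch, i.e. that had nothing between them."""
    merged = []
    for start, end, text in pieces:
        if merged and merged[-1][1] == start:
            prev_start, _, prev_text = merged[-1]
            merged[-1] = (prev_start, end, prev_text + text)
        else:
            merged.append((start, end, text))
    return merged


# "chocolate" typed with a Cyrillic "o" (U+043E) in the middle.
word = "choc\u043elate"
broken = split_on_script(word)
fixed = repair(broken)
```

With this input, split_on_script breaks the word into three fragments at the script boundaries, and repair merges them back into a single token because their offsets are contiguous. Fragments separated by a gap (for example, whitespace) are left alone.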

WDQS graph splitting

Search SLO

Misc