Search Platform/Weekly Updates/2023-10-06

Summary

We're starting a new quarter. Our goals for this quarter are almost ready and will be published on wiki shortly (keep an eye on https://wikitech.wikimedia.org/wiki/Search_Platform/Goals).

Overall, this was a short week due to Wikimedia Connect. We're making good progress towards deploying the Search Update Pipeline, with the testing of standard operations completed. We've identified a number of performance optimizations for our work on improving the multilingual zero-results rate. And we're getting started on experimenting with the WDQS graph split.

What we've accomplished

Search Update Pipeline

  • We have tested all relevant operations and are ready for a production deployment of the Search Update Pipeline on Flink, with k8s operators - https://phabricator.wikimedia.org/T342149
  • Migration of the WDQS updater to use newer Flink connectors
  • Started work on better isolation of the WDQS updater error streams. A quick patch disables them to unblock testing of the flink-k8s-op; a better solution is still WIP - https://phabricator.wikimedia.org/T347515

Improve multilingual zero-results rate

  • Performance optimization is in progress. In particular, consolidating character mapping brings a 9.3% improvement to indexing times, and implementing custom mapping code instead of the heavyweight Elasticsearch machinery is ~50% faster.
  • A new Elasticsearch plugin will be created to isolate this and allow for easier rollout
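
As a rough illustration of what consolidating character mappings can mean, the Python sketch below merges several mapping stages into one lookup table so the text is scanned once instead of once per stage. This is not the plugin code. The stage names, the sample mappings, and the helper functions are all hypothetical and chosen only for the example.

```python
# Hypothetical mapping stages; the real mappings live in the analysis config.
QUOTES = {"\u2019": "'", "\u2018": "'"}      # curly quotes -> straight quote
LIGATURES = {"\ufb01": "fi", "\u00e6": "ae"}  # fi ligature, ae ligature
DOTLESS = {"\u0131": "i"}                     # dotless i -> i


def apply_sequentially(text, mappings):
    """Baseline: one full pass over the text per mapping stage."""
    for mapping in mappings:
        text = "".join(mapping.get(ch, ch) for ch in text)
    return text


def consolidate(mappings):
    """Compose the stages into a single table usable with str.translate."""
    combined = {}
    for mapping in mappings:
        # Outputs of earlier stages must still pass through this stage.
        combined = {
            src: "".join(mapping.get(ch, ch) for ch in dst)
            for src, dst in combined.items()
        }
        # Characters untouched so far are mapped directly by this stage.
        for src, dst in mapping.items():
            combined.setdefault(src, dst)
    return str.maketrans(combined)


STAGES = [QUOTES, LIGATURES, DOTLESS]
TABLE = consolidate(STAGES)  # built once, reused for every document

sample = "\ufb01nd \u00e6ther\u2019s k\u0131ng"
single_pass = sample.translate(TABLE)
multi_pass = apply_sequentially(sample, STAGES)
```

Both approaches produce the same output; the consolidated table simply avoids re-scanning the text for each stage. Any speedup measured with this sketch says nothing about the 9.3% and ~50% figures above, which come from the actual Elasticsearch analysis chain.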

WDQS Graph Split

Misc