Search Platform/Weekly Updates/2024-02-23

Summary

We are in the deployment phase of our multilingual zero-result rate improvements. The new Elasticsearch plugins are deployed and the new configuration is ready. Once the configuration is deployed, we will need to reindex all wikis (which takes 2-3 weeks) and then analyze the improvements.

We are in active conversation with people from Scholia about the WDQS graph split. The discussions are constructive, but we are identifying major impacts on Scholia queries. Some we might be able to resolve, some we might not. In particular, authors and scholarly articles are on different graphs, which makes some queries complex and expensive to run. Scholia is not only about scholarly articles but also about other types of papers, which now need to be treated differently. This conversation and investigation need to continue.

What we've accomplished

Improve multilingual zero-results rate

WDQS graph splitting

  • Meeting with Daniel Mietchen and Lane Raspberry
    • Agreed to have regular meetings
    • Still some questions about why we're doing this and what problem we are trying to solve
    • Questions about what Wikidata is: is this only a WDQS problem, or should Wikidata stop accepting some data and ask communities to use another hosting solution?

More notes: https://etherpad.wikimedia.org/p/wdqs-graph-split-2024-02-15-1

Operations

  • Backfilled the w[dc]qs reconciliation DAG after a failure of the canary events system. Deployed a quick patch to stop creating this DAG dynamically and to sense both DC partitions via a single sensor. This should decouple the DAG from the wmf_conf.eventgate_datacenters variable in Airflow, which had been set to codfw only to work around the canary event issue (and which I reverted back to ["eqiad", "codfw"] to do the backfill).
    • We might expect some turbulence during the upcoming DC switch (March 19); DE might still rely on a manual switch of that config variable in Airflow.
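
The idea of sensing both DC partitions with a single check can be sketched as follows. This is a minimal illustration in plain Python, not the deployed patch: the names DATACENTERS, all_partitions_present and partition_exists are hypothetical, and a real Airflow sensor would wrap a check like this rather than use it directly.

```python
from typing import Callable, Iterable

# Assumption: both datacenters are listed explicitly here instead of being
# read from a shared config variable such as wmf_conf.eventgate_datacenters.
DATACENTERS = ("eqiad", "codfw")


def all_partitions_present(
    hour: str,
    partition_exists: Callable[[str, str], bool],
    datacenters: Iterable[str] = DATACENTERS,
) -> bool:
    """Return True only when every datacenter has a partition for `hour`."""
    return all(partition_exists(dc, hour) for dc in datacenters)


# Toy stand-in for a metastore lookup: the set of partitions that have landed.
landed = {
    ("eqiad", "2024-02-20T10"),
    ("codfw", "2024-02-20T10"),
    ("eqiad", "2024-02-20T11"),
}


def exists(dc: str, hour: str) -> bool:
    return (dc, hour) in landed


print(all_partitions_present("2024-02-20T10", exists))  # True
print(all_partitions_present("2024-02-20T11", exists))  # False
```

Because the list of datacenters is fixed in the check itself, changing a shared config variable to work around a canary event issue would no longer change which partitions this job waits for.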