Search Platform/Weekly Updates/2023-03-10

Summary

The Spark 3 upgrade has been unblocked, thanks to the Data Engineering team's work to provide us with a working Airflow 2 instance. We are on track to deliver this work, though some minimal work to deploy all DAGs may be left over.

The Search update pipeline is progressing in line with our revised expectations. We should be able to deliver update lag metrics and validate the deployment on k8s with Flink operators. The planned functional work is delayed until next quarter.

What we've accomplished

Search Analysis

Search Update Pipeline

Spark 3 Upgrade

Operations / SRE

Misc

  • Ongoing work to define a set of SLOs and KPIs for Search - https://docs.google.com/document/d/1gYROXo8Fl7JSxReHAVI22EhcPvG-INVkq79a1C3tfK0/edit
  • Fixed a bug with Search indices not being properly updated when pages move between categories - https://phabricator.wikimedia.org/T331127
  • Work on Search dashboards is restarting, but the highest priority for our data analyst is supporting SDAW. We will see how much time is left to make improvements to the Search dashboards.
  • While we don't yet have full clarity on what our priorities will be for the next fiscal year, we have some inputs:
    • The highest priority should be scaling WDQS, by investigating splitting the Wikidata graph.
    • Search work should focus on supporting editors more than casual readers, which is a change of direction from what we have done in the past.