Search Platform/Weekly Updates/2023-05-18

From Wikitech

Summary

Working on the post-mortem of the WDQS outage, the Search Update pipeline, and optimization of Wikibase index settings.

What we've accomplished

Search - Analysis

  • Continuing data analysis for apostrophe-like characters (T315118). There are 22 candidate characters, and they are treated differently by different tokenizers (the Hebrew tokenizer straight up converts 5 of them to apostrophes, including Hebrew geresh, which I never noticed before!) and by ICU normalization and ICU folding.
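One quick way to see why the candidates diverge is to check what plain Unicode normalization does to each one. The minimal Python sketch below uses only the standard library's unicodedata module. The five-character CANDIDATES list and the helpers describe and compare are hypothetical illustrations, not the 22 characters or the tooling from T315118. NFKC here is only a rough stand-in. It is not the ICU normalizer or ICU folding, both of which apply further mappings, and it does not model tokenizer-specific behavior such as the Hebrew tokenizer's conversions.

```python
import unicodedata

# Hypothetical, illustrative subset of apostrophe-like characters.
# This is NOT the list of 22 candidates analyzed in T315118.
CANDIDATES = [
    "\u0027",  # APOSTROPHE
    "\u2019",  # RIGHT SINGLE QUOTATION MARK
    "\u02BC",  # MODIFIER LETTER APOSTROPHE
    "\u05F3",  # HEBREW PUNCTUATION GERESH
    "\uFF07",  # FULLWIDTH APOSTROPHE
]


def describe(ch: str) -> str:
    """Return a 'U+XXXX NAME' label for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def compare(chars):
    """Pair each character's label with whether NFKC maps it to U+0027.

    NFKC only approximates part of what ICU normalization does; it does not
    reproduce ICU folding or any tokenizer's own character handling.
    """
    rows = []
    for ch in chars:
        normalized = unicodedata.normalize("NFKC", ch)
        rows.append((describe(ch), normalized == "'"))
    return rows


if __name__ == "__main__":
    for label, becomes_apostrophe in compare(CANDIDATES):
        status = "-> U+0027" if becomes_apostrophe else "not mapped to U+0027"
        print(f"{label}: {status}")
```

Under NFKC, FULLWIDTH APOSTROPHE (U+FF07) becomes a plain apostrophe, while RIGHT SINGLE QUOTATION MARK, MODIFIER LETTER APOSTROPHE, and HEBREW PUNCTUATION GERESH have no compatibility decomposition and pass through unchanged, so any further merging of those would have to come from other steps, such as folding or tokenizer rules.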

Operations / SRE