Search Platform/Weekly Updates/2026-01-09
Highlights
CirrusSearch dumps (the contents of the OpenSearch index) are no longer sourced via a MediaWiki maintenance script. Instead, we use Airflow/Hadoop/Spark to produce the dumps, similar to the content dumps (Dumps 2.0).
Search via the Action API now supports natural sorting by title. UI support on Special:Search is still pending.
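As a rough illustration of how a client might request title-sorted results, the sketch below builds an Action API search URL using the existing `list=search` module and its `srsort` parameter. The sort value `title_natural_asc` is an assumption made for this example and is not confirmed by this update; check the API help output of the target wiki for the actual value. The endpoint and the helper name `build_search_url` are likewise illustrative.

```python
from urllib.parse import urlencode

# Illustrative endpoint; any wiki running CirrusSearch would work the same way.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, sort="title_natural_asc", limit=10):
    """Build an Action API full-text search URL with an explicit sort order.

    NOTE: "title_natural_asc" is an assumed value for srsort, used here
    only for illustration. Verify the accepted values on the target wiki.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srsort": sort,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Prints the URL only; no network request is made in this sketch.
    print(build_search_url("solar eclipse"))
```

The helper only constructs the URL, so it can be tried without network access; fetching and parsing the JSON response is left to whatever HTTP client is already in use.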
Work on WE3.1.17 (semantic/vector search) continues. So far, we have processed enwiki, dewiki, and simplewiki with different LLMs and tokenization strategies. Thanks to the ML team, we are able to offload the creation of embeddings for content and queries to a dedicated service on Lift Wing.
Shipped: Did we release anything this week?
- T366248 Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script (Open)
- T411347 New CirrusSearch dumps are not properly formatted (Open)
- T40403 Sortable search results (Resolved)
- T408431 Reindex all wikis (Open)
Blockers: Does this essential workstream have any unresolved blockers or dependencies? Is anything preventing us from doing our work?
N/A
Lessons learned: Did we learn anything in the course of doing this work that can be applied to other work?
N/A
Community collab: Did we do anything in this essential workstream this week in collaboration with the community or because the community asked us to?
N/A