
User:DCausse (WMF)/Completion Suggester

From Wikitech

Context & Limitations

The completion suggester is the only suggester that supports "search as you type". It relies on a special index format. Suggestions are sorted by a weight computed at index time; unlike other Lucene queries, term frequencies are not taken into account. It is extremely fast (faster than prefix queries) because the data structure is loaded into memory as an FST (finite state transducer). It supports fuzzy and contextualized queries.

Limitations

  • Not suited for real-time updates.
    • Workaround: build an independent index that is updated regularly.
  • No discount based on edit distance when fuzzy is enabled: whether you search for "saer" or "sear", "search" will be suggested with the same score.
    • Workaround: run 2 suggestions at the same time, one without fuzzy and one with fuzzy; the scores of results from the fuzzy suggestion are discounted by a factor.
  • No discount based on geo distance for geo contextualized suggestions.
    • Workaround: run multiple suggestions with multiple geo distances and apply a discount.
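
The fuzzy workaround above could be sketched as follows. This is only an illustration, not the Cirrus implementation: the function name merge_suggestions, the (text, score) pair format and the 0.5 discount factor are all assumptions made for the example.

```python
def merge_suggestions(exact, fuzzy, fuzzy_discount=0.5):
    """Merge exact and fuzzy suggestions, discounting the fuzzy scores.

    exact, fuzzy: lists of (text, score) pairs (hypothetical format).
    fuzzy_discount: assumed factor applied to every fuzzy score.
    """
    merged = {}
    for text, score in exact:
        merged[text] = max(merged.get(text, 0), score)
    for text, score in fuzzy:
        discounted = score * fuzzy_discount
        # Keep the best score when a suggestion appears in both lists.
        merged[text] = max(merged.get(text, 0), discounted)
    # Highest score first; ties are broken alphabetically.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


exact_results = [("search", 100)]
fuzzy_results = [("search", 100), ("seattle", 150)]
print(merge_suggestions(exact_results, fuzzy_results))
```

With these inputs the exact match "search" keeps its full score and ranks ahead of "seattle", even though "seattle" has the higher raw weight. The geo workaround could presumably reuse the same merge, with one discount factor per distance.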

Build the index

Scoring

This is maybe the most important part. The completion suggester will certainly add more ambiguities, so we need a good score function to make sure that only interesting suggestions are returned. The idea for now is to design a score that reflects the quality of a page. The completion suggester works with integer weights, so we could make a score function that returns a float in [0, 1], multiply it by a constant such as 30000, and convert the result to an int. This would give a score between 0 and 30000.
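
A minimal sketch of that conversion, assuming the quality score has already been computed elsewhere. The name to_suggester_weight and the clamping of out-of-range values are assumptions added for the example; only the constant 30000 comes from the text above.

```python
MAX_SCORE = 30000  # constant suggested above


def to_suggester_weight(quality, max_score=MAX_SCORE):
    """Convert a quality score in [0, 1] to an integer weight.

    Out-of-range inputs are clamped (an assumption; the page does not
    say how they should be handled). int() truncates toward zero.
    """
    clamped = min(max(quality, 0.0), 1.0)
    return int(clamped * max_score)


print(to_suggester_weight(0.5))  # 15000
print(to_suggester_weight(1.0))  # 30000
```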

The data available by the score function is the data available by

Aggregate multiple suggestions

The backend (Cirrus) will receive a search query