From Wikitech
Aisha Khatun
NLP Researcher @ Research Team, Wikimedia Foundation
Learning is my passion. Everything else just falls in place.

About me

I am an ML and NLP enthusiast from Bangladesh. I love working with data and drawing information from it. I earned my Bachelor's in Computer Science and Engineering from Shahjalal University of Science and Technology, Bangladesh, and my Master's in Computer Science from the University of Waterloo, Canada. After graduating, I worked as a Machine Learning Engineer for about a year before joining the Wikimedia Foundation as a Data Analyst and Researcher, performing several roles along the way.

My work

  • I am currently working with the Research Team to improve link recommendation in all Wikipedia languages. This work includes fixing mwtokenizer so it can parse all languages, improving the existing language-dependent link recommendation models, and then creating a language-agnostic link recommendation model that will replace the 200+ language-dependent models deployed at present.
  • I worked with the Research Team as a Research Data Scientist (NLP) to develop Copyediting as a structured task. To raise and maintain the standard of Wikipedia articles, it is important to ensure that articles don't have typos, spelling mistakes, or grammatical errors. While there are ongoing efforts to automatically detect "commonly misspelled" words in English Wikipedia, most other languages are left behind. The intention was to find ways to detect errors in articles in all languages in an automated fashion. I wrote a program to automatically curate lists of commonly misspelled words for 100+ languages using Wiktionary. The coverage of these lists was compared with existing misspelling lists in 2-3 languages, and the lists were then used to detect misspellings in all possible Wikipedia languages.
  • Previously, I worked with the Search and Analytics team to find ways to scale the Wikidata Query Service by analyzing the queries made to it. See the Phabricator Work Board (WDQS Analysis).
  • I also worked on the Abstract Wikipedia project during my Outreachy internship (see this) to identify important Scribunto modules across all the wikis and group similar ones.
Disclaimer: Although I work for the Wikimedia Foundation, contributions under this account do not necessarily represent the actions or views of the Foundation unless expressly stated otherwise. For example, edits to articles or uploads of other media are done in my individual, personal capacity unless otherwise stated.

Contact me


More work