A scalable machine learning model serving infrastructure on Kubernetes using KServe, part of a broader project aimed at modernizing Machine Learning at Wikimedia. This service will replace the ORES infrastructure (see Machine Learning/LiftWing/Usage#Differences using Lift Wing instead of ORES for a quick overview of the differences between Lift Wing and ORES).
Serving (Lift Wing)
We host our Machine Learning models as Inference Services: asynchronous micro-services that can transform raw feature data and make predictions. Each Inference Service has production images that are published in the WMF Docker Registry via the Deployment Pipeline. These images are then referenced by an isvc (InferenceService) configuration in our ml-services helmfile in the operations/deployment-charts repo.
- Model Deployment Guide: Machine Learning/LiftWing/Deploy
- Inference Service Docs: Machine_Learning/LiftWing/Inference Services
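As a concrete illustration, an upstream KServe InferenceService manifest looks roughly like the following. This is a sketch of the upstream KServe shape only, not the exact WMF helmfile values format; the service name, image tag, and storage path are all hypothetical.

```yaml
# Illustrative upstream KServe InferenceService shape; at WMF the equivalent
# settings live in helmfile values in operations/deployment-charts.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith            # hypothetical service name
spec:
  predictor:
    containers:
      - name: kserve-container
        # hypothetical image published via the Deployment Pipeline
        image: docker-registry.wikimedia.org/wikimedia/example-inference:stable
        env:
          - name: STORAGE_URI       # triggers KServe's storage-initializer
            value: s3://example-bucket/goodfaith/enwiki/   # hypothetical path
```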
Training (Train Wing)
Train Wing is not available yet; we are working on it, stay tuned! For more info please feel free to reach out to the Machine Learning team :)
We store model binary files in Swift, an open-source, S3-compatible object store that is widely used across the WMF. The model files are downloaded by KServe's storage-initializer when an Inference Service pod is created. The storage-initializer then mounts the model binary in the pod at /mnt/models/, where it can be loaded by the predictor container. This is completely transparent to the user; it is provided by KServe.
- Model Upload info: Machine_Learning/LiftWing/Deploy#How_to_upload_a_model_to_Swift
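To make the mount point concrete, here is a minimal sketch of a predictor's load step, assuming a single pickled model binary under the mount directory. The function name is ours, and real model servers may use framework-specific loaders instead of pickle:

```python
import pickle
from pathlib import Path

# KServe's storage-initializer mounts the model binary here inside the pod.
MODEL_DIR = Path("/mnt/models")

def load_model(model_dir: Path = MODEL_DIR):
    """Locate and unpickle the first model binary under the mount point.

    Assumes a single pickled file; frameworks like fasttext or xgboost
    would use their own loading functions here instead.
    """
    binaries = sorted(p for p in model_dir.iterdir() if p.is_file())
    if not binaries:
        raise FileNotFoundError(f"no model binary found under {model_dir}")
    with binaries[0].open("rb") as f:
        return pickle.load(f)
```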
Hosting a model
If you want to host a model on Lift Wing, the first thing to do is to contact the ML team so that we are aware of it. This step is very important, since we'll sync on what the model does, what kind of data it needs/handles, how the model was built, etc. The idea is to avoid last-minute surprises between the team requesting to host a new model and the ML team; better safe than sorry!
If you haven't done so, please open a new Phabricator task using the Machine-Learning-Team label. Please also add the following information:
- What use case is the model going to support/resolve?
- Do you have a model card? If you don't know what it is, please check https://meta.wikimedia.org/wiki/Machine_learning_models.
- What team created/trained/etc. the model? What tools and frameworks did you use?
- What kind of data was the model trained with, and what kind of data will the model need in production (for example, calls to internal/external services, special data sources for features, etc.)?
- If you have a minimal codebase that you used to run the first tests with the model, could you please share it?
- State what team will own the model, and please share the main points of contact (see more info in Ownership of a model).
- What are the current latency and throughput of the model, if you have tested it? We don't need anything precise at this stage, just some ballpark numbers to figure out how the model performs with the expected inputs. For example, does the model take milliseconds/seconds/etc. to respond to queries? How does it react when 1/10/20/etc. requests are made in parallel? If you don't have these numbers don't worry: open the task and we'll figure something out while we discuss next steps!
- Is there an expected frequency at which the model will have to be retrained with new data? What resources are required to train the model, and what was the dataset size?
- Have you checked whether the output of your model is safe from a human rights point of view? Is there any risk of it being offensive to somebody? Even if you have only a slight worry or a corner case in mind, please tell us!
- Anything else that is relevant in your opinion :)
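If you want to collect those ballpark latency/throughput numbers, a rough harness like the one below is enough. Here fake_predict is a placeholder that simulates a model call; you would replace it with a real request to your model server:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def fake_predict(payload):
    """Placeholder for a model call; replace with a real request to your server."""
    time.sleep(0.01)  # pretend the model takes ~10 ms per prediction
    return {"prediction": 0.5}

def measure(concurrency, requests_total=20):
    """Send requests_total calls at the given concurrency; return latencies in seconds."""
    latencies = []

    def timed_call(_):
        start = time.perf_counter()
        fake_predict({})
        latencies.append(time.perf_counter() - start)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed_call, range(requests_total)))
    return latencies

# Ballpark figures at the parallelism levels mentioned above.
for c in (1, 10, 20):
    lat = measure(c)
    print(f"concurrency={c:2d}  p50 latency={statistics.median(lat) * 1000:.1f} ms")
```

Numbers from a harness like this are indicative only, but they are exactly the kind of rough estimate that helps the discussion in the Phabricator task.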
While the Machine Learning team processes your request, you can definitely start browsing the following pages to get more info about how we write code for Lift Wing:
- KServe Guide: Machine Learning/LiftWing/KServe
- Production Image Development Guide: Machine Learning/LiftWing/Inference Services/Production Image Development
After reading the above tutorials you should be able to create a Docker image with a basic KServe model server and test it locally. We realize that it may be overwhelming at first, so if you have any difficulties or doubts please ask us in the aforementioned task and we'll assist you!
Our team's goal is to be involved as early as possible in the development of the model, to direct you and your team to the easiest and best path to production without running into misunderstandings and surprises later on (for example, when a lot of work has already been done).
Ownership of a model
- The Machine Learning team will guide you through the development and deployment of your model on our Kubernetes infrastructure, and we'll take care of basic monitoring and scalability maintenance tasks for you. What we ask is to state a clear ownership of the model, so that we'll be able to ping you in case of need (unexpected problems, etc.).
- The Machine Learning team will not own models, but will only help teams to deploy them on a reliable Kubernetes infrastructure. The idea is to ease the task of putting a model in production for various teams, but we will not be able to also support/maintain models, due to resource constraints (we'd need a team of 10+ people otherwise :)).
- The Machine Learning team will not be responsible for the outputs of the model, for example if it doesn't respect basic human rights or if it is offensive to any group of people. We will work with you to avoid these risks as much as possible, but your team will be ultimately responsible for the model's behavior once it is in production.
- If your team wants to turn off the model (namely, removing it from the production API, etc.), then you'll need to do the necessary follow-ups with the community using it (for example, if it was exposed outside the Wikimedia realm).
Hosting stages for a model server on Lift Wing
Once a model is created, the ML team will likely suggest adding it to the staging cluster's experimental namespace. Here the model can be queried only by internal clients (see the Usage section for more info), not from the outside. Once the model is stable (doesn't consume a ton of memory/CPU, works reliably, doesn't fail randomly, etc.), it will be possible to move it to the production clusters, exposing it to the outside world via the API Gateway (see the Usage section for more info). The idea is to avoid exposing prototypes to the outside Internet until they have been properly vetted and tested by the ML and requesting teams.
- A model server is created by a team in collaboration with the Machine Learning team's engineers (see the Development section for more info).
- The model binary is passed to an ML-Ops engineer. This step needs to be logged on Phabricator (ideally in the task that represents the work to be done for the new model server), stating the location of the model (directory on stat100x, Gsuite, etc.) and its sha512 checksum. Please don't use a paste or similar: add the sha512 value directly as a Phabricator comment. The ML-Ops engineer retrieves the file and checks the sha512 checksum to make sure the file wasn't tampered with or miscopied, and acknowledges the correct verification on Phabricator. Finally the model is uploaded by the ML-Ops engineer to Swift.
elukey@stat1008:~$ sha512sum -b model.bin
5bd35e5e92196eec76abad880703c9caaa94e2e52eebd68cd61745549bc37d8654e7f4c6731fa1b643d6bb644ccad5dc98f738b8da8928ef27e189cb92b63e5c *model.bin
- The model server is deployed in the Lift Wing staging cluster, under the experimental namespace. In this limbo the model server will be available only to internal WMF clients, and not exposed to the outside Internet. At this point load-testing experiments are performed to figure out correct resource usage (even just a baseline). We err on the side of fewer resources, since they can always be increased later.
- The model server is developed until it reaches a good stability level, namely its memory/CPU consumption is predictable and its performance is acceptable. The error rate is low, and the code's quality meets a certain bar. The model also has a clear owner and point of contact, so that in case of bugs/emergencies/doubts/etc. the ML team will be able to contact them. As written before, the ML team doesn't own any model, and we don't necessarily know the codebase and the design choices behind a model, so we need whoever has context to keep supporting it. If this is not possible, the model will not graduate to production.
- A basic load test is performed to figure out (indicatively) how many requests per second (rps) the model server can sustain in staging. The ML team and the model owner set a target SLO for the service.
- The model server is assigned its own Kubernetes namespace, and it is deployed in production by the Machine Learning team. The requesting team can expose the service to the outside Internet via the API Gateway, with the help of the Machine Learning team. See the Usage section for more info.
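For reference, production model servers exposed via the API Gateway are queried with JSON POST requests. The sketch below only builds such a request without sending it; the endpoint shape and model name are assumptions based on the Usage docs, so double-check them there before relying on this:

```python
import json
from urllib import request

# Endpoint shape for models exposed via the API Gateway; the host, path, and
# model name used below are assumptions, check the Usage page for real ones.
API = "https://api.wikimedia.org/service/lw/inference/v1/models/{model}:predict"

def build_request(model, payload):
    """Build (but do not send) a JSON POST request for a given model server."""
    data = json.dumps(payload).encode("utf-8")
    return request.Request(
        API.format(model=model),
        data=data,
        headers={"Content-Type": "application/json"},
    )

req = build_request("enwiki-goodfaith", {"rev_id": 12345})
print(req.full_url)
# To actually send it (requires network access and possibly authentication):
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```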
Current Inference Services
Revscoring models (migrated from ORES)
- Article quality: enwiki-articlequality, euwiki-articlequality, fawiki-articlequality, frwiki-articlequality, glwiki-articlequality, nlwiki-articlequality, ptwiki-articlequality, ruwiki-articlequality, svwiki-articlequality, trwiki-articlequality, ukwiki-articlequality, wikidatawiki-itemquality
- Damaging: arwiki-damaging, bswiki-damaging, cawiki-damaging, cswiki-damaging, dewiki-damaging, enwiki-damaging, eswikibooks-damaging, eswiki-damaging, eswikiquote-damaging, etwiki-damaging, fawiki-damaging, fiwiki-damaging, frwiki-damaging, hewiki-damaging, hiwiki-damaging, huwiki-damaging, itwiki-damaging, jawiki-damaging, kowiki-damaging, lvwiki-damaging, nlwiki-damaging, nowiki-damaging, plwiki-damaging, ptwiki-damaging, rowiki-damaging, ruwiki-damaging, sqwiki-damaging, srwiki-damaging, svwiki-damaging, trwiki-damaging, ukwiki-damaging, wikidatawiki-damaging, zhwiki-damaging
- Good faith: arwiki-goodfaith, bswiki-goodfaith, cawiki-goodfaith, cswiki-goodfaith, dewiki-goodfaith, enwiki-goodfaith, eswikibooks-goodfaith, eswiki-goodfaith, eswikiquote-goodfaith, etwiki-goodfaith, fawiki-goodfaith, fiwiki-goodfaith, frwiki-goodfaith, hewiki-goodfaith, hiwiki-goodfaith, huwiki-goodfaith, itwiki-goodfaith, jawiki-goodfaith, kowiki-goodfaith, lvwiki-goodfaith, nlwiki-goodfaith, nowiki-goodfaith, plwiki-goodfaith, ptwiki-goodfaith, rowiki-goodfaith, ruwiki-goodfaith, sqwiki-goodfaith, srwiki-goodfaith, svwiki-goodfaith, trwiki-goodfaith, ukwiki-goodfaith, wikidatawiki-goodfaith, zhwiki-goodfaith
- Reverted: bnwiki-reverted, elwiki-reverted, enwiktionary-reverted, glwiki-reverted, hrwiki-reverted, idwiki-reverted, iswiki-reverted, tawiki-reverted, viwiki-reverted
- Article topic: arwiki-articletopic, cswiki-articletopic, enwiki-articletopic, euwiki-articletopic, huwiki-articletopic, hywiki-articletopic, kowiki-articletopic, srwiki-articletopic, ukwiki-articletopic, viwiki-articletopic, wikidatawiki-itemtopic
Contributing to the project
If you are a member of the community who wants to help out with Lift Wing, thanks a lot! You are really welcome :)
We manage our tasks on this Phabricator board, but it may be overwhelming and confusing at first. We suggest introducing yourself in the #wikimedia-ml IRC channel on Libera.Chat; after a chat (to understand your goals and desired learning experiences) we'll decide together what work to assign to you.
If you have any feedback/suggestions/fixes/etc. please open a Phabricator task with the aforementioned tag.
Table of contents
- Machine Learning/LiftWing/Alerts
- Machine Learning/LiftWing/Deploy
- Machine Learning/LiftWing/Inference Services
- Machine Learning/LiftWing/Inference Services/Production Image Development
- Machine Learning/LiftWing/KServe
- Machine Learning/LiftWing/KServe/DeployLocal
- Machine Learning/LiftWing/ML-Sandbox
- Machine Learning/LiftWing/ML-Sandbox/Configuration
- Machine Learning/LiftWing/ML-Sandbox/Usage-Examples
- Machine Learning/LiftWing/Streams
- Machine Learning/LiftWing/Usage