A scalable machine learning model serving infrastructure on Kubernetes using KServe, part of a broader project aimed at modernizing Machine Learning at Wikimedia. This service will replace the ORES infrastructure (see Machine Learning/LiftWing/Usage#Differences using Lift Wing instead of ORES for a quick overview of the differences between Lift Wing and ORES).
Serving (Lift Wing)
We host our Machine Learning models as Inference Services: asynchronous micro-services that can transform raw feature data and make predictions. Each Inference Service has production images that are published in the WMF Docker Registry via the Deployment Pipeline. These images are then referenced by an isvc (InferenceService) configuration in our ml-services helmfile in the operations/deployment-charts repo.
- Model Deployment Guide: Machine Learning/LiftWing/Deploy
- Inference Service Docs: Machine_Learning/LiftWing/Inference Services
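As a concrete illustration, an upstream KServe InferenceService manifest looks roughly like the following. This is a sketch of the upstream KServe shape only, not the exact WMF helmfile values format; the service name, image tag, and storage path are all hypothetical.

```yaml
# Illustrative upstream KServe InferenceService shape; at WMF the equivalent
# settings live in helmfile values in operations/deployment-charts.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith            # hypothetical service name
spec:
  predictor:
    containers:
      - name: kserve-container
        # hypothetical image published via the Deployment Pipeline
        image: docker-registry.wikimedia.org/wikimedia/example-inference:stable
        env:
          - name: STORAGE_URI       # triggers KServe's storage-initializer
            value: s3://example-bucket/goodfaith/enwiki/   # hypothetical path
```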
Training (Train Wing)
Train Wing is not available yet; we are working on it, stay tuned! For more info please feel free to reach out to the Machine Learning team :)
We store model binary files in Swift, an open-source, S3-compatible object store that is widely used across the WMF. The model files are downloaded by KServe's storage-initializer when an Inference Service pod is created. The storage-initializer then mounts the model binary in the pod at /mnt/models/, where it can be loaded by the predictor container. This is completely transparent to the user; it is provided by KServe.
- Model Upload info: Machine_Learning/LiftWing/Deploy#How_to_upload_a_model_to_Swift
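To make the mount point concrete, here is a minimal sketch of a predictor's load step, assuming a single pickled model binary under the mount directory. The function name is ours, and real model servers may use framework-specific loaders instead of pickle:

```python
import pickle
from pathlib import Path

# KServe's storage-initializer mounts the model binary here inside the pod.
MODEL_DIR = Path("/mnt/models")

def load_model(model_dir: Path = MODEL_DIR):
    """Locate and unpickle the first model binary under the mount point.

    Assumes a single pickled file; frameworks like fasttext or xgboost
    would use their own loading functions here instead.
    """
    binaries = sorted(p for p in model_dir.iterdir() if p.is_file())
    if not binaries:
        raise FileNotFoundError(f"no model binary found under {model_dir}")
    with binaries[0].open("rb") as f:
        return pickle.load(f)
```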
Hosting a model
If you want to host a model on Lift Wing, the first thing to do is to contact the ML team so that we are aware of it. This step is very important, since we'll sync on what the model does, what kind of data it needs/handles, how the model was built, etc. The idea is to avoid last-minute surprises between the team requesting to host a new model and the ML team; better safe than sorry!
If you haven't done so, please open a new Phabricator task using the Machine-Learning-Team label. Please also add the following information:
- What use case is the model going to support/resolve?
- Do you have a model card? If you don't know what it is, please check https://meta.wikimedia.org/wiki/Machine_learning_models.
- What team created/trained/etc. the model? What tools and frameworks did you use?
- What kind of data was the model trained with, and what kind of data will the model need in production (for example, calls to internal/external services, special data sources for features, etc.)?
- If you have a minimal codebase that you used to run the first tests with the model, could you please share it?
- State what team will own the model, and please share the main points of contact (see more info in Ownership of a model).
- What are the current latency and throughput of the model, if you have tested it? We don't need anything precise at this stage, just some ballpark numbers to figure out how the model performs with the expected inputs. For example, does the model take milliseconds/seconds/etc. to respond to queries? How does it react when 1/10/20/etc. requests are made in parallel? If you don't have these numbers don't worry: open the task and we'll figure something out while we discuss next steps!
- Is there an expected frequency at which the model will have to be retrained with new data? What resources are required to train the model, and what was the dataset size?
- Have you checked whether the output of your model is safe from a human rights point of view? Is there any risk of it being offensive to somebody? Even if you have only a slight worry or a corner case in mind, please tell us!
- Anything else that is relevant in your opinion :)
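If you want to collect those ballpark latency/throughput numbers, a rough harness like the one below is enough. Here fake_predict is a placeholder that simulates a model call; you would replace it with a real request to your model server:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def fake_predict(payload):
    """Placeholder for a model call; replace with a real request to your server."""
    time.sleep(0.01)  # pretend the model takes ~10 ms per prediction
    return {"prediction": 0.5}

def measure(concurrency, requests_total=20):
    """Send requests_total calls at the given concurrency; return latencies in seconds."""
    latencies = []

    def timed_call(_):
        start = time.perf_counter()
        fake_predict({})
        latencies.append(time.perf_counter() - start)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed_call, range(requests_total)))
    return latencies

# Ballpark figures at the parallelism levels mentioned above.
for c in (1, 10, 20):
    lat = measure(c)
    print(f"concurrency={c:2d}  p50 latency={statistics.median(lat) * 1000:.1f} ms")
```

Numbers from a harness like this are indicative only, but they are exactly the kind of rough estimate that helps the discussion in the Phabricator task.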
While the Machine Learning team processes your request, you can definitely start browsing the following pages to get more info about how we write code for Lift Wing:
- KServe Guide: Machine Learning/LiftWing/KServe
- Production Image Development Guide: Machine Learning/LiftWing/Inference Services/Production Image Development
After reading the above tutorials you should be able to create a Docker image with a basic KServe model server and test it locally. We realize that it may be overwhelming at first, so if you have any difficulties or doubts please ask us in the aforementioned task and we'll assist you!
Our team's goal is to be involved as early as possible in the development of the model, to direct you and your team to the easiest and best path to production without running into misunderstandings and surprises later on (for example, when a lot of work has already been done).
Ownership of a model
- The Machine Learning team will guide you through the development and deployment of your model on our Kubernetes infrastructure, and we'll take care of basic monitoring and scalability maintenance tasks for you. What we ask is to state a clear ownership of the model, so that we'll be able to ping you in case of need (unexpected problems, etc.).
- The Machine Learning team will not own models, but will only help teams to deploy them on a reliable Kubernetes infrastructure. The idea is to ease the task of putting a model in production for various teams, but we will not be able to also support/maintain models, due to resource constraints (we'd need a team of 10+ people otherwise :)).
- The Machine Learning team will not be responsible for the outputs of the model, for example if it doesn't respect basic human rights or if it is offensive to any group of people. We will work with you to avoid these risks as much as possible, but your team will be ultimately responsible for the model's behavior once it is in production.
- If your team wants to turn off the model (namely, removing it from the production API, etc.), then you'll need to do the necessary follow-ups with the community using it (for example, if it was exposed outside the Wikimedia realm).
Hosting stages for a model server on Lift Wing
Once a model is created, the ML team will likely suggest adding it to the staging cluster's experimental namespace. Here the model can be queried only by internal clients (see the Usage section for more info), not from the outside. Once the model is stable (doesn't consume a ton of memory/CPU, works reliably, doesn't fail randomly, etc.), it will be possible to move it to the production clusters, exposing it to the outside world via the API Gateway (see the Usage section for more info). The idea is to avoid exposing prototypes to the outside Internet until they have been properly vetted and tested by the ML and requesting teams.
- A model server is created by a team in collaboration with the Machine Learning team's engineers (see the Development section for more info).
- The model binary is passed to an ML-Ops engineer. This step needs to be logged on Phabricator (ideally in the task that represents the work to be done for the new model server), stating the location of the model (directory on stat100x, Gsuite, etc.) and its sha512 checksum. Please don't use a paste or similar: add the sha512 value directly as a Phabricator comment. The ML-Ops engineer retrieves the file and checks the sha512 checksum to make sure the file wasn't tampered with or miscopied, and acknowledges the correct verification on Phabricator. Finally the model is uploaded by the ML-Ops engineer to Swift.
elukey@stat1008:~$ sha512sum -b model.bin
5bd35e5e92196eec76abad880703c9caaa94e2e52eebd68cd61745549bc37d8654e7f4c6731fa1b643d6bb644ccad5dc98f738b8da8928ef27e189cb92b63e5c *model.bin
- The model server is deployed in the Lift Wing staging cluster, under the experimental namespace. In this limbo the model server will be available only to internal WMF clients, and not exposed to the outside Internet. At this point load-testing experiments are performed to figure out correct resource usage (even just a baseline). We err on the side of fewer resources, since they can always be increased later.
- The model server is developed until it reaches a good stability level, namely its memory/CPU consumption is predictable and its performance is acceptable. The error rate is low, and the code's quality meets a certain bar. The model also has a clear owner and point of contact, so that in case of bugs/emergencies/doubts/etc. the ML team will be able to contact them. As written before, the ML team doesn't own any model, and we don't necessarily know the codebase and the design choices behind a model, so we need whoever has context to keep supporting it. If this is not possible, the model will not graduate to production.
- A basic load test is performed to figure out (indicatively) how many requests per second (rps) the model server can sustain in staging. The ML team and the model owner set a target SLO for the service.
- The model server is assigned its own Kubernetes namespace, and it is deployed in production by the Machine Learning team. The requesting team can expose the service to the outside Internet via the API Gateway, with the help of the Machine Learning team. See the Usage section for more info.
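For reference, production model servers exposed via the API Gateway are queried with JSON POST requests. The sketch below only builds such a request without sending it; the endpoint shape and model name are assumptions based on the Usage docs, so double-check them there before relying on this:

```python
import json
from urllib import request

# Endpoint shape for models exposed via the API Gateway; the host, path, and
# model name used below are assumptions, check the Usage page for real ones.
API = "https://api.wikimedia.org/service/lw/inference/v1/models/{model}:predict"

def build_request(model, payload):
    """Build (but do not send) a JSON POST request for a given model server."""
    data = json.dumps(payload).encode("utf-8")
    return request.Request(
        API.format(model=model),
        data=data,
        headers={"Content-Type": "application/json"},
    )

req = build_request("enwiki-goodfaith", {"rev_id": 12345})
print(req.full_url)
# To actually send it (requires network access and possibly authentication):
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```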
Current Inference Services
Revscoring models (migrated from ORES)
- Article quality: enwiki-articlequality, euwiki-articlequality, fawiki-articlequality, frwiki-articlequality, glwiki-articlequality, nlwiki-articlequality, ptwiki-articlequality, ruwiki-articlequality, svwiki-articlequality, trwiki-articlequality, ukwiki-articlequality, wikidatawiki-itemquality
- Damaging: arwiki-damaging, bswiki-damaging, cawiki-damaging, cswiki-damaging, dewiki-damaging, enwiki-damaging, eswikibooks-damaging, eswiki-damaging, eswikiquote-damaging, etwiki-damaging, fawiki-damaging, fiwiki-damaging, frwiki-damaging, hewiki-damaging, hiwiki-damaging, huwiki-damaging, itwiki-damaging, jawiki-damaging, kowiki-damaging, lvwiki-damaging, nlwiki-damaging, nowiki-damaging, plwiki-damaging, ptwiki-damaging, rowiki-damaging, ruwiki-damaging, sqwiki-damaging, srwiki-damaging, svwiki-damaging, trwiki-damaging, ukwiki-damaging, wikidatawiki-damaging, zhwiki-damaging
- Good faith: arwiki-goodfaith, bswiki-goodfaith, cawiki-goodfaith, cswiki-goodfaith, dewiki-goodfaith, enwiki-goodfaith, eswikibooks-goodfaith, eswiki-goodfaith, eswikiquote-goodfaith, etwiki-goodfaith, fawiki-goodfaith, fiwiki-goodfaith, frwiki-goodfaith, hewiki-goodfaith, hiwiki-goodfaith, huwiki-goodfaith, itwiki-goodfaith, jawiki-goodfaith, kowiki-goodfaith, lvwiki-goodfaith, nlwiki-goodfaith, nowiki-goodfaith, plwiki-goodfaith, ptwiki-goodfaith, rowiki-goodfaith, ruwiki-goodfaith, sqwiki-goodfaith, srwiki-goodfaith, svwiki-goodfaith, trwiki-goodfaith, ukwiki-goodfaith, wikidatawiki-goodfaith, zhwiki-goodfaith
- Reverted: bnwiki-reverted, elwiki-reverted, enwiktionary-reverted, glwiki-reverted, hrwiki-reverted, idwiki-reverted, iswiki-reverted, tawiki-reverted, viwiki-reverted
- Article topic: arwiki-articletopic, cswiki-articletopic, enwiki-articletopic, euwiki-articletopic, huwiki-articletopic, hywiki-articletopic, kowiki-articletopic, srwiki-articletopic, ukwiki-articletopic, viwiki-articletopic, wikidatawiki-itemtopic
Contributing to the project
If you are a member of the community who wants to help out with Lift Wing, thanks a lot! You are really welcome :)
We manage our tasks on this Phabricator board, but it may be overwhelming and confusing at first. We suggest introducing yourself in the #wikimedia-ml IRC channel on Libera.Chat; after a chat (to understand your goals and desired learning experiences) we'll decide together what work to assign to you.
If you have any feedback/suggestions/fixes/etc. please open a Phabricator task with the aforementioned tag.
Table of contents
- Machine Learning/LiftWing/Alerts
- Machine Learning/LiftWing/Deploy
- Machine Learning/LiftWing/Inference Services
- Machine Learning/LiftWing/Inference Services/Production Image Development
- Machine Learning/LiftWing/KServe
- Machine Learning/LiftWing/KServe/DeployLocal
- Machine Learning/LiftWing/ML-Sandbox
- Machine Learning/LiftWing/ML-Sandbox/Configuration
- Machine Learning/LiftWing/ML-Sandbox/Usage-Examples
- Machine Learning/LiftWing/Streams
- Machine Learning/LiftWing/Usage