Machine Learning/LiftWing/Usage
Lift Wing's model servers can be exposed in two ways:
- Via an internal endpoint, available only to clients inside the production WMF network (this excludes Toolforge and Cloud VPS VMs).
- Via an external endpoint, available to all clients on the public Internet.
This page covers both; please use the method that best suits your needs.
Internal endpoints
Once a model server is deployed on Lift Wing, it becomes available internally via two endpoints:
https://inference.discovery.wmnet:30443/v1/models/{MODEL_NAME}:predict (production)
https://inference-staging.svc.codfw.wmnet:30443/v1/models/{MODEL_NAME}:predict (staging)
The caller/client needs to set the HTTP Host header as follows: {MODEL_NAME}.{KUBERNETES_NAMESPACE}.wikimedia.org. You can find {MODEL_NAME} and {KUBERNETES_NAMESPACE} in the tables in Machine Learning/LiftWing#Current Inference Services.
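For example, a minimal sketch of filling in the template for the enwiki-goodfaith model (the same values used in the curl example below):
MODEL_NAME = "enwiki-goodfaith"
KUBERNETES_NAMESPACE = "revscoring-editquality-goodfaith"

url = f"https://inference.discovery.wmnet:30443/v1/models/{MODEL_NAME}:predict"
host_header = f"{MODEL_NAME}.{KUBERNETES_NAMESPACE}.wikimedia.org"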
When do I need to use the internal endpoint?
If you have an application/client/etc. that runs inside the WMF infrastructure, then you can definitely use the internal endpoint. Please note that any massive traffic volume needs to be approved by the ML team first (just ping them on IRC or Phabricator). There are rate-limit filters in place, but better safe than sorry.
For example: if you plan to run a Spark job on Hadoop that hits Lift Wing a million times in a short time frame to retrieve scores, it may impact other production workflows. Please follow up with the ML team first to agree on a strategy, such as the client-side throttling sketched below.
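A minimal sketch of client-side throttling, assuming the enwiki-goodfaith model and an illustrative delay (the actual acceptable rate should be agreed with the ML team):
import os
import time
import requests

os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/wmf-ca-certificates.crt"

inference_url = "https://inference.discovery.wmnet:30443/v1/models/enwiki-goodfaith:predict"
headers = {
    "Host": "enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org",
    "Content-Type": "application/json",
    "User-Agent": "YOUR_APP_NAME (YOUR_EMAIL_OR_CONTACT_PAGE)",
}

rev_ids = [1083325118, 1083325119]  # example revision ids

for rev_id in rev_ids:
    response = requests.post(inference_url, headers=headers, json={"rev_id": rev_id})
    print(response.json())
    time.sleep(0.5)  # illustrative pause: at most ~2 requests/second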
Example usage of internal endpoint
Curl
The way to query the enwiki-goodfaith model via curl:
aikochou@stat1004:~$ cat input.json
{ "rev_id": 1083325118 }
aikochou@stat1004:~$ curl "https://inference.discovery.wmnet:30443/v1/models/enwiki-goodfaith:predict" -X POST -d @input.json -i -H "Host: enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org" -A "YOUR_APP_NAME (YOUR_EMAIL_OR_CONTACT_PAGE)" --http1.1
HTTP/1.1 200 OK
content-length: 209
content-type: application/json; charset=UTF-8
date: Mon, 31 Oct 2022 16:51:54 GMT
server: istio-envoy
x-envoy-upstream-service-time: 361
{"enwiki": {"models": {"goodfaith": {"version": "0.5.1"}}, "scores": {"1083325118": {"goodfaith": {"score": {"prediction": true, "probability": {"false": 0.033641298577500645, "true": 0.9663587014224994}}}}}}}
If you get the error curl: (60) SSL certificate problem: unable to get local issuer certificate, retry using --cacert /etc/ssl/certs/wmf-ca-certificates.crt.
Python
The way to query the outlink-topic-model via Python:
import os
import json
import requests
os.environ['REQUESTS_CA_BUNDLE'] = "/etc/ssl/certs/wmf-ca-certificates.crt"
inference_url = 'https://inference.discovery.wmnet:30443/v1/models/outlink-topic-model:predict'
headers = {
'Host': 'outlink-topic-model.articletopic-outlink.wikimedia.org',
'Content-Type': 'application/json',
'User-Agent': 'YOUR_APP_NAME (YOUR_EMAIL_OR_CONTACT_PAGE)'
}
data = {"lang": "en", "page_title": "Wings of Fire (novel series)"}
response = requests.post(inference_url, headers=headers, data=json.dumps(data))
print(response.text)
If you get the following error message from a stat100x node: requests.exceptions.ProxyError: HTTPSConnectionPool(host='inference.discovery.wmnet', port=30443): Max retries exceeded with url: /v1/models/outlink-topic-model:predict (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))
Unset https_proxy and run the script again, or clear the proxy variables from within the script as sketched below.
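A minimal sketch of clearing the proxy variables at the top of the script (the variable names are the usual environment conventions):
import os

# Drop any proxy settings inherited from the shell before issuing the
# request; inference.discovery.wmnet is reached directly, not via a proxy.
for var in ("https_proxy", "HTTPS_PROXY", "http_proxy", "HTTP_PROXY"):
    os.environ.pop(var, None)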
JavaScript
The way to query the enwiki-goodfaith model via JavaScript:
const liftWingInternalEndpoint = "https://inference.discovery.wmnet:30443/v1/models/enwiki-goodfaith:predict";
const appName = "YOUR_APP_NAME";
const email = "YOUR_EMAIL_OR_CONTACT_PAGE";
let headers = new Headers({
"Host": "enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org",
"Content-Type": "application/json",
"Api-User-Agent": appName + " ( " + email + " )"
});
let data = { "rev_id": 1083325118 };
fetch(liftWingInternalEndpoint, {
method: "POST",
headers: headers,
body: JSON.stringify(data)
})
.then(response => response.json())
.then(inferenceData => console.log(inferenceData));
Since this is an internal endpoint, it is best to query it using the stat100x nodes to avoid access issues.
External endpoints
The ML team uses the API gateway infrastructure to publish the Lift Wing endpoints on api.wikimedia.org.
The URL to construct is the following: https://api.wikimedia.org/service/lw/inference/v1/models/{MODEL_NAME}:predict
You can find the {MODEL_NAME} in Machine_Learning/LiftWing#Current_Inference_Services. Note that models in the "experimental" namespace are not available via the external endpoints.
When do I need to use the external endpoint?
Every time a client needs to call Lift Wing from outside the Wikimedia infrastructure (this includes any tool running on Toolforge).
Example usage of external endpoint
Curl
The way to query the enwiki-articletopic model via curl:
export ACCESSTOKEN="copy-paste-token-from-api-portal"
curl https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articletopic:predict -X POST -d '{"rev_id": 123555}' -H "Authorization: Bearer $ACCESSTOKEN" -A "YOUR_APP_NAME (YOUR_EMAIL_OR_CONTACT_PAGE)"
If you want your request to be anonymous, just remove the Authorization header.
Python
The way to query the outlink-topic-model via Python:
import json
import requests
inference_url = 'https://api.wikimedia.org/service/lw/inference/v1/models/outlink-topic-model:predict'
headers = {
'Content-Type': 'application/json',
'Authorization': 'Bearer copy-paste-token-from-api-portal',
'User-Agent': 'YOUR_APP_NAME (YOUR_EMAIL_OR_CONTACT_PAGE)'
}
data = {"lang": "en", "page_title": "Wings of Fire (novel series)"}
response = requests.post(inference_url, headers=headers, data=json.dumps(data))
print(response.text)
JavaScript
The way to query the enwiki-goodfaith model via JavaScript:
const liftWingExternalEndpoint = "https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-goodfaith:predict";
const accessToken = "YOUR_ACCESS_TOKEN";
const appName = "YOUR_APP_NAME";
const email = "YOUR_EMAIL_OR_CONTACT_PAGE";
let headers = new Headers({
"Content-Type": "application/json",
"Authorization": "Bearer " + accessToken,
"Api-User-Agent": appName + " ( " + email + " )"
});
let data = { "rev_id": 1083325118 };
fetch(liftWingExternalEndpoint, {
method: "POST",
headers: headers,
body: JSON.stringify(data)
})
.then(response => response.json())
.then(inferenceData => console.log(inferenceData));
If you want your request to be anonymous, just remove the Authorization header. For instructions on how to get a bearer token, please check the Authentication section below.
Authentication
The API gateway works with two kinds of traffic:
- Anonymous traffic (no authentication).
- Authenticated traffic (via Meta's OAuth2 bearer tokens, see the dedicated section below about how to get one).
Anonymous users via the API gateway are limited to 50000 requests per hour (based on client IP).
Regarding OAuth authentication, see the documentation on Meta regarding OAuth for applications and web sites and Owner-only tokens.
Request a bearer token
As mentioned before, the API gateway supports users logged in with OAuth. The gist of it is that the user authenticates with Meta MediaWiki's OAuth (username/password) to obtain a bearer token, which is then used in a standard HTTP Authorization header by the user's client. The bearer token is a simple alphanumeric string with an expiry date. A refresh token may be issued alongside the bearer token, allowing the client to refresh it when/if needed (a sketch of the refresh flow follows).
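A minimal sketch of refreshing an expired bearer token, assuming the standard MediaWiki OAuth 2.0 REST endpoint on Meta; the client id/secret and refresh token values are placeholders:
import requests

token_url = "https://meta.wikimedia.org/w/rest.php/oauth2/access_token"
payload = {
    "grant_type": "refresh_token",
    "refresh_token": "YOUR_REFRESH_TOKEN",
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
}

response = requests.post(token_url, data=payload)
tokens = response.json()
# The reply contains a fresh bearer token (and typically a new refresh token).
access_token = tokens["access_token"]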
For most use cases (like bots etc.) it should be sufficient to register an owner-only consumer. Let's review all the possibilities to retrieve an OAuth bearer token:
- Request a token (the "Access token") from Meta via the formal process (basically the aforementioned owner-only consumer). It will need to be approved by a member of the community, but after the process a token is issued that shouldn't expire. The idea is that the app/bot/etc. using the token acts on behalf of the user that registered/obtained it, so use it with care!
- Request a personal API token from https://api.wikimedia.org/wiki/Special:AppManagement, which shouldn't expire. This is similar to the above use case, but the token is issued by the API portal.
- Follow this guide to get a token that lasts a few hours, just for testing. The token will expire and will need to be refreshed periodically.
If you want a higher limit, namely the 100000+ requests/hour tiers, you'll need to create a task in Phabricator with the Machine-Learning-team tag, providing your CLIENT ID. The CLIENT ID is also referred to as the "Client application key" and "Consumer key", and is shown on your OAuth consumers list or by clicking "View Details" on the API AppManagement page. (It is also encoded in the access/bearer token as the "aud" claim, and can be extracted by decoding the token with a JWT library, as sketched below.)
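A minimal standard-library sketch of extracting the "aud" claim from a token (a JWT library works equally well; the signature is not verified here):
import base64
import json

def jwt_claims(token):
    """Decode a JWT payload without verifying the signature."""
    payload = token.split(".")[1]
    # Restore the base64url padding stripped by JWT encoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))

claims = jwt_claims("copy-paste-token-from-api-portal")
print(claims["aud"])  # the CLIENT ID to provide in the Phabricator task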
The Lift Wing endpoints have the following rate-limit tiers:
- 50000 requests/hour for every anonymous client/IP.
- 100000 requests/hour for every authenticated OAuth2 user (the token needs to be elevated to the internal tier).
- 200000 requests/hour for every authenticated OAuth2 user from Wikimedia Enterprise (the token needs to be elevated to the wme tier).
IMPORTANT NOTE: After getting your client-id elevated to a new tier, you'll need to reset your token (basically request a new one; the same client-id is fine) so that the new limit is applied. The rate limit is in fact encoded inside the token itself.
The limits are not set in stone; they can be configured. Please follow up with the ML team with any doubts or suggestions about them. If you want to check the current values, search for ratelimit_config entries in the following values.yaml config.
ML Admins only
Once a user has requested a token and filed a task in Phabricator with their CLIENT ID, their limit can be elevated on mwmaint machines like so:
$ mwscript extensions/OAuthRateLimiter/maintenance/setClientTierName.php --wiki metawiki --client CLIENTID --tier TIER
Where TIER is one of the wgOAuthRateLimiterTierConfig tiers from the MediaWiki config.
IMPORTANT: After the change above, the user must reset/re-issue the token from the same page they originally created it on, so the new limit is encoded in the token. Please mention this in the task.
Differences using Lift Wing instead of ORES
There are three main differences between using Lift Wing and ORES:
- Lift Wing is a generic model hosting platform while ORES is a scoring service only for article revisions. The notion behind this is that microservices built with Lift Wing offer the following advantages:
- Scalability: Microservices can be independently scaled based on demand, allowing for more efficient resource utilization and improved performance.
- Flexibility: The microservices architecture enables the use of different languages and frameworks for each model service, providing greater flexibility in development.
- Faster Deployment: Smaller codebases and independent deployment of microservices enable faster and more frequent releases, accelerating release to production.
- Fault Isolation: Failure in one microservice is less likely to impact the entire system, improving overall system resilience and uptime.
- Lift Wing hosts models as microservices, which means that in order to get predictions from more than one model for the same input data, one has to issue multiple calls. ORES, on the contrary, acts as a scoring aggregator and can return predictions for many models and many article revisions in one call. The following ORES example fetches the predictions for the article revision with id 123 from the damaging and goodfaith models:
curl 'https://ores.wikimedia.org/v3/scores/enwiki/123?models=damaging|goodfaith'
The above call translates into the following two Lift Wing calls (see the Python sketch after this list):
curl https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-damaging:predict -X POST -d '{"rev_id": 123}'
curl https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-goodfaith:predict -X POST -d '{"rev_id": 123}'
- Caching in ORES: In ORES, a `precache` is a cache that is updated whenever a new revision is made, which helps decrease latency in subsequent identical calls. Although caching is an important aspect of any service, such functionality is not currently integrated in Lift Wing. The main reason is that Lift Wing is a general-purpose system, so implementing a caching mechanism could add significant complexity. The addition of such a feature will be taken under consideration if it proves to be a necessity for Lift Wing's users.
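A minimal Python sketch of the two-call pattern above, combining both predictions client-side (model names and rev_id taken from the ORES example):
import requests

base = "https://api.wikimedia.org/service/lw/inference/v1/models"
headers = {"User-Agent": "YOUR_APP_NAME (YOUR_EMAIL_OR_CONTACT_PAGE)"}
rev_id = 123

# One request per model; Lift Wing has no aggregation endpoint, so the
# results are merged here on the client side.
scores = {}
for model in ("enwiki-damaging", "enwiki-goodfaith"):
    response = requests.post(f"{base}/{model}:predict", headers=headers, json={"rev_id": rev_id})
    scores[model] = response.json()

print(scores)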