Machine Learning/LiftWing/Usage


Lift Wing's model servers can be exposed in two ways:

  • Via an internal endpoint, available only to clients inside the production WMF network (thus excluding environments like Toolforge and Cloud VPS).
  • Via an external endpoint, available to all clients on the public Internet.

This page covers both; please use the method that best suits your needs.

Internal endpoints

Once a model server is deployed on Lift Wing, it becomes available internally via two endpoints:

  • https://inference.discovery.wmnet:30443/v1/models/{MODEL_NAME}:predict (production)
  • https://inference-staging.svc.codfw.wmnet:30443/v1/models/{MODEL_NAME}:predict (staging)

The caller/client needs to set the HTTP Host header as follows: {MODEL_NAME}.{KUBERNETES_NAMESPACE}.wikimedia.org

You can find {MODEL_NAME} and {KUBERNETES_NAMESPACE} in the tables at Machine Learning/LiftWing#Current Inference Services.

When do I need to use the internal endpoint?

If you have an application/client/etc. that runs inside the WMF infrastructure, then you can definitely use the internal endpoint. Please note that any massive traffic volume needs to be cleared with the ML team first (just ping them on IRC or Phabricator). There are rate-limit filters in place, but better safe than sorry :)

For example: if you plan to run a Spark job on Hadoop to train a model, hitting Lift Wing a million times to retrieve scores in a short time frame may impact other production workflows. Please follow up with the ML team first to agree on a strategy :)

Example usage of internal endpoint
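
A minimal sketch in Python, using only the standard library. The model name and namespace below (enwiki-goodfaith in the revscoring-editquality-goodfaith namespace) are illustrative; look up the real values in the Current Inference Services tables. This only works from hosts inside the WMF production network.

```python
import json
import urllib.request

# Illustrative values -- check the Current Inference Services tables.
MODEL_NAME = "enwiki-goodfaith"
NAMESPACE = "revscoring-editquality-goodfaith"

req = urllib.request.Request(
    f"https://inference.discovery.wmnet:30443/v1/models/{MODEL_NAME}:predict",
    data=json.dumps({"rev_id": 123456}).encode(),
    headers={
        # Lift Wing routes on the Host header, not on the URL hostname.
        "Host": f"{MODEL_NAME}.{NAMESPACE}.wikimedia.org",
        "Content-Type": "application/json",
    },
    method="POST",
)
# Uncomment inside the WMF production network:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The same request can of course be made with curl; the key point is that the Host header must carry {MODEL_NAME}.{KUBERNETES_NAMESPACE}.wikimedia.org.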

External endpoints

The ML Team uses the API gateway infrastructure to publish the Lift Wing endpoints in api.wikimedia.org.

The URL to construct is the following: https://api.wikimedia.org/service/lw/inference/v1/models/{MODEL_NAME}:predict

You can find the {MODEL_NAME} in Machine_Learning/LiftWing#Current_Inference_Services. Note that models in the "experimental" namespace are not available via the external endpoint.

When do I need to use the external endpoint?

Every time a client needs to call Lift Wing from outside the Wikimedia infrastructure (this includes any tool running on Toolforge).

Example usage of external endpoint
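
A minimal Python sketch, standard library only. enwiki-goodfaith is an illustrative model name, and the ACCESS_TOKEN placeholder stands for an OAuth2 bearer token obtained as described in the Authentication section below.

```python
import json
import urllib.request

MODEL_NAME = "enwiki-goodfaith"   # illustrative; pick one from the tables
ACCESS_TOKEN = "..."              # your OAuth2 bearer token

req = urllib.request.Request(
    f"https://api.wikimedia.org/service/lw/inference/v1/models/{MODEL_NAME}:predict",
    data=json.dumps({"rev_id": 123456}).encode(),
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```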

If you want your request to be anonymous, simply omit the Authorization header. For instructions on how to get a bearer token, please check the Authentication section below.

Authentication

The API gateway works with two kinds of traffic:

  • Anonymous traffic (no authentication).
  • Authenticated traffic (via Meta's OAuth2 bearer tokens; see the dedicated section below on how to get one).

Anonymous users of the API gateway are limited to 50000 requests per hour (based on client IP).

Regarding OAuth authentication, see the documentation on Meta about OAuth for applications and web sites, and about owner-only tokens.

Request a bearer token

As mentioned before, the API gateway supports users logged in with OAuth. The gist of it is that the user authenticates with username/password against Meta MediaWiki's OAuth to obtain a bearer token, which the user's client then sends in a standard HTTP Authorization header. The bearer token is a simple alphanumeric string with an expiry date. A refresh token may be issued alongside the bearer token so that the client can refresh it when/if needed.

For most use cases (bots, etc.) it should be sufficient to register an owner-only consumer. Let's review the ways to obtain an OAuth bearer token:

  • Request a token (the "Access token") from Meta via the formal process (basically the aforementioned owner-only consumer). It needs to be approved by a member of the community, but once approved a token is issued that shouldn't expire. The app/bot/etc. using the token acts on behalf of the user who registered/obtained it, so use it with care!
  • Request a personal API token from https://api.wikimedia.org/wiki/Special:AppManagement; it shouldn't expire either. This is similar to the above use case, but the token is issued by the API portal.
  • Follow this guide to get a token that lasts a few hours, just for testing. The token will expire and needs to be refreshed periodically.
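
For the short-lived token in the last option, the refresh step can be scripted. A sketch, assuming the MediaWiki OAuth2 REST endpoint on Meta (rest.php/oauth2/access_token) and hypothetical placeholder credentials; double-check the exact flow against the OAuth documentation on Meta before relying on it:

```python
import json
import urllib.parse
import urllib.request

CLIENT_ID = "your-client-id"          # hypothetical placeholders
CLIENT_SECRET = "your-client-secret"
REFRESH_TOKEN = "your-refresh-token"

# Exchange the refresh token for a fresh access (bearer) token.
data = urllib.parse.urlencode({
    "grant_type": "refresh_token",
    "refresh_token": REFRESH_TOKEN,
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
}).encode()
req = urllib.request.Request(
    "https://meta.wikimedia.org/w/rest.php/oauth2/access_token",
    data=data,
    method="POST",
)
# with urllib.request.urlopen(req) as resp:
#     tokens = json.load(resp)  # the response carries a new "access_token"
```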

If you need a higher limit (the 100000+ requests/hour tiers), you'll need to create a task in Phabricator with the Machine-Learning-team tag, providing your CLIENT ID. The CLIENT ID is also referred to as the "Client application key" or "Consumer key", and is shown in your OAuth consumers list or by clicking "View Details" on the API AppManagement page. (It is also encoded in the access/bearer token as the "aud" claim, and can be extracted by decoding the token with a JWT library.)
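
Since the CLIENT ID is encoded in the token as the "aud" claim, it can be extracted with the standard library alone. A minimal sketch (no signature verification; use it only to inspect your own token):

```python
import base64
import json

def client_id_from_token(token: str) -> str:
    """Return the "aud" (CLIENT ID) claim from a JWT access token.

    Does not verify the signature -- for inspection only.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded without padding; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["aud"]
```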

The Lift Wing endpoints have the following rate limits tiers:

  • 50000 requests/hour per anonymous client/IP.
  • 100000 requests/hour per authenticated OAuth2 user (token elevated to the internal tier).
  • 200000 requests/hour per authenticated OAuth2 user from Wikimedia Enterprise (token elevated to the wme tier).

IMPORTANT NOTE: after your client-id is elevated to a new tier, you'll need to reset your token (i.e. request a new one; the same client-id is fine) so that the new limit is applied. The rate limit is in fact encoded inside the token itself.

The limits are not set in stone; they can be configured. Please follow up with the ML team with any doubts or suggestions about them. To inspect them, search for ratelimit_config entries in the following values.yaml config.

ML Admins only

Once a user has requested a token and filed a task in Phabricator with their CLIENT ID, the limit can be elevated on mwmaint machines like so:

$ mwscript extensions/OAuthRateLimiter/maintenance/setClientTierName.php --wiki metawiki --client CLIENTID --tier TIER

Where TIER is one of the wgOAuthRateLimiterTierConfig tiers from the MediaWiki config.

IMPORTANT: After the change above, the user must reset/re-issue the token from the same page they originally created it on, so the new limit is encoded in the token. Please mention this in the task.

Differences using Lift Wing instead of ORES

There are three main differences between using Lift Wing and ORES:

  1. Lift Wing is a generic model-hosting platform, while ORES is a scoring service only for article revisions. The idea is that microservices built on Lift Wing offer the following advantages:
    • Scalability: Microservices can be independently scaled based on demand, allowing for more efficient resource utilization and improved performance.
    • Flexibility: Microservices architecture enables the use of different languages, and frameworks for each model service, providing greater flexibility in development.
    • Faster Deployment: Smaller codebases and independent deployment of microservices enable faster and more frequent releases, accelerating release to production.
    • Fault Isolation: Failure in one microservice is less likely to impact the entire system, improving overall system resilience and uptime.
  2. Lift Wing hosts models as microservices, which means that to get predictions from more than one model for the same input you have to issue multiple calls. By contrast, ORES acts as a scoring aggregator and can return predictions for multiple models and multiple article revisions in one request. The following ORES example fetches predictions for the article revision with id 123 from the damaging and goodfaith models (note the quotes, which stop the shell from interpreting the | character):
    curl 'https://ores.wikimedia.org/v3/scores/enwiki/123?models=damaging|goodfaith'
    
    The above call would be translated in the following two Lift Wing calls:
    curl https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-damaging:predict -X POST -H 'Content-Type: application/json' -d '{"rev_id": 123}'
    
    curl https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-goodfaith:predict -X POST -H 'Content-Type: application/json' -d '{"rev_id": 123}'
    
  3. Caching in ORES: in ORES, a `precache` is a cache that is updated whenever a new revision is made, which decreases latency on subsequent identical calls. Although caching is an important aspect of any service, such functionality is currently not integrated into Lift Wing. The main reason is that Lift Wing is a general-purpose system, so a caching mechanism would add significant complexity. The addition of such a feature will be considered if users see it as a necessity.
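
The pair of Lift Wing calls in point 2 can be folded into a small client-side helper that mimics ORES's multi-model aggregation. A sketch; the function names are illustrative and not part of any official client:

```python
import json
import urllib.request

LW = "https://api.wikimedia.org/service/lw/inference/v1/models"

def lw_url(model: str) -> str:
    """Build the :predict URL for one Lift Wing model server."""
    return f"{LW}/{model}:predict"

def lw_score(model: str, rev_id: int) -> dict:
    """POST one prediction request to a single Lift Wing model."""
    req = urllib.request.Request(
        lw_url(model),
        data=json.dumps({"rev_id": rev_id}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def ores_style_scores(wiki: str, rev_id: int, models: list[str]) -> dict:
    """One Lift Wing call per model, merged into an ORES-like mapping."""
    return {m: lw_score(f"{wiki}-{m}", rev_id) for m in models}

# Example (issues real HTTP requests):
# ores_style_scores("enwiki", 123, ["damaging", "goodfaith"])
```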