API Gateway

The API Gateway is an Envoy-based service that runs in Kubernetes. The service implements many features central to serving the unified API and the API Portal.

What it does

The API Gateway serves pages for api.wikimedia.org. It does this by rewriting requests for the unified API to URIs that are understood by the respective APIs on the application servers, and also by serving pages for the API Portal wiki in the same way other wikis would be served. The API Gateway also uses metadata from JSON Web Tokens (JWTs) to apply rate limits to clients using the APIs.

How it works

Wikimedia API Gateway architecture diagram

Envoy functions much like any other API router or reverse proxy. The proxy answers requests for the configured domain, selectively manipulates them, and then requests page content from the appservers via their LVS endpoint.

Rate limiting

The API Gateway applies rate limits to clients based on whether the client provides a JWT. During the beta, unauthenticated clients are limited to 500 requests an hour, and authenticated clients that pass a valid JWT are limited to 5000 requests an hour. These values are temporary and will change as the platform moves towards general release. Clients obtain JWTs by requesting OAuth 2.0 clients on Meta and, in future, on the API Portal.

Routing

The API Gateway maps API URIs requested on the Gateway's hostname (api.wikimedia.org) to the relevant APIs understood by the application servers. For example, the Gateway's configuration maps https://api.wikimedia.org/core/v1/wikipedia/en/page/pizza to https://en.wikipedia.org/w/rest.php/v1/page/pizza. As of September 2020, this requires a relatively complex rewriting method that uses Lua and multiple definitions of URL patterns in the values.yaml file; this is expected to be fixed in Envoy 1.16.0. Currently, all APIs offered by the API Gateway are also directly accessible via the traditional API routes of their respective services.

JSON Web Tokens

The API Gateway verifies the signatures of JWTs sent in the Authorization header of requests. If a JWT is valid, the limit for authenticated clients is applied instead of the anonymous limit. Both limits can be configured per environment via the Helm values file (Values.main_app.ratelimiter.default_limit.unit for valid JWTs and Values.main_app.ratelimiter.anon_limit.unit for anonymous users).

API Portal

The API Gateway is the means by which all clients access the API Portal. The API Portal is simply a customised MediaWiki instance, and the API Gateway serves requests to it by proxying them to the appservers. Unlike other wikis, however, the API Portal is only accessible via the API Gateway.

Logs and analytics

Logs are shipped from the API Gateway to EventGate using fluentd. Fluentd runs in its own container, continuously parsing the JSON request log output and reposting these logs to EventGate.

Where it runs

The API Gateway runs in Kubernetes in staging, eqiad and codfw. The instance in staging does not receive external traffic but can be accessed internally at https://api-gateway.svc.eqiad.wmnet:8087. Changes should be deployed to staging and tested via curl on this endpoint.

How to design your API

This section is for teams who want to add their API to the API gateway and API Portal.

Before you build

Style

All APIs in the API gateway follow a RESTful architectural style. For general guidance on our interpretation of REST and recommended API best practices, visit the design principles.

Authentication and authorization

The API gateway supports OAuth 2.0 as the sole method for authentication and authorization. The gateway requires that write requests include an OAuth token, effectively prohibiting writes without a registered consumer.

At this time, the API Portal supports only the MediaWiki rights covered by the basic, createeditmovepage, and editprotected grants. To add a grant, open a task in Phabricator tagged with #API-Portal.

Rate limits

Requests to api.wikimedia.org are subject to centrally defined rate limits. Rate limits are based on the type of OAuth 2.0 workflow used by the consumer. Requests without an OAuth 2.0 token are subject to a significantly lower rate limit. Visit the documentation for specific rate limits.

Criteria: URL structure

API gateway endpoints follow a consistent URL structure:

# Base URL
https://api.wikimedia.org

# Structure
{base URL}/{namespace}/v{version number}/{project name}/{subdomain}/{endpoint}

# Example: Get the Earth article from English Wikipedia
https://api.wikimedia.org/core/v1/wikipedia/en/page/Earth

Namespace

Endpoints MUST be grouped under a namespace as the initial URL element after the base URL. The namespace should represent a logical grouping of endpoints by function or origin. For a list of namespaces currently in use, visit the API Portal.

This pattern is currently in use but should be reconsidered and/or discontinued per phab:T280087.
For APIs originating outside of MediaWiki, the namespace may include both the general /service namespace and a service-specific subnamespace (for example: /service/linkrecommendation). For more information about this pattern, see the other API routes section.

Version

Following the namespace, endpoints MUST include a version in the URL. Experimental or unstable APIs MUST use v0. Endpoints using v1 and above MUST follow the stability policy.

Supported projects

Endpoints MUST address project routing (Wikipedia, Wiktionary, etc.) through a URL element following the version, using the unabbreviated name of a project supported by the gateway pathing map. If your API has limited project support, you MUST document (either in the API Portal, error message, or other location) which projects are supported by your API.

Supported subdomains

Endpoints MUST address subdomain routing (en, zh, etc.) per project through a URL element following the project, using the subdomain. These subdomains are usually ISO-standard language codes, but there are exceptions.

If your API has limited subdomain support, you MUST document (either in the API Portal, error message, or other location) which subdomains are supported for each project (Wikipedia, Wiktionary, etc.). Note that multilingual projects (commons, mediawiki, meta, wikidata, and wikispecies) do not accept a subdomain parameter.
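
The URL criteria above can be sketched as a small helper. This is a minimal illustration only: the function name build_gateway_url and its parameters are hypothetical, and the set of multilingual projects is copied from the note above rather than from the gateway configuration.

```python
from urllib.parse import quote

BASE_URL = "https://api.wikimedia.org"

# Multilingual projects named in the text above; they take no subdomain element.
MULTILINGUAL_PROJECTS = {"commons", "mediawiki", "meta", "wikidata", "wikispecies"}


def build_gateway_url(namespace, version, project, endpoint, subdomain=None):
    """Assemble {base URL}/{namespace}/v{version}/{project}/{subdomain}/{endpoint}.

    Hypothetical helper for illustration; not part of the gateway itself.
    """
    parts = [namespace.strip("/"), f"v{version}", project]
    if project in MULTILINGUAL_PROJECTS:
        if subdomain is not None:
            raise ValueError(f"{project} does not accept a subdomain")
    else:
        if subdomain is None:
            raise ValueError(f"{project} requires a subdomain")
        parts.append(subdomain)
    # Percent-encode the endpoint but keep its path separators intact.
    parts.append(quote(endpoint.strip("/"), safe="/"))
    return f"{BASE_URL}/" + "/".join(parts)
```

For instance, build_gateway_url("core", 1, "wikipedia", "page/Earth", subdomain="en") reproduces the Earth example given in the URL structure section.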

Criteria: Conditional requests

Your API MUST support conditional requests using etag or last-modified headers.
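
As a rough sketch of what supporting conditional requests involves on the backend side, the following framework-free function decides between a 200 and a 304 response. The function names, the header dictionary shape and the choice of a truncated SHA-256 digest as the ETag are all assumptions made for illustration; a real service would normally rely on its web framework's support.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_etag(body: bytes) -> str:
    # Strong validator derived from the response body (an illustrative choice).
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag: str) -> str:
    # If-None-Match uses weak comparison, so a leading W/ is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_response(request_headers, body, last_modified_ts):
    """Return (status, headers, body), honouring If-None-Match and If-Modified-Since."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [_strip_weak(t) for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""
        return 200, headers, body
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            # An unparseable date is ignored and the full response is sent.
            return 200, headers, body
        if int(last_modified_ts) <= since:
            return 304, headers, b""
    return 200, headers, body
```

A client that replays the ETag or Last-Modified value from a previous response receives an empty 304 as long as the content has not changed.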

Next steps

How it's configured

The API Gateway uses the reserved port 8087 internally and is registered in Service ports.

The core configuration for the API Gateway helm chart is documented in the default values.yaml file. Note that there are configuration overrides for production in general, and also for eqiad and codfw specifically (and staging, which does not serve public requests).

JWTs are verified using the public key of the keypair used to sign OAuth tokens on meta.wikimedia.org. This key has been converted to the JWKS format required for JWT verification and is distributed as a secret via Puppet.

How to add an API route

Appserver API routes

API routes for services running on MediaWiki application servers are defined in the values file for the API Gateway chart under the pathing_map.

  • The keys of the dictionary are the paths served on api.wikimedia.org. Each key is interpreted as a regular expression.
  • The "path" parameter within an entry is the path to be queried from the appservers. \group references refer to the groups matched by the API path regex in the key.
  • The "host" parameter is the rewritten Host header that will be passed to the appserver. This can be a simple string that replaces the host on all matching requests.
    • If the host parameter contains the word LANGUAGE, LANGUAGE will be replaced by the group captured by the lua_lang regular expression.
  • The "lua_lang" regular expression is matched against the api.wikimedia.org request path (the path matched by the key) and should contain exactly one group, which matches the language to be substituted into the host parameter.
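
The interplay of these parameters can be sketched in Python. This is an approximation under stated assumptions: the entry below is hypothetical and not copied from values.yaml, Python's re module stands in for the pattern matching the gateway performs in Lua (whose pattern syntax differs), and the rewrite function is an illustrative name.

```python
import re

# Hypothetical pathing_map entry, shaped after the description above.
PATHING_MAP = {
    r"/core/v1/wikipedia/([\w-]+)/(.*)": {
        "path": r"/w/rest.php/v1/\2",
        "host": "LANGUAGE.wikipedia.org",
        "lua_lang": r"/core/v1/wikipedia/([\w-]+)/",
    },
}


def rewrite(request_path, pathing_map=PATHING_MAP):
    """Return (host, path) for the first matching entry, or None if nothing matches."""
    for pattern, entry in pathing_map.items():
        match = re.fullmatch(pattern, request_path)
        if match is None:
            continue
        # Expand \N group references in the backend path template.
        path = match.expand(entry["path"])
        host = entry["host"]
        if "LANGUAGE" in host:
            lang = re.search(entry["lua_lang"], request_path)
            if lang is None:
                continue
            # The single group of lua_lang supplies the language for the Host header.
            host = host.replace("LANGUAGE", lang.group(1))
        return host, path
    return None
```

With this entry, rewrite("/core/v1/wikipedia/en/page/pizza") yields the host en.wikipedia.org and the path /w/rest.php/v1/page/pizza, which matches the mapping example in the Routing section.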

Currently requests are routed to different locations depending on the path requested.

  • All requests under /core/ with the exception of queries ending “description” are routed to appservers.
  • Requests ending in “description” are routed to the mobileapps service.
  • Requests for paths starting with /service/ are routed to service discovery endpoints. This relation is defined in the Helmfile configuration for the gateway.
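
The routing rules listed above amount to a short decision function. The sketch below assumes that the “description” rule only applies under /core/, which the list does not state explicitly; the function name and the backend labels are illustrative and do not come from the Helmfile configuration.

```python
def choose_backend(path: str) -> str:
    """Pick a backend label for a request path, following the rules listed above."""
    if path.startswith("/service/"):
        # Routed to a service discovery endpoint.
        return "discovery"
    if path.startswith("/core/"):
        # Assumption: the "description" exception applies only under /core/.
        if path.rstrip("/").endswith("description"):
            return "mobileapps"
        return "appservers"
    return "unrouted"
```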

Other API routes

Routing to a custom service is a little more involved than adding an appserver API route. Currently, only services with a discovery configuration can be routed to. The service is added to the discovery_endpoints dictionary in the configuration by its discovery hostname, together with a port and an endpoint name. The service will then be accessible at https://api.wikimedia.org/service/my_endpoint. See the existing dictionary for pointers.

Versioning

Unstable APIs that may undergo backwards-incompatible changes should be added to the API Gateway with /v0/ in the path. Stable APIs that can comply with the API Gateway versioning policy should be added with /v1/ in the path.

Documentation

To add API routes to the API Portal, see the API Portal page.

How to deploy changes

The API Gateway's configurable components all live within the deployment-charts repository. The components that are of interest are the api-gateway chart itself and the aforementioned helmfile.d configuration for the service. Note: when changing configuration in the API Gateway chart, make sure to bump the version in Chart.yaml. Not bumping this value will lead to your changes not being deployed.

Changes to the API Gateway chart or configuration files follow a standard code review process. Once you have received a +1 in Gerrit, submitting a +2 will trigger the auto-merge process for the deployment-charts repository. Once the change is merged, always deploy it to staging first and then deploy to the production environments using the standard deployment process.

There are currently no specific deployment windows for the API Gateway. If you deploy a change ad hoc without PET's knowledge, it is best to !log liberally and to make sure that someone from the team is on hand if you're doing something risky.

How to roll back changes

Follow the standard rollback procedures. If a change is affecting user experience in any way (increases in error codes served, timeouts etc - always refer to the dashboards when deploying), use the emergency procedure to limit the public impact of a change.

How to test changes

In development

Given the API Gateway's interactions with the appservers, testing changes locally can be difficult. However, there is limited support for local testing: if you have a local setup such as minikube, you can install a local version of the API Gateway by running helm install -f api-gateway/values-devel.yaml api-gateway in the charts directory. You will need to build the echoapi container beforehand; this is required only once (see the chart's README for more details). Once the install is complete and you have forwarded the requisite ports, requests will be passed to a fake backend service that responds to any request by returning the headers and parameters of the request and response. This can be used to check that basic behaviour changes match your expectations, that the Envoy configuration syntax is valid and that URL mappings behave as expected, amongst other things.

In staging

When changes have been deployed to staging, they can be tested using curl from any internal host. This can make it difficult to test changes that rely on Mediawiki changes, but it is unlikely that helm will be used to change the API Portal's behaviour in lieu of the standard mediawiki-config deployment process.

For example, to test a change to the API routing, run curl -k https://staging.svc.eqiad.wmnet:8087/core/v1/wikipedia/ga/page/Veigeat%C3%B3ireachas -v. When deploying new changes to staging, it should be verified that the change has had no impact on the API in general and specifically any API paths that have been modified or added. The normal operation of the API Portal should be tested - nothing too extensive but make sure that the main page loads okay.

How to debug it

Logs

To read and follow the logs for an API Gateway instance (codfw in this example):

hnowlan@deploy1001:~$ kube_env "api-gateway" "codfw"
hnowlan@deploy1001:~$ kubectl get pods | grep Running
api-gateway-production-5cd8c54ddb-rcg77   5/5     Running   0          5d5h
tiller-deploy-77f47486d6-fxhpx            1/1     Running   0          6d3h
hnowlan@deploy1001:~$ kubectl logs api-gateway-production-5cd8c54ddb-rcg77 api-gateway-production --tail 10 -f

This will show the last 10 lines of the logs and then follow output.

Note that Envoy's log format is extremely verbose and dumping whole logs may take a few seconds. Following logs can be challenging because they may appear non-linear: many requests are interleaved with one another. One aid in sorting through logs is the [Cxxxxx] field, a unique connection ID that can be used to follow a request as it is received and answered.
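
One way to untangle interleaved output is to group lines by their connection ID. The sketch below assumes the ID appears as [C followed by digits and a closing bracket, as described above; the function name is hypothetical and lines without a connection ID are simply skipped.

```python
import re
from collections import defaultdict

# Matches the [Cxxxxx] connection ID field and captures the numeric part.
CONNECTION_ID = re.compile(r"\[C(\d+)\]")


def group_by_connection(lines):
    """Group log lines by connection ID, preserving their order within each group."""
    grouped = defaultdict(list)
    for line in lines:
        match = CONNECTION_ID.search(line)
        if match:
            grouped[match.group(1)].append(line.rstrip("\n"))
    return dict(grouped)
```

The function accepts any iterable of lines, so it could be fed sys.stdin when piping the output of kubectl logs into a script.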

The above example can also be used to monitor the ratelimit service: in place of api-gateway-production, simply substitute production-ratelimit. This pattern applies to the other services within the pod, but their log output is not always useful.

WikimediaDebug

The WikimediaDebug plugin is supported for accessing the API Portal. It is not currently supported for routing API requests.

How to monitor it

There is a Grafana dashboard available that monitors many features of the API Gateway.

How to assign a client to a rate limit tier

All clients are assigned to the default ratelimit tier. To change the tier, use the setClientTierName.php maintenance script.

Log in to the mwmaint host in the active datacenter, usually mwmaint1002, and execute:

mwscript extensions/OAuthRateLimiter/maintenance/setClientTierName.php --wiki metawiki --client <client_id> --tier <tier_name>

At the time of writing, three tiers exist:

  • Default rate limit class: 5,000 API calls/hour per client ID/user ID pair (with null user ID counting as a pair here)
  • Preferred rate limit class: 25,000 API calls/hour per client ID/user ID pair
  • Internal rate limit class: 100,000 API calls/hour per client ID/user ID pair
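
To illustrate how limits per client ID/user ID pair behave, here is a minimal in-memory sketch. It assumes a fixed one-hour window, which may not match how the gateway's ratelimit service actually counts requests, and the tier keys are illustrative labels rather than the values accepted by --tier.

```python
import time
from collections import defaultdict

# Hourly limits from the list above; the keys are hypothetical labels.
TIER_LIMITS = {"default": 5000, "preferred": 25000, "internal": 100000}
WINDOW_SECONDS = 3600


class FixedWindowLimiter:
    """Counts requests per (client ID, user ID) pair within fixed one-hour windows."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._counts = defaultdict(int)

    def allow(self, client_id, user_id=None, tier="default"):
        # A null user ID still forms a distinct pair with the client ID.
        window = int(self._clock() // WINDOW_SECONDS)
        key = (client_id, user_id, window)
        if self._counts[key] >= TIER_LIMITS[tier]:
            return False
        self._counts[key] += 1
        return True
```

Counters for past windows are never evicted here, so this is only suitable as an illustration of the counting model.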

Known issues

  • An issue has been seen where users occasionally see {"httpCode":503,"httpReason":"upstream connect error or disconnect/reset before headers. reset reason: connection termination"} instead of being served the API Portal. It is not clear whether this relates to connection reuse or to TLS termination issues within Envoy itself. However, a fix limiting the amount and length of connection reuse when connecting to upstream hosts in Envoy has limited the impact. For more details, see T262490.
  • If a user receives an error of {"httpCode":401,"httpReason":"Jwt issuer is not configured"} it is because the "iss" field in the token does not match the one configured on the API Gateway. Depending on what has been changed this could be a misconfiguration of the OAuth token creation process or of the Gateway itself. Envoy is very strict about issuer being set (although this is changing) and a mismatch will lead to tokens being rejected.
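
When debugging an issuer mismatch, it can help to look at the "iss" claim a token actually carries. The following sketch decodes the payload segment of a JWT using only the standard library. It does not verify the signature, so it is suitable for inspection only, and the function name is hypothetical.

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Comparing unverified_claims(token).get("iss") with the issuer configured on the Gateway shows which side of the mismatch needs to change.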

Related