
Service/OpenSearch IPoid

From Wikitech

OpenSearch IPoid provides an API (via OpenSearch) to query IP reputation information obtained from Spur. This service is a replacement for Service/IPoid.

Querying

An example query is shown at https://phabricator.wikimedia.org/P86445. In short: send an OpenSearch query that matches on the ip term.
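For readers who cannot open the paste, the following is a minimal sketch of what such a request body could look like. It assumes the documents expose a field named ip that can be matched with a term query; the helper name build_ip_query is hypothetical, and the paste linked above remains the authoritative example.

```python
import json


def build_ip_query(ip: str) -> dict:
    """Build an OpenSearch search body that matches one IP address.

    Assumption: documents carry an `ip` field suitable for a `term` query.
    """
    return {"query": {"term": {"ip": ip}}}


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-only address (RFC 5737), used as a placeholder.
    print(json.dumps(build_ip_query("192.0.2.1"), indent=2))
```

The resulting body would be sent to the search endpoint of the ipoid index; the cluster URL and any authentication are deliberately left out because they are not described on this page.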

Data pipeline

Twice a day, the download_and_index DAG fetches the main data file and a metadata file from Spur and sends each row of data to the ipoid index on the OpenSearch cluster. For each row, it sets the document _id to the ip found in the row, adds a @timestamp field, and adds a tag_metadata_categories annotation that indicates, for example, whether an IP is associated with a residential proxy.
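The per-row transformation can be sketched roughly as follows. This is not the DAG's actual code: the function name to_bulk_action, the tag key on each row, and the shape of the metadata mapping (tag name to list of categories) are all assumptions made for illustration. The returned dict uses the action shape that bulk helpers such as the one in opensearch-py typically accept.

```python
from datetime import datetime, timezone


def to_bulk_action(row: dict, tag_metadata: dict, now: datetime,
                   index: str = "ipoid") -> dict:
    """Turn one row of the data file into a bulk index action.

    Assumptions (hypothetical): each row has an "ip" key and an optional
    "tag" key, and `tag_metadata` maps a tag name to a list of categories.
    """
    doc = dict(row)  # copy so the caller's row is not mutated
    doc["@timestamp"] = now.isoformat()
    # Annotate the document with the categories found in the metadata file.
    doc["tag_metadata_categories"] = tag_metadata.get(row.get("tag"), [])
    return {
        "_op_type": "index",
        "_index": index,
        "_id": row["ip"],  # one document per IP
        "_source": doc,
    }


if __name__ == "__main__":
    # Placeholder row and metadata; the tag and category names are made up.
    action = to_bulk_action(
        {"ip": "192.0.2.1", "tag": "EXAMPLE_PROXY"},
        {"EXAMPLE_PROXY": ["RESIDENTIAL_PROXY"]},
        datetime.now(timezone.utc),
    )
    print(action)
```

Because the _id is the IP address, indexing a row for an IP that already exists replaces the earlier document and refreshes its @timestamp, rather than creating a duplicate.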

Once per day, the retention_cleanup DAG removes documents that are older than 7 days (using the @timestamp field).
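One way to express this retention rule is a range query on @timestamp using OpenSearch date math, sent to the delete-by-query API. The sketch below only builds the request body; whether the retention_cleanup DAG uses delete-by-query or some other mechanism is an assumption, and build_retention_query is a hypothetical helper name.

```python
def build_retention_query(max_age_days: int = 7) -> dict:
    """Build a query body matching documents older than `max_age_days`.

    Assumption: `@timestamp` is mapped as a date field, so the date math
    expression `now-7d` is resolved by OpenSearch at query time.
    """
    if max_age_days <= 0:
        raise ValueError("max_age_days must be positive")
    return {"query": {"range": {"@timestamp": {"lt": f"now-{max_age_days}d"}}}}


if __name__ == "__main__":
    print(build_retention_query())
```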

Usage in MediaWiki

MediaWiki queries this service through mw:Extension:IPReputation. The extension sets a 2 second timeout (wgIPReputationIPoidRequestTimeoutSeconds) on requests to the service. Each response is cached for one hour, including responses where no data is found in the OpenSearch index for the IP.
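The caching behaviour described above, where "no data" results are cached as well, can be illustrated with a small sketch. This is not the extension's code (the extension is written in PHP and uses MediaWiki's own caching); the class ReputationCache, its fetch callback, and the injectable clock are hypothetical names chosen for the illustration.

```python
import time
from typing import Callable, Optional


class ReputationCache:
    """Cache lookups for a fixed TTL, including lookups that found nothing."""

    def __init__(self, fetch: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # performs the real request to the service
        self._ttl = ttl_seconds  # one hour, as described above
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # ip -> (expiry time, cached value)

    def get(self, ip: str) -> Optional[dict]:
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit; the value may legitimately be None
        value = self._fetch(ip)  # None means no data was found for this IP
        self._entries[ip] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = ReputationCache(lambda ip: None)  # stand-in fetcher: finds nothing
    print(cache.get("192.0.2.1"), cache.get("192.0.2.1"))
```

Storing the expiry time alongside the value, rather than treating None as "not cached", is what lets an empty result count as a hit, so an IP with no reputation data does not trigger a new request on every lookup.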

If OpenSearch IPoid is offline, the following impacts occur:

Code that queries OpenSearch IPoid is recorded in this Grafana chart. Latencies observed from roundtrip requests between MediaWiki and OpenSearch IPoid are recorded in this Grafana chart.

Service Dependencies

The OpenSearch cluster. (Details TBD)

Service Monitoring

See https://grafana.wikimedia.org/d/c0a89788-c6fe-4d06-aeb2-70b63049599e/opensearch-on-k8s and https://grafana.wikimedia.org/d/c86cb881-1645-4721-962c-b1ecb86d7902/95dbbe4?orgId=1&from=now-2d&to=now&timezone=utc

For logs, see (TBD)

For alerts, see (TBD)

Ownership

Owners: Product Safety and Integrity; Data Platform SRE.

Supporting documentation and relevant information