Service/OpenSearch IPoid
OpenSearch IPoid provides an API (via OpenSearch) to query IP reputation information obtained from Spur. This service is a replacement for Service/IPoid.
Querying
See https://phabricator.wikimedia.org/P86445 for an example query. In short, make an OpenSearch query using the ip term.
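As a rough illustration of what "a query using the ip term" could look like, the sketch below builds a standard OpenSearch term query body. It is not a copy of the paste linked above: the helper name build_ip_query is hypothetical, and normalising the address before querying is an assumption about how IPs are stored in the index.

```python
import ipaddress
import json


def build_ip_query(ip: str) -> dict:
    """Return an OpenSearch search body that matches documents whose `ip` field equals `ip`.

    Hypothetical helper for illustration; the real query may differ.
    """
    # Validate the address and convert it to its canonical text form.
    # Raises ValueError if `ip` is not a valid IPv4 or IPv6 address.
    normalised = str(ipaddress.ip_address(ip))
    return {"query": {"term": {"ip": normalised}}}


if __name__ == "__main__":
    # The resulting JSON would be sent as the body of a _search request
    # against the ipoid index.
    print(json.dumps(build_ip_query("192.0.2.1")))
```

Because the pipeline sets each document's _id to the IP, fetching the document by ID may also work, but that is not confirmed here.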
Data pipeline
Twice a day, the download_and_index DAG fetches the main data file and a metadata file from Spur and sends each row of data to the ipoid index on the OpenSearch cluster. For each row, the DAG sets the document _id to the ip found in the document, adds a @timestamp field, and adds a tag_metadata_categories annotation that indicates, for example, whether an IP is associated with a residential proxy.
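The per-row transformation can be sketched roughly as follows. This is not the DAG's actual code: the function name, the tags field on the row, and the shape of the metadata mapping (tag name to list of categories) are all assumptions made for illustration. Only the ipoid index name and the ip, _id, @timestamp and tag_metadata_categories fields come from the description above.

```python
from datetime import datetime, timezone


def to_bulk_action(row: dict, tag_categories: dict, now: datetime) -> list:
    """Turn one feed row into the two entries of an OpenSearch bulk `index` action.

    `row["tags"]` and `tag_categories` are hypothetical shapes, used only
    to illustrate the annotation step.
    """
    doc = dict(row)  # shallow copy so the caller's row is not mutated
    doc["@timestamp"] = now.isoformat()
    # Collect the categories of every tag on the row, de-duplicated and sorted.
    categories = sorted(
        {cat for tag in row.get("tags", []) for cat in tag_categories.get(tag, [])}
    )
    doc["tag_metadata_categories"] = categories
    # With the IP as _id, indexing the same IP again replaces the old document.
    action = {"index": {"_index": "ipoid", "_id": row["ip"]}}
    return [action, doc]


if __name__ == "__main__":
    example = to_bulk_action(
        {"ip": "192.0.2.1", "tags": ["EXAMPLE_PROXY"]},
        {"EXAMPLE_PROXY": ["residential_proxy"]},
        datetime.now(timezone.utc),
    )
    print(example)
```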
Once per day, the retention_cleanup DAG removes documents that are older than 7 days (using the @timestamp field).
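One way such a cleanup could be expressed is a delete-by-query with a range filter on @timestamp, using OpenSearch date math. The sketch below only builds the request body; whether the retention_cleanup DAG actually uses delete-by-query is an assumption, and build_retention_query is a hypothetical name.

```python
def build_retention_query(max_age_days: int = 7) -> dict:
    """Return a query body matching documents whose @timestamp is older than `max_age_days`.

    Hypothetical helper; the real DAG may select documents differently.
    """
    if max_age_days <= 0:
        raise ValueError("max_age_days must be positive")
    # "now-7d" is OpenSearch date math for "seven days before the current time".
    return {"query": {"range": {"@timestamp": {"lt": f"now-{max_age_days}d"}}}}


if __name__ == "__main__":
    # This body would be sent to the _delete_by_query endpoint of the ipoid index.
    print(build_retention_query())
```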
Usage in MediaWiki
The entry point for querying data from this service is mw:Extension:IPReputation. The extension sets a 2 second timeout (wgIPReputationIPoidRequestTimeoutSeconds) on requests to the service. When a request is made to IPoid, the response is cached for one hour. The response is cached even if no data is found in the OpenSearch index for an IP.
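The caching behaviour described above, including caching of "no data" responses, can be illustrated with a small sketch. This is not the extension's code (which is PHP and uses MediaWiki's own caching); the class name, the fetch callback signature, and the injectable clock are assumptions made so the example is self-contained.

```python
import time
from typing import Callable, Optional


class CachedLookup:
    """Cache lookup results per IP for a fixed TTL, including empty (None) results."""

    def __init__(
        self,
        fetch: Callable[[str, float], Optional[dict]],  # hypothetical: fetch(ip, timeout_seconds)
        ttl_seconds: float = 3600.0,  # one hour, as described above
        timeout_seconds: float = 2.0,  # mirrors the 2 second request timeout
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._timeout = timeout_seconds
        self._clock = clock
        self._cache: dict = {}  # ip -> (expiry time, cached result)

    def get(self, ip: str) -> Optional[dict]:
        now = self._clock()
        entry = self._cache.get(ip)
        # Entries are tuples, so a cached None is distinct from "not cached".
        if entry is not None and entry[0] > now:
            return entry[1]
        result = self._fetch(ip, self._timeout)
        self._cache[ip] = (now + self._ttl, result)
        return result
```

Storing the expiry alongside the value is what allows an empty result to be served from cache, so an IP with no reputation data does not trigger a new request on every lookup within the hour.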
If OpenSearch IPoid is offline, the following impacts occur:
- AbuseFilters using IP reputation variables (https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:IPReputation/AbuseFilter_variables) will assume that there is no IP reputation data available in the request
- Logging that uses getSecurityLogContext will not have data enriched with IP reputation variables
- Event logging in WikimediaEvents that uses IP reputation data will assume that no data is available for that IP
Code that queries OpenSearch IPoid is recorded in this Grafana chart. Latencies observed from roundtrip requests between MediaWiki and OpenSearch IPoid are recorded in this Grafana chart.
Service Dependencies
The OpenSearch cluster. (Details TBD)
Service Monitoring
See https://grafana.wikimedia.org/d/c0a89788-c6fe-4d06-aeb2-70b63049599e/opensearch-on-k8s and https://grafana.wikimedia.org/d/c86cb881-1645-4721-962c-b1ecb86d7902/95dbbe4?orgId=1&from=now-2d&to=now&timezone=utc
For logs, see (TBD)
For alerts, see (TBD)
Ownership
Product Safety and Integrity and Data Platform SRE.