Similarusers
Similarusers (formerly known as "Sockpuppet" or the "Sockpuppet API") is a service running in Kubernetes. It can be used to identify relative similarity between users on wikis. It was developed by the Research team, productionised by the Platform Engineering team, and is currently used in the SimilarEditors extension; see that page for installation instructions and documentation.
Development documentation can be found in the service README file.
It runs on port 4110.
What it does
The similar-users service uses analyses of the Dumps to determine how similar given users seem to be based on how and when their edits took place.
How it works
On a monthly basis, as new dumps are released, a PySpark notebook runs on the Analytics cluster. The notebook analyses all edits and generates summaries of temporal and coedit data (see the similar-users README for more details on the process), and the results are stored in MariaDB. When the similar-users service receives a request for a particular user, it queries the database and the MediaWiki API, uses the results to identify similar users, and returns them as JSON.
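To make the idea of temporal and coedit signals concrete, here is a minimal illustrative sketch. It is not the service's actual algorithm (the similar-users README describes that). The function names, the choice of Jaccard overlap for coedits, and the choice of cosine similarity over hour-of-day histograms are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def coedit_overlap(pages_a, pages_b):
    """Jaccard overlap of the sets of pages two users edited (hypothetical metric)."""
    a, b = set(pages_a), set(pages_b)
    union = a | b
    # Two users with no edits at all are treated as having no overlap.
    return len(a & b) / len(union) if union else 0.0


def temporal_similarity(hours_a, hours_b):
    """Cosine similarity of two users' edit counts per hour of day (hypothetical metric)."""
    hist_a, hist_b = Counter(hours_a), Counter(hours_b)
    # Counter returns 0 for hours that only one of the users edited in.
    dot = sum(hist_a[h] * hist_b[h] for h in hist_a)
    norm_a = sqrt(sum(c * c for c in hist_a.values()))
    norm_b = sqrt(sum(c * c for c in hist_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

For example, coedit_overlap(["A", "B"], ["B", "C"]) compares two users who share one of three distinct pages, and temporal_similarity takes the hour of day (0 to 23) of each edit for the two users.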
Service Documentation
The service is documented using the OpenAPI format.
A documentation endpoint is available at /apidocs. This is the authoritative documentation source. Developers can access the endpoint locally by running the service docker container:
git clone "https://gerrit.wikimedia.org/r/mediawiki/services/similar-users"
cd similar-users
make docker
Then visit http://0.0.0.0:5000/apidocs
The endpoint is protected by BasicAuth. Credentials are specified by the BASIC_AUTH_USERNAME and BASIC_AUTH_PASSWORD variables in the config file at similar_users/flask_config.yaml.
See the README for more information.
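As a rough sketch of scripted access to the documentation endpoint, the following builds a request that carries an HTTP Basic Auth header, using only the Python standard library. The helper name build_apidocs_request is hypothetical, the default base URL assumes the local docker container described above, and the credentials must be the values configured for BASIC_AUTH_USERNAME and BASIC_AUTH_PASSWORD.

```python
import base64
import urllib.request


def build_apidocs_request(username, password, base_url="http://0.0.0.0:5000"):
    """Build a GET request for /apidocs with an HTTP Basic Auth header.

    Hypothetical helper: base_url assumes the locally running container.
    """
    # Basic Auth is the base64 encoding of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"{base_url}/apidocs")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the returned object to urllib.request.urlopen sends the request; this only succeeds while the container is running and the credentials match the config file.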
Where it runs
The service runs in Kubernetes in the staging, codfw and eqiad clusters.
Deploying
The similar-users service is deployed using the standard Helm deployment process.