Similarusers

From Wikitech

Similarusers (formerly known as "Sockpuppet" or the "Sockpuppet API") is a service running in Kubernetes. It can be used to identify the relative similarity between users on wikis. It was developed by the Research team, productionised by the Platform Engineering team, and is currently used by the SimilarEditors extension; see that page for installation instructions and documentation.

Development documentation can be found in the service README file.

It runs on port 4110.

What it does

The similar-users service uses analyses of the Dumps to determine how similar given users seem to be based on how and when their edits took place.

How it works

Each month, as new dumps are released, a PySpark notebook runs on the Analytics cluster. The notebook analyses all edits and generates summaries of temporal and coedit data (see the similar-users README for more details on the process), and the results are stored in MariaDB. When the similar-users service receives a request for a particular user, it queries the database and the MediaWiki API, combines the results into a list of similar users, and returns that list as JSON.
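The request-time flow described above can be sketched roughly as follows. This is an illustration only, not the service's actual code: the function name, the in-memory `COEDIT_ROWS` table standing in for MariaDB, the `fetch_user_info` callable standing in for a MediaWiki API lookup, and all field names in the response are assumptions made for this example. The real schema and response format are defined in the service README and at /apidocs.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the precomputed MariaDB data: for each user,
# rows of (other_user, number_of_pages_both_users_edited).
COEDIT_ROWS: Dict[str, List[Tuple[str, int]]] = {
    "ExampleUserA": [("ExampleUserC", 3), ("ExampleUserB", 12)],
}


def similar_users_response(
    user: str,
    coedit_rows: Dict[str, List[Tuple[str, int]]],
    fetch_user_info: Callable[[str], dict],
    limit: int = 10,
) -> str:
    """Combine precomputed rows with per-user metadata and return JSON.

    `fetch_user_info` is a placeholder for whatever the service asks the
    MediaWiki API; here it is any callable returning a dict.
    """
    # Rank candidates by how many pages they share with the requested user.
    rows = sorted(coedit_rows.get(user, []), key=lambda r: r[1], reverse=True)
    results = []
    for other_user, shared_pages in rows[:limit]:
        entry = {"user": other_user, "coedited_pages": shared_pages}
        entry.update(fetch_user_info(other_user))
        results.append(entry)
    return json.dumps({"user": user, "results": results})
```

Passing the data source and the API lookup in as arguments keeps the sketch runnable without a database or network access; a caller could, for example, use `similar_users_response("ExampleUserA", COEDIT_ROWS, lambda u: {})`.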

Service Documentation

The service is documented using the OpenAPI format.

A documentation endpoint is available at /apidocs. This is the authoritative documentation source. Developers can access the endpoint locally by running the service's Docker container:

git clone "https://gerrit.wikimedia.org/r/mediawiki/services/similar-users"
cd similar-users
make docker

Visit

http://0.0.0.0:5000/apidocs

The endpoint is protected by BasicAuth. Credentials are specified by the BASIC_AUTH_USERNAME and BASIC_AUTH_PASSWORD variables in the config file at

similar_users/flask_config.yaml


See the README for more information.
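As a minimal sketch of calling the protected documentation endpoint from a script, the snippet below builds a request carrying an HTTP Basic Authorization header using only the Python standard library. The helper name `build_apidocs_request` is hypothetical, the default base URL is simply the local address given above, and the credentials are whatever values are set for BASIC_AUTH_USERNAME and BASIC_AUTH_PASSWORD in your config file.

```python
import base64
import urllib.request


def build_apidocs_request(
    username: str,
    password: str,
    base_url: str = "http://0.0.0.0:5000",
) -> urllib.request.Request:
    """Return a request for /apidocs with a Basic Authorization header.

    Hypothetical helper for illustration; it is not part of the service.
    """
    # Basic auth is "username:password", base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"{base_url}/apidocs")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

With the container running, the request can be sent with `urllib.request.urlopen(build_apidocs_request(username, password))`, substituting the configured credentials.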

Where it runs

The service will run in Kubernetes in the staging, codfw and eqiad clusters.

Deploying

The similar-users service is deployed using the standard Helm deployment process.

Testing

How it's monitored

Related pages