Wikilabels
The wikilabels project in Cloud VPS was deleted on 2024-08-19; see Nova_Resource:Wikilabels.
Wikilabels is one of the stand-alone services used to gather data from users to build AI models for ORES. It was maintained by the Wikimedia Scoring Platform team and hosted on Nova_Resource:wikilabels (Cloud VPS).
Technical details
- There are several instances:
- wikilabels-03.wikilabels.eqiad1.wikimedia.cloud: The main node. It uses PostgreSQL on wikilabels-database-02 as its backing store and is accessible from labels.wmflabs.org.
- wikilabels-staging-02.wikilabels.eqiad1.wikimedia.cloud: The staging node. It uses a similar setup and is accessible from labels-staging.wmflabs.org.
- wikilabels-backups.wikilabels.eqiad1.wikimedia.cloud: The node that keeps daily database backups of the main node. It is accessible from wikilabels-dumps.wmflabs.org.
- wikilabels-database-02.wikilabels.eqiad1.wikimedia.cloud: Postgres database node that is the backing store for the uwsgi applications.
- Codes:
Initialize a VM
From your local laptop or workstation, check out the deploy repository and make sure you can ssh to the target Cloud VPS instance. Then create a Python venv and install fabric3. This lets you initialize the server with:
fab initialize_server:hosts="wikilabels-03.wikilabels.eqiad1.wikimedia.cloud"
You also need to put an OAuth key and secret in /srv/wikilabels/config/config/99-oauth.yaml (random values are fine):
elukey@wikilabels-03:~$ cat /srv/wikilabels/config/config/99-oauth.yaml
# These credentials are intended to be used when testing the local, development
# version of Wiki Labels. Do not use these credentials in a production
# environment. They will redirect users to localhost:8080 expecting to find
# Wiki Labels there.
oauth:
  key: xxx
  secret: xxxx
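Because any random value is acceptable here, the file can be generated rather than typed by hand. The sketch below is a minimal example under that assumption: render_oauth_config is a hypothetical helper built on Python's standard secrets module, and its output simply copies the layout of the file shown above.

```python
import secrets


def render_oauth_config(nbytes: int = 16) -> str:
    """Return YAML text for 99-oauth.yaml with a random key and secret.

    Hypothetical helper: the layout mirrors the example file above.
    """
    key = secrets.token_hex(nbytes)     # 2 * nbytes hexadecimal characters
    secret = secrets.token_hex(nbytes)  # generated independently of the key
    return (
        "oauth:\n"
        f"  key: {key}\n"
        f"  secret: {secret}\n"
    )
```

Printing render_oauth_config() and saving the text as /srv/wikilabels/config/config/99-oauth.yaml on the instance would give a file in the same shape as the one above.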
You'll also need to create /srv/wikilabels/config/config/98-database.yaml with the following content:
# These credentials are intended to be used on labels.wmflabs.org. They are
# sensitive and should never be committed to a public repository.
database:
  user: u_wikilabels
  dbname: u_wikilabels
  password: REDACTED
Deployment guide
After changes are merged into the main repository, update the deploy repository:
cd wikilabels-wmflabs-deploy/
git pull
cd submodules/wikilabels
git pull
cd ../..
git add wikilabels
git commit
Use a commit message such as "Bumping wikilabels to HEAD", then push the commit and deploy it to the staging node:
git push
fab stage
The change is now on the staging node. Log it (using !log wikilabels in the #wikimedia-cloud IRC channel), test it, and if it works, deploy it to production:
git checkout deploy
git rebase origin/master
git push -f origin deploy
fab deploy
And log it!
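Since the production step is a fixed sequence of commands that should stop at the first failure, it can be scripted. The following is a hypothetical helper (PROD_STEPS and run_steps are illustrative names, not part of the deploy repository) that replays the commands above from the deploy repository checkout and defaults to a dry run that only lists them:

```python
import subprocess
from typing import List, Sequence

# Production steps from this guide, run from the deploy repository checkout.
PROD_STEPS: List[List[str]] = [
    ["git", "checkout", "deploy"],
    ["git", "rebase", "origin/master"],
    ["git", "push", "-f", "origin", "deploy"],
    ["fab", "deploy"],
]


def run_steps(steps: Sequence[Sequence[str]], dry_run: bool = True,
              run=subprocess.run) -> List[str]:
    """Run each command in order, stopping at the first failure.

    With dry_run=True nothing is executed; the commands are only returned
    as shell-like strings so they can be reviewed first.
    """
    executed = []
    for argv in steps:
        executed.append(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError, aborting later steps.
            run(list(argv), check=True)
    return executed
```

Logging to #wikimedia-cloud is not covered by this sketch and still has to be done by hand.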
A new labeling campaign
First, create the new campaign:
$ ssh wikilabels-03.wikilabels.eqiad1.wikimedia.cloud
ladsgroup@wikilabels-03:~$ cd /srv/wikilabels/config
ladsgroup@wikilabels-03:/srv/wikilabels/config$ sudo -u www-data /srv/wikilabels/venv/bin/wikilabels new_campaign wikidatawiki "Edit quality (5k, 2018)" damaging_and_goodfaith DiffToPrevious 1 50
{'form': 'damaging_and_goodfaith', 'id': 38, 'view': 'DiffToPrevious', 'active': True, 'name': 'Edit quality (5k, 2018)', 'tasks_per_assignment': 50, 'labels_per_task': 1, 'wiki': 'wikidatawiki', 'info_url': None, 'created': datetime.datetime(2018, 7, 11, 13, 39, 54, 282569)}
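The campaign id appears inside the dict that new_campaign prints. Because that output contains a datetime.datetime(...) call, it is not a plain Python literal and ast.literal_eval would reject it, so a small regular expression is a simpler way to pull the id out. This is a sketch with a hypothetical helper name, assuming the output keeps the 'id': N form shown above:

```python
import re

# Matches the "'id': 38" entry; the leading quote keeps keys such as
# 'campaign_id' from matching.
_ID_RE = re.compile(r"'id':\s*(\d+)")


def campaign_id_from_output(output: str) -> int:
    """Extract the campaign id from the dict printed by new_campaign."""
    match = _ID_RE.search(output)
    if match is None:
        raise ValueError("no campaign id found in new_campaign output")
    return int(match.group(1))
```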
Note the campaign id (38 in this case). Next, load the data into the campaign: download the data file into your home directory, then pipe it into task_inserts with that id:
ladsgroup@wikilabels-03:/srv/wikilabels/config$ cat ~/wikidatawiki.autolabeled_revisions.125k_2018.review.json | sudo -u www-data ../venv/bin/wikilabels task_inserts 38
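Before piping a large file into task_inserts, it can be worth checking that it parses. The sketch below assumes, without confirmation from this page, that the file holds one JSON object per line; count_json_lines is a hypothetical helper that reports the first malformed line and otherwise returns the number of tasks:

```python
import json
from typing import Iterable


def count_json_lines(lines: Iterable[str]) -> int:
    """Count non-blank lines, raising ValueError on the first invalid JSON line."""
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {lineno}: {err}") from err
        count += 1
    return count
```

Passing an open file handle works directly, e.g. count_json_lines(open(path, encoding="utf-8")), since iterating a text file yields one line at a time.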
Restarting the service
Any time the connection to PostgreSQL is broken, restart the wikilabels service:
sudo service uwsgi-wikilabels-web restart
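A restart only helps if the database host itself is up. As a minimal sketch, the check below tests whether wikilabels-database-02 accepts TCP connections; it assumes Postgres listens on its default port, 5432, and postgres_reachable is a hypothetical helper name. It does not authenticate or run a query, so a True result only rules out a network or server outage:

```python
import socket


def postgres_reachable(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only proves the port accepts connections; it does not log in
    or run a query.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False
```

For example, postgres_reachable("wikilabels-database-02") returning False points at the database VM rather than at uwsgi.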
Dumping, restoring, or migrating the database
The uwsgi app on wikilabels-03 uses a Postgres database as a backing store. This used to be a clouddb instance, but since November 2022 it has been a separate VM, wikilabels-database-02.
Database credentials
The uwsgi app keeps its database configuration in two files, /srv/wikilabels/config/default-db-config.yaml and /srv/wikilabels/config/config/98-database.yaml.
The first file contains host information and user credentials, though the latter are unused. The actual username, password and database (inside of Postgres) to use are in the second file, 98-database.yaml.
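One way to picture this split is as layered configuration in which later files override earlier ones. Assuming that model (this page does not describe how Wiki Labels actually loads its config), the effective settings are a recursive merge of the parsed YAML documents, with 98-database.yaml applied on top of default-db-config.yaml. merge_config is a hypothetical helper, and it works on already-parsed dicts:

```python
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge override into a copy of base; override wins on conflicts.

    Neither input is modified.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged
```

With a base of {"database": {"host": ..., "user": ...}} and an override carrying user, dbname and password, the result keeps the host from the first file and takes everything else from the second. The key names in this example are illustrative only.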
Dumping data
To dump the data in an easily restored format, use the pg_dump tool. You can run this on either wikilabels-03 or wikilabels-database-02:
$ pg_dump -U u_wikilabels -h wikilabels-database-02 u_wikilabels -f pg_dump-$(date -Is).sql
Note that the database name is u_wikilabels, just like the user name.
The command above prompts for the password of the u_wikilabels user and then writes the database content, as a series of SQL commands, to the timestamped file given with -f. The total amount of data is about 100 MB.
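To script the dump, the same command can be built in Python. This is a sketch, not part of the deploy tooling: pg_dump_command is a hypothetical helper that reproduces the arguments above and formats the timestamp the way date -Is does (seconds precision with a UTC offset):

```python
from datetime import datetime
from typing import List, Optional


def pg_dump_command(host: str = "wikilabels-database-02",
                    user: str = "u_wikilabels",
                    dbname: str = "u_wikilabels",
                    now: Optional[datetime] = None) -> List[str]:
    """Build the pg_dump argv used above, with a date -Is style timestamp."""
    if now is None:
        now = datetime.now().astimezone()  # local time with its UTC offset
    stamp = now.isoformat(timespec="seconds")  # e.g. 2022-11-15T10:20:30+00:00
    return ["pg_dump", "-U", user, "-h", host, dbname,
            "-f", f"pg_dump-{stamp}.sql"]
```

Running subprocess.run(pg_dump_command(), check=True) from an interactive terminal would still prompt for the password, just like the shell version.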
Restoring data
To restore the saved data, copy the file to a convenient host (the database host itself is usually easiest). If necessary, create the user and database on the new Postgres instance, running the commands as the postgres user via sudo:
$ sudo -u postgres createuser u_wikilabels
$ sudo -u postgres psql
psql (13.8 (Debian 13.8-0+deb11u1))
Type "help" for help.
postgres=# \password u_wikilabels
Enter new password for user "u_wikilabels":
Enter it again:
postgres=# exit
$ sudo -u postgres createdb -O u_wikilabels u_wikilabels
This creates the u_wikilabels user, sets their password, and then creates a new database also called u_wikilabels owned by the just-created user.
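Whether the role and database are needed at all can be checked first by querying the pg_roles and pg_database catalogs. The sketch below is an assumption-laden helper, not an official tool: it shells out to psql as the postgres user, as the commands above do, and quotes names as SQL string literals by doubling single quotes. The run parameter exists only so the functions can be exercised without a live server:

```python
import subprocess


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def _query_has_row(sql: str, run=subprocess.run) -> bool:
    # -t drops headers and footers and -A disables alignment, so a match
    # prints exactly "1" and no match prints nothing.
    result = run(["sudo", "-u", "postgres", "psql", "-tAc", sql],
                 capture_output=True, text=True, check=True)
    return result.stdout.strip() == "1"


def role_exists(role: str, run=subprocess.run) -> bool:
    return _query_has_row(
        f"SELECT 1 FROM pg_roles WHERE rolname = {quote_literal(role)}", run)


def database_exists(dbname: str, run=subprocess.run) -> bool:
    return _query_has_row(
        f"SELECT 1 FROM pg_database WHERE datname = {quote_literal(dbname)}", run)
```

If role_exists("u_wikilabels") and database_exists("u_wikilabels") both return True, the createuser and createdb steps can be skipped.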
You can then restore the saved data by piping the dump file into an appropriate psql connection command:
psql -h localhost -W -d u_wikilabels -U u_wikilabels < pg_dump-[timestamp].sql
psql prints a status line to stdout for each SQL command it runs (SET, CREATE TABLE, and so on).
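By default psql keeps going after a failed statement, so an error can scroll past among those messages. Setting the psql variable ON_ERROR_STOP makes it stop at the first error and exit with a non-zero status. The following is a minimal sketch of running the same restore from Python with that variable set; restore_command and restore are hypothetical helpers:

```python
import subprocess
from pathlib import Path
from typing import List


def restore_command(host: str = "localhost", user: str = "u_wikilabels",
                    dbname: str = "u_wikilabels") -> List[str]:
    # Same connection options as the psql command above, plus ON_ERROR_STOP so
    # psql exits non-zero at the first failing statement instead of carrying on.
    return ["psql", "-h", host, "-W", "-d", dbname, "-U", user,
            "-v", "ON_ERROR_STOP=1"]


def restore(dump: Path, run=subprocess.run) -> None:
    """Feed the dump file to psql on stdin, like the shell redirect above."""
    with dump.open("rb") as handle:
        run(restore_command(), stdin=handle, check=True)
```

With -W, psql still asks for the password on the terminal even though the dump arrives on stdin.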
Once this is done, point the uwsgi application at the new database by editing the host setting in /srv/wikilabels/config/default-db-config.yaml, then restart the application.
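If you prefer to script that edit, a minimal sketch follows. It assumes the setting is a single line of the form host: <name> in default-db-config.yaml (the exact layout of that file is not shown on this page), and set_db_host is a hypothetical helper that rewrites only that value while keeping the indentation:

```python
import re

# A line consisting of optional indentation, "host:", and a non-empty value.
_HOST_RE = re.compile(r"^([ \t]*host:[ \t]*)\S.*$", re.MULTILINE)


def set_db_host(yaml_text: str, new_host: str) -> str:
    """Replace the value of the single `host:` line, keeping its indentation."""
    matches = _HOST_RE.findall(yaml_text)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one host: line, found {len(matches)}")
    # A function replacement avoids backslash interpretation in new_host.
    return _HOST_RE.sub(lambda m: m.group(1) + new_host, yaml_text)
```

Note that a trailing comment on the host line would be replaced along with the old value.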