Tool:Wikidata Mismatch Finder
This page is meant to keep documentation related to deployment and maintenance of the Wikidata Mismatch Finder tool. For usage and developer documentation, please see the main tool documentation in the tool's git repository.
Wikidata Mismatch Finder | |
---|---|
Website | https://mismatch-finder.toolforge.org/ |
Description | A tool to review mismatches between Wikidata and External Databases. |
Keywords | wikidata, databases, mismatch |
Maintainer(s) | Wikimedia Deutschland |
Source code | https://github.com/wmde/wikidata-mismatch-finder |
License | BSD 3-clause "Modified" License |
Issues | Open tasks · Finder Report a bug |
Changing Server Configuration
Our Toolforge instance uses the default lighttpd server, with some default configuration provided for us. All of our Mismatch Finder-specific configuration overrides are stored in .lighttpd.conf in our tool's home directory. Follow these steps to make configuration changes.
SSH into toolforge
ssh <your-username>@login.toolforge.org
Log in as the mismatch finder tool account
become mismatch-finder
Make your changes to our config file (feel free to use vim instead of nano if you wish)
nano .lighttpd.conf
Restart the server
webservice restart
NOTE! Your changes might take a minute or so to propagate, so be patient.
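The steps above can be wrapped in a small helper that keeps a backup before editing and only restarts the server when the file actually changed. This is a hypothetical sketch, not part of the tool: the edit_lighttpd_conf name and the .bak suffix are our own choices, and it assumes you run it as the tool account after become mismatch-finder.

```shell
# Hypothetical helper: back up, edit, and restart only if the config changed.
edit_lighttpd_conf() {
  local conf="${1:-$HOME/.lighttpd.conf}"
  # Keep a copy of the working config so it can be restored quickly.
  cp -- "$conf" "$conf.bak" || return 1
  # Left unquoted on purpose so EDITOR may contain flags (e.g. "vim -n").
  ${EDITOR:-nano} "$conf" || return 1
  if cmp -s "$conf" "$conf.bak"; then
    echo "No changes to $conf; not restarting."
  else
    webservice restart
  fi
}
```

If the restarted server misbehaves, copying .lighttpd.conf.bak back over .lighttpd.conf and running webservice restart again should bring back the previous configuration.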
Updating Environment Variables
All environment variables for Mismatch Finder's production environment are stored in the tool's home folder, and are symlinked into the repository sub-directory. To update production environment variables, please follow the steps below:
- SSH into toolforge
ssh <your-username>@login.toolforge.org
- Log in as the mismatch finder tool
become mismatch-finder
- Back up the current .env file
cp .env .env.bak
- Make your changes to the .env file
nano .env
NOTE! There's no need to restart the server. Since the file is symlinked, the changes should propagate immediately.
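For a single-value change, the backup and edit steps above can also be done non-interactively. The set_env_var helper below is a hypothetical sketch, not part of the repository: it backs up the file to .env.bak, replaces an existing KEY=... line or appends one, and writes the result back through the original path, so a symlinked .env keeps pointing at the same file.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, backing it up first.
set_env_var() {
  local key="$1" value="$2" file="${3:-.env}"
  cp -- "$file" "$file.bak" || return 1
  # Pass key/value via the environment so awk does not interpret backslashes.
  ENV_KEY="$key" ENV_VALUE="$value" awk '
    BEGIN { key = ENVIRON["ENV_KEY"]; value = ENVIRON["ENV_VALUE"]; found = 0 }
    index($0, key "=") == 1 { print key "=" value; found = 1; next }
    { print }
    END { if (!found) print key "=" value }
  ' "$file.bak" > "$file.tmp" || return 1
  # Write through the path (instead of mv) so a symlink stays a symlink.
  cat -- "$file.tmp" > "$file" || return 1
  rm -f -- "$file.tmp"
}
```

For example, set_env_var APP_DEBUG false could be run in the tool's home folder; the key here is only an illustration.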
Running Database Migrations
Any change that is deployed together with a migration file requires us to run the database migration command included with Laravel’s artisan script. To do so, follow these steps:
SSH into toolforge
ssh <your-username>@login.toolforge.org
Log in as the mismatch finder tool account
become mismatch-finder
Open a shell in our web service (to run commands with PHP 7.3)
webservice shell
Navigate to the code repository
cd mismatch-finder-repo
Run the migration script and follow the instructions on the screen
php artisan migrate
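Before applying migrations to the production database, it can help to preview them first. The sketch below assumes the repository path used above and relies on Laravel's standard migrate:status and migrate --pretend commands; the migrate_with_preview function name is hypothetical. It is meant to be run inside webservice shell.

```shell
# Hypothetical helper: show status, preview the SQL, then migrate.
migrate_with_preview() {
  local repo="${1:-$HOME/mismatch-finder-repo}"
  (
    cd -- "$repo" || exit 1
    # Which migrations have (and have not) been run yet.
    php artisan migrate:status &&
    # Print the SQL that would run, without executing it.
    php artisan migrate --pretend &&
    # Apply the migrations; in production Laravel asks for confirmation
    # before each migrate call, including the --pretend one above.
    php artisan migrate
  )
}
```

Answering "no" at either confirmation prompt stops the chain before anything else runs.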
Troubleshooting
Problem: The first time we tried to run the migration, we encountered the following error:
General error: 1709 Index column size too large. The maximum column size is 767 bytes.
Possible Explanation: As explained in this Stack Overflow post, Laravel was attempting to use a character set and collation that were incompatible with our database. In addition, the Toolforge documentation advised creating the database anew with a utf8 character set.
Solution: We recreated the database with the correct character set and also changed Laravel’s default configuration to ensure the encoding and collation are compatible.
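The arithmetic behind this error can be sketched as follows, assuming Laravel's default string column length of 255 characters and MySQL's maximum byte width per character (4 bytes for utf8mb4, 3 bytes for utf8). An index on such a column needs 255 × 4 = 1020 bytes under utf8mb4, which exceeds the 767-byte limit from the error message, while 255 × 3 = 765 bytes under utf8 fits.

```shell
# Worst-case index key size for a string column:
# characters multiplied by the maximum bytes per character.
key_bytes() {
  echo $(( $1 * $2 ))
}

max_index_bytes=767   # limit quoted in the error message
string_length=255     # assumption: Laravel's default string() column length

for entry in utf8mb4:4 utf8:3; do
  charset="${entry%%:*}"
  bytes_per_char="${entry##*:}"
  bytes="$(key_bytes "$string_length" "$bytes_per_char")"
  if (( bytes > max_index_bytes )); then verdict="too large"; else verdict="fits"; fi
  printf '%-8s %d x %d = %4d bytes (%s)\n' \
    "$charset" "$string_length" "$bytes_per_char" "$bytes" "$verdict"
done
```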
Update uploaders in production
To update the allowlist of uploaders in the live application, we use a custom artisan command that synchronizes uploaders with a plain text file. Follow the steps below to log in to our server and update the list:
SSH into toolforge
ssh <your-username>@login.toolforge.org
Log in as the mismatch finder tool account
become mismatch-finder
Back up the current uploaders list
cp uploaders.txt uploaders.bckp.txt
Edit our uploader list using vim or nano
nano uploaders.txt
Copy the list over to the app repository
cp uploaders.txt mismatch-finder-repo/storage/app/allowlist/
Open a shell in our web service (to run commands with PHP 7.3)
webservice shell
Navigate to the code repository
cd mismatch-finder-repo
Run the command that synchronizes the uploaders with the list
php artisan uploadUsers:set uploaders.txt
Make sure that the list was updated
php artisan uploadUsers:show
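Since the allowlist is synchronized with the file contents, a quick sanity check of uploaders.txt before copying it into the repository may catch mistakes early. The check below is a hypothetical sketch that assumes one username per line; it flags blank lines, trailing whitespace (which also catches Windows line endings), and duplicate entries, and returns non-zero if any are found.

```shell
# Hypothetical pre-flight check for the uploaders list (one username per line).
check_uploaders() {
  local file="${1:-uploaders.txt}"
  local status=0 dups
  if grep -n '^[[:space:]]*$' "$file"; then
    echo "warning: blank lines in $file" >&2
    status=1
  fi
  if grep -n '[[:space:]]$' "$file"; then
    echo "warning: trailing whitespace in $file" >&2
    status=1
  fi
  dups="$(sort -- "$file" | uniq -d)"
  if [ -n "$dups" ]; then
    printf '%s\n' "$dups" | sed 's/^/duplicate: /' >&2
    status=1
  fi
  return "$status"
}
```

Used together with the copy step: check_uploaders uploaders.txt && cp uploaders.txt mismatch-finder-repo/storage/app/allowlist/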
Drop uploaded mismatches in production
In order to delete a set of imported mismatches in the live application, we also use a custom artisan command. Follow the steps below to log in to our server and delete file imports from the database together with their associated mismatches:
SSH into toolforge
ssh <your-username>@login.toolforge.org
Log in as the mismatch finder tool account
become mismatch-finder
Open a shell in our web service (to run commands with PHP 7.3)
webservice shell
Navigate to the code repository
cd mismatch-finder-repo
Show the list of currently imported mismatch files (the first column shows the import ID)
php artisan import:list
+----+-------------+-----------------+-----------------+------------+-----------------+
| ID | Import Date | External Source | User            | Expires at | # of Mismatches |
+----+-------------+-----------------+-----------------+------------+-----------------+
| 11 | 2021-09-07  | internet        | raheem.eichmann | 2022-09-07 | 23              |
| 12 | 2021-09-11  | internet        | raheem.eichmann | 2022-09-11 | 42              |
| 13 | 2021-09-17  | internet        | raheem.eichmann | 2022-09-17 | 345             |
+----+-------------+-----------------+-----------------+------------+-----------------+
Delete an imported file and all its associated mismatches
IMPORTANT: Dropping an import from the store will delete all its associated mismatches, whether they have been reviewed or not.
php artisan import:drop 12
Dropping import ID 12 with 42 mismatches
Are you sure? (yes/no) [no]:
> y
Successfully dropped import ID 12 with 42 associated mismatches
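When several imports from the same uploader need to be dropped, their IDs can be picked out of the import:list table. The filter below is a hypothetical sketch that assumes the table layout shown above (ID in the first column, uploader in the User column); each import:drop call still asks for confirmation.

```shell
# Hypothetical filter: read `php artisan import:list` output on stdin and
# print the IDs of imports whose User column matches the given name.
import_ids_for_user() {
  local user="$1"
  awk -F'|' -v user="$user" '
    NF >= 7 {
      id = $2; uploader = $5
      gsub(/^[ \t]+|[ \t]+$/, "", id)
      gsub(/^[ \t]+|[ \t]+$/, "", uploader)
      # Skip the header row and anything else without a numeric ID.
      if (id ~ /^[0-9]+$/ && uploader == user) print id
    }'
}

# Usage sketch (inside webservice shell, in mismatch-finder-repo):
#   for id in $(php artisan import:list | import_ids_for_user raheem.eichmann); do
#     php artisan import:drop "$id"   # still prompts "Are you sure?"
#   done
```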
Restarting the Job Queue Daemon
In Mismatch Finder, we use background jobs to validate and import mismatch files. In production, these jobs are handled by a queue worker daemon that runs in a separate Kubernetes pod next to the webservice container.
When deploying changes to production that affect background jobs, a few additional steps are needed for them to take effect. From the Laravel documentation:
Remember, queue workers are long-lived processes and store the booted application state in memory. As a result, they will not notice changes in your code base after they have been started. So, during your deployment process, be sure to restart your queue workers.
This means that after deploying any such change, the queue worker needs to be restarted via an artisan command from within the webservice container:
SSH into toolforge
ssh <your-username>@login.toolforge.org
Log in as the mismatch finder tool account
become mismatch-finder
Open a shell in our web service (to run commands with PHP 7.3)
webservice shell
Navigate to the code repository
cd mismatch-finder-repo
Restart the queue worker via artisan command
php artisan queue:restart
This will signal the queue worker to restart after having finished its current job.
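Laravel's queue:restart does not kill the worker directly: it records a restart signal in the application cache, and each worker exits gracefully once it has finished its current job and noticed the signal. The worker is then expected to be started again by whatever supervises it, presumably the Kubernetes deployment here. Assuming the worker is the main process of its container, one way to confirm the restart is to compare the pod's RESTARTS count before and after; the helper below is a hypothetical sketch that assumes pod names start with the deployment name mismatch-finder.queue.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the RESTARTS column for pods of the given queue deployment.
queue_restart_count() {
  local deployment="${1:-mismatch-finder.queue}"
  awk -v prefix="$deployment-" 'index($1, prefix) == 1 { print $4 }'
}

# Usage sketch, from the tool account's shell (outside webservice shell):
#   before="$(kubectl get pods --no-headers | queue_restart_count)"
#   ...run `php artisan queue:restart` inside webservice shell as above...
#   after="$(kubectl get pods --no-headers | queue_restart_count)"
#   echo "queue worker restarts: $before -> $after"
```

Because the worker first finishes its current job, the count may only change after a long-running import completes.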
Updating The Job Queue Container
To ensure continuous execution of import and validation jobs, the Wikidata Mismatch Finder deploys a PHP 7.3 container to run the job queue. To change this deployment's configuration, first update the relevant deployment file in our repository (containers/production-queue.yaml or containers/staging-queue.yaml, as used in the commands below), and then open a pull request to test and merge your changes. Once the changes have been approved and deployed, recreate the container by following these steps:
Production
- SSH into toolforge
ssh <your-username>@login.toolforge.org
- Log in as the mismatch-finder tool account
become mismatch-finder
- Delete the currently running job queue deployment
kubectl delete deployment mismatch-finder.queue
- Recreate the container from the new version of the deployment configuration
kubectl create --validate=true -f mismatch-finder-repo/containers/production-queue.yaml
Staging
- SSH into toolforge
ssh <your-username>@login.toolforge.org
- Log in as the mismatch-finder-staging tool account
become mismatch-finder-staging
- Delete the currently running job queue deployment
kubectl delete deployment mismatch-finder-staging.queue
- Recreate the container from the new version of the deployment configuration
kubectl create --validate=true -f mismatch-finder-repo-next/containers/staging-queue.yaml
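The delete-and-recreate steps for both environments can be combined into one hypothetical helper, run from the tool's home directory after become mismatch-finder (production) or become mismatch-finder-staging (staging). It reuses the deployment names and file paths listed above, checks that the configuration file exists before deleting anything, and then waits for the new deployment with kubectl rollout status, a standard kubectl command that was not part of the original steps.

```shell
# Hypothetical helper: recreate the job queue deployment for one environment.
recreate_queue_deployment() {
  local target="${1:?usage: recreate_queue_deployment production|staging}"
  local deployment file
  case "$target" in
    production)
      deployment="mismatch-finder.queue"
      file="mismatch-finder-repo/containers/production-queue.yaml" ;;
    staging)
      deployment="mismatch-finder-staging.queue"
      file="mismatch-finder-repo-next/containers/staging-queue.yaml" ;;
    *)
      echo "unknown environment: $target" >&2
      return 2 ;;
  esac
  # Fail before deleting the running deployment if the config is missing.
  [ -f "$file" ] || { echo "missing $file" >&2; return 1; }
  kubectl delete deployment "$deployment" &&
    kubectl create --validate=true -f "$file" &&
    kubectl rollout status "deployment/$deployment"
}
```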