Tool:Wikidata Mismatch Finder

This page documents the deployment and maintenance of the Wikidata Mismatch Finder tool. For usage and developer documentation, please see the main tool documentation in the tool's git repository.


Toolforge tools
Wikidata Mismatch Finder
Website: https://mismatch-finder.toolforge.org/
Description: A tool to review mismatches between Wikidata and External Databases.
Keywords: wikidata, databases, mismatch
Maintainer(s): Wikimedia Deutschland
Source code: https://github.com/wmde/wikidata-mismatch-finder
License: BSD 3-clause "Modified" License
Issues: Open tasks · Report a bug

Changing Server Configuration

Our Toolforge instance uses the default lighttpd server with the default configuration provided for us. All Mismatch Finder-specific configuration overrides are stored in .lighttpd.conf in our tool’s home directory. Follow these steps to make any configuration changes.

  1. SSH into toolforge

    ssh <your-username>@login.toolforge.org
    
  2. Log in as the mismatch finder tool account

    become mismatch-finder
    
  3. Make your changes to our config file (feel free to use vim instead of nano if you wish)

    nano .lighttpd.conf
    
  4. Restart the server

    webservice restart
    

    NOTE! Your changes might take a minute or so to propagate, so be patient.
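
Unlike the .env procedure further down, the steps above do not include a backup. The following is a minimal sketch of one way to take a timestamped copy before editing. The function name backup_file is hypothetical and not part of the tool or of Toolforge.

```shell
# Hypothetical helper: copy a file to <name>.<UTC timestamp>.bak
# and print the path of the backup that was created.
backup_file() {
    src="$1"
    if [ ! -f "$src" ]; then
        echo "backup_file: $src not found" >&2
        return 1
    fi
    dest="$src.$(date -u +%Y%m%dT%H%M%SZ).bak"
    # -p keeps the original mode and modification time
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}
```

Running backup_file .lighttpd.conf between steps 2 and 3 leaves a copy that can be moved back into place if the server fails to start after the restart.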

Updating Environment Variables

All environment variables for Mismatch Finder's production environment are stored in the tool's home folder, and are symlinked into the repository sub-directory. To update production environment variables, please follow the steps below:

  1. SSH into toolforge
    ssh <your-username>@login.toolforge.org
    
  2. Log in as the mismatch finder tool
    become mismatch-finder
    
  3. Back up the current .env file
    cp .env .env.bak
    
  4. Make your changes to the .env file
    nano .env
    
    NOTE! There's no need to restart the server. Since the file is symlinked, the changes should propagate immediately.
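
Since the backup from step 3 is still on disk after editing, the two files can be compared to confirm that only the intended variables changed. This is a minimal sketch; the function name show_env_changes is hypothetical, and the default file names are the ones used in the steps above.

```shell
# Hypothetical helper: print a unified diff between the backup and the edited file.
# Returns 0 when the files are identical, and the exit status of diff otherwise
# (1 when they differ, greater than 1 on error).
show_env_changes() {
    backup="${1:-.env.bak}"
    current="${2:-.env}"
    if diff -u "$backup" "$current"; then
        echo "No changes between $backup and $current"
    else
        return $?
    fi
}
```

Calling show_env_changes without arguments in the tool's home folder compares .env.bak with .env. Lines starting with "-" come from the backup and lines starting with "+" from the edited file.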

Running Database Migrations

Any change that is deployed together with a migration file requires us to run the database migration command included with Laravel’s artisan script. To do so, follow these steps:

  1. SSH into toolforge

    ssh <your-username>@login.toolforge.org
    
  2. Log in as the mismatch finder tool account

    become mismatch-finder
    
  3. Log in to our web service (to run commands with php 7.3)

    webservice shell
    
  4. Navigate to the code repository

    cd mismatch-finder-repo
    
  5. Run the migration script, and follow the instructions on the screen

    php artisan migrate
    

Troubleshooting

Problem: The first time we tried to run the migration, we encountered the following error:

General error: 1709 Index column size too large. The maximum column size is
767 bytes.

Possible Explanation: As explained in this Stack Overflow post, Laravel was attempting to use a character set and collation that were incompatible with our database. In addition, the Toolforge documentation advised creating the database anew with a utf8 character set.

Solution: After recreating the database with the correct character set, we also changed Laravel’s default configuration to ensure the encoding and collation are compatible:

Change database charset from utf8mb4 to utf8 by Silvan-WMDE · Pull Request #9 · wmde/wikidata-mismatch-finder
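
The arithmetic behind this error can be sketched as follows. It assumes the indexed columns use Laravel's default string length of 255 characters (the actual columns in our migrations are not listed here) and that the database enforces the 767-byte limit quoted in the error message. utf8mb4 needs up to 4 bytes per character, while MySQL's utf8 needs up to 3.

```shell
# Bytes an index needs for a string column = characters * max bytes per character.
index_bytes() {
    echo $(( $1 * $2 ))
}

LIMIT=767     # limit quoted in the error message
LENGTH=255    # assumed column length (Laravel default for string columns)

# Each entry is <charset>:<max bytes per character>
for charset in utf8mb4:4 utf8:3; do
    name=${charset%%:*}
    width=${charset##*:}
    bytes=$(index_bytes "$LENGTH" "$width")
    if [ "$bytes" -gt "$LIMIT" ]; then
        echo "$name: $bytes bytes, exceeds $LIMIT"
    else
        echo "$name: $bytes bytes, fits within $LIMIT"
    fi
done
# Prints:
#   utf8mb4: 1020 bytes, exceeds 767
#   utf8: 765 bytes, fits within 767
```

Under these assumptions, switching the charset to utf8 brings a 255-character index just under the limit.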

Update uploaders in production

To update our allowlist of uploaders in the live application, we use the custom artisan command we created to synchronize uploaders with a plain text file. Follow the steps below to log in to our server and update our list:

  1. SSH into toolforge

    ssh <your-username>@login.toolforge.org
    
  2. Log in as the mismatch finder tool account

    become mismatch-finder
    
  3. Back up the current uploaders list

    cp uploaders.txt uploaders.bckp.txt
    
  4. Edit our uploader list using vim or nano

    nano uploaders.txt
    
  5. Copy the list over to the app repository

    cp uploaders.txt mismatch-finder-repo/storage/app/allowlist/
    
  6. Log in to our web service (to run commands with php 7.3)

    webservice shell
    
  7. Navigate to the code repository

    cd mismatch-finder-repo
    
  8. Run the uploader synchronization command, and follow the instructions on the screen

    php artisan uploadUsers:set uploaders.txt
    
  9. Make sure that the list was updated

    php artisan uploadUsers:show
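
Before copying the list in step 5, it can help to see exactly which names were added or removed relative to the backup from step 3. This is a minimal sketch, assuming the file holds one username per line (the file format is an assumption). The function name uploader_changes is hypothetical.

```shell
# Hypothetical helper: print names added to ("+") or removed from ("-") the list.
# Assumes one username per line.
uploader_changes() {
    old="$1"
    new="$2"
    tmp_old=$(mktemp) || return 1
    tmp_new=$(mktemp) || return 1
    # comm requires sorted input; -u also drops duplicate names
    sort -u "$old" > "$tmp_old"
    sort -u "$new" > "$tmp_new"
    comm -13 "$tmp_old" "$tmp_new" | sed 's/^/+ /'   # only in the new list
    comm -23 "$tmp_old" "$tmp_new" | sed 's/^/- /'   # only in the old list
    rm -f "$tmp_old" "$tmp_new"
}
```

With the file names from the steps above, the call would be uploader_changes uploaders.bckp.txt uploaders.txt, run from the tool's home directory after step 4.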
    

Drop uploaded mismatches in production

In order to delete a set of imported mismatches in the live application, we also use a custom artisan command. Follow the steps below to log in to our server and delete file imports from the database together with their associated mismatches:

  1. SSH into toolforge

    ssh <your-username>@login.toolforge.org
    
  2. Log in as the mismatch finder tool account

    become mismatch-finder
    
  3. Log in to our web service (to run commands with php 7.3)

    webservice shell
    
  4. Navigate to the code repository

    cd mismatch-finder-repo
    
  5. Show the list of currently imported mismatch files - the first column shows the import ID

    php artisan import:list
    
    +----+-------------+-----------------+-----------------+------------+-----------------+
    | ID | Import Date | External Source | User            | Expires at | # of Mismatches |
    +----+-------------+-----------------+-----------------+------------+-----------------+
    | 11 | 2021-09-07  | internet        | raheem.eichmann | 2022-09-07 | 23              |
    | 12 | 2021-09-11  | internet        | raheem.eichmann | 2022-09-11 | 42              |
    | 13 | 2021-09-17  | internet        | raheem.eichmann | 2022-09-17 | 345             |
    +----+-------------+-----------------+-----------------+------------+-----------------+
    
  6. Delete an imported file and all its associated mismatches

    php artisan import:drop 12
    
    Dropping import ID 12 with 42 mismatches
    
     Are you sure? (yes/no) [no]:
     > y
    
    Successfully dropped import ID 12 with 42 associated mismatches
    
    IMPORTANT: Dropping an import from the store will delete all its associated mismatches, whether they have been reviewed or not.
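
When the list of imports is long, the IDs to drop can be picked out of the table in step 5 with a small filter. This is a sketch only: it assumes the output of import:list always has the column layout shown above, and the function name expired_import_ids is hypothetical. It only prints IDs and never drops anything itself.

```shell
# Hypothetical helper: read the table printed by `php artisan import:list` on stdin
# and print the IDs of imports whose "Expires at" date is before the given date.
expired_import_ids() {
    cutoff="$1"   # ISO date, e.g. 2022-09-12
    awk -F'|' -v cutoff="$cutoff" '
        # Data rows carry a numeric ID in the first column;
        # border rows and the header row do not match.
        $2 ~ /^ *[0-9]+ *$/ {
            id = $2
            expires = $6
            gsub(/ /, "", id)
            gsub(/ /, "", expires)
            # ISO dates sort correctly as plain strings.
            if ((expires "") < (cutoff "")) print id
        }
    '
}
```

From within the code repository, the call would be php artisan import:list | expired_import_ids 2022-09-12. For the sample table above, this prints 11 and 12. Each ID still has to be passed to import:drop by hand so that the confirmation prompt is not bypassed.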

Restarting the Job Queue Daemon

In Mismatch Finder, we use background jobs to validate and import mismatch files. In production, they are managed by a queue worker daemon that runs on a separate Kubernetes pod next to the webservice container.

When deploying changes that affect background jobs to production, a few additional steps are needed for them to take effect. From the Laravel documentation:

Remember, queue workers are long-lived processes and store the booted application state in memory. As a result, they will not notice changes in your code base after they have been started. So, during your deployment process, be sure to restart your queue workers.

That means, after deploying any such change, the queue worker needs to be restarted via an artisan command from within the webservice container:

  1. SSH into toolforge

    ssh <your-username>@login.toolforge.org
    
  2. Log in as the mismatch finder tool account

    become mismatch-finder
    
  3. Log in to our web service (to run commands with php 7.3)

    webservice shell
    
  4. Navigate to the code repository

    cd mismatch-finder-repo
    
  5. Restart the queue worker via artisan command

    php artisan queue:restart
    

This will signal the queue worker to restart after having finished its current job.

Updating The Job Queue Container

To ensure continuous execution of import and validation jobs, the Wikidata Mismatch Finder deploys a PHP 7.3 container to run the job queue on. To change this deployment's configuration, first update the deployment.yaml file at the root of our repository, then open a Pull Request to test and merge your changes. Once the changes have been approved and deployed, recreate the container by following these steps:

Production

  1. SSH into toolforge
    ssh <your-username>@login.toolforge.org
    
  2. Log in as the mismatch-finder tool account
    become mismatch-finder
    
  3. Delete the currently running job queue deployment
    kubectl delete deployment mismatch-finder.queue
    
  4. Recreate the container from the new version of the deployment configuration
    kubectl create --validate=true -f mismatch-finder-repo/containers/production-queue.yaml
    

Staging

  1. SSH into toolforge
    ssh <your-username>@login.toolforge.org
    
  2. Log in as the mismatch-finder-staging tool account
    become mismatch-finder-staging
    
  3. Delete the currently running job queue deployment
    kubectl delete deployment mismatch-finder-staging.queue
    
  4. Recreate the container from the new version of the deployment configuration
    kubectl create --validate=true -f mismatch-finder-repo-next/containers/staging-queue.yaml
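
The production and staging procedures differ only in the deployment name and the manifest path, so steps 3 and 4 can be wrapped in one function. This is a minimal sketch: the names run, recreate_queue and DRY_RUN are hypothetical, while the kubectl commands are the ones from the steps above. With DRY_RUN=1 the commands are printed instead of executed, which allows a check before touching the running deployment.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: delete the queue deployment, then recreate it from a manifest.
# The create step only runs if the delete step succeeded.
recreate_queue() {
    deployment="$1"   # e.g. mismatch-finder.queue
    manifest="$2"     # e.g. mismatch-finder-repo/containers/production-queue.yaml
    run kubectl delete deployment "$deployment" &&
        run kubectl create --validate=true -f "$manifest"
}
```

After becoming the relevant tool account, production would be recreate_queue mismatch-finder.queue mismatch-finder-repo/containers/production-queue.yaml, and staging would be recreate_queue mismatch-finder-staging.queue mismatch-finder-repo-next/containers/staging-queue.yaml.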
    

