WMDE/Wikidata/PropertySuggester update

From Wikitech

Occasionally, the data for the legacy property suggester needs to be updated from the latest JSON dumps.

This process requires access to a Wikimedia maintenance host.

One-time setup

Each update

  • Find the latest wbs_propertypairs dataset on https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wikidata/wbs_propertypairs/ (generated on stat1005 by ladsgroup's cron job). We’ll use yyyymmdd as a placeholder for its name below.
  • Download analyzed-out.gz to your local machine, run refine.py from wbs_propertypairs-refine on it (see its README), and commit the result to the wbs_propertypairs repo with the commit message “Add propertypairs from the yyyymmdd dump”.
  • Download the refined file to the maintenance host with https_proxy=http://webproxy.eqiad.wmnet:8080 wget 'https://github.com/wmde/wbs_propertypairs/raw/master/yyyymmdd/wbs_propertypairs.csv.gz'.
  • Unpack it: gzip -d wbs_propertypairs.csv.gz
  • Update the actual table: mwscript extensions/PropertySuggester/maintenance/UpdateTable.php --wiki wikidatawiki --file wbs_propertypairs.csv.
    • This will take a few minutes.
    • It will first log (to your terminal) a bunch of “deleting a batch” lines, then “X rows inserted” up to the total number of lines in the CSV file (which you can count with wc -l wbs_propertypairs.csv beforehand).
  • Log your changes: !log Updated the Wikidata property suggester with data from yyyymmdd's JSON dump
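
The maintenance-host steps above (download, unpack, count, update) could be collected into a small shell sketch like the one below. The function names propertypairs_url, count_lines and update_property_suggester are hypothetical helpers introduced here for illustration; only the commands inside them come from the steps above. The sketch assumes it is run from a directory on the maintenance host where wbs_propertypairs.csv does not already exist.

```shell
# Hypothetical helper names; the URL, proxy and mwscript call are taken
# from the steps above.

# Print the download URL for the dump named $1 (the yyyymmdd placeholder).
propertypairs_url() {
    printf 'https://github.com/wmde/wbs_propertypairs/raw/master/%s/wbs_propertypairs.csv.gz\n' "$1"
}

# Print the number of lines in file $1, for comparison with the
# final "rows inserted" figure logged by UpdateTable.php.
count_lines() {
    wc -l < "$1" | tr -d ' '
}

# Run the download, unpack, count and update steps for the dump named $1.
# The body runs in a subshell so that "set -eu" does not leak into the
# calling shell; any failing step aborts the remaining ones.
update_property_suggester() (
    set -eu
    dump=$1
    https_proxy=http://webproxy.eqiad.wmnet:8080 wget "$(propertypairs_url "$dump")"
    gzip -d wbs_propertypairs.csv.gz
    echo "Expecting $(count_lines wbs_propertypairs.csv) rows to be inserted"
    mwscript extensions/PropertySuggester/maintenance/UpdateTable.php \
        --wiki wikidatawiki --file wbs_propertypairs.csv
)
```

After sourcing these definitions, one would call update_property_suggester yyyymmdd with the real dump name substituted. The !log entry is still made by hand afterwards.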