Jump to content

Tool:InteGraality

From Wikitech
Toolforge tools
InteGraality
Website https://integraality.toolforge.org/
Author(s) Jean-Frédérictalk
Maintainer(s) Jean-Frédéric (View all)
Source code https://github.com/JeanFred/integraality/
License MIT License
Issues Open tasks · Report a bug
Admin log Nova_Resource:Tools.integraality/SAL

InteGraality is a tool generating dashboards of property coverage for a given part of Wikidata.

Modes

There are three possible modes:

  • “Queries”, invoked from following the “🔍” in the cell. It uses the queries endpoint.
  • “Manual update”, invoked from the “update” link in the template header. It uses the update endpoint.
  • “Periodic update”, invoked by a cron job (currently weekly) which loops through all the dashboards on a given wiki and updates them.

Architecture


The tool has two main components:

  • A Flask web app (served via uWSGI behind Toolforge's Kubernetes webservice) handling on-demand requests (/update, /queries).
  • A CLI (python -m integraality.pages_processor) that batch-updates all dashboards on a given wiki. This is invoked weekly by a scheduled Toolforge job.

Both share the same Python codebase and use pywikibot to read/write wiki pages.

Data flow:

  1. A user places {{Property dashboard}} on a wiki page with config parameters
  2. PagesProcessor reads the page, parses the template config
  3. PropertyStatistics runs SPARQL queries, one per column (via Wikidata Query Service or QLever) to compute coverage across all groupings
  4. ResultsFormatter formats the results as a wikitext table
  5. The table is written back to the wiki page

Redis caches parsed template configs to speed up the /queries endpoint (which needs to reconstruct the SPARQL query without re-fetching the wiki page).

Endpoints

Endpoint Purpose
/ Landing page
/healthz Health check (used by Toolforge Kubernetes)
/update?url=page_url&page=page_name Triggers the update of the given dashboard page
/update/stream Server-Sent Events stream for live update progress
/queries?url=page_url&page=page_name Returns the SPARQL queries behind the 🔍 links

Deployment

Deployment is automated via an Ansible playbook, deploy/main.yml. It is invoked using:

./bin/deploy-to-toolforge.sh <username>

Logs

  • Cron job: logs/weekly-update-wikidata-[out|err].log
  • Webservice startup logs: webservice logs -f
  • Webservice logs: uwsgi.log

See also