Nova Resource:Wikistats

For the Wikimedia Foundation project see mw:Wikistats.
Project Name: wikistats
Details, admins/members: openstack-browser
Monitoring: Wikistats

Description: A project that collects and displays statistics for MediaWiki wikis.
Purpose: Collecting statistics about MediaWiki installations on the internet.
Anticipated traffic level: 10-100 hits per day
Anticipated time span: indefinite
Project status: currently running
Contact address: dzahn@wikimedia.org
Willing to take contributors or not: willing
Subject area narrow or broad: narrow (statistics)

http://wikistats.wmcloud.org

Where to find the code

Wikistats consists of two parts: the Puppet manifests (in the operations/puppet Git repo) and the Debian package (in the operations/debs/wikistats repo).

The Puppet part is divided into ./puppet/modules/role/manifests/wikistats/instance.pp, the role class that is applied to a node/instance, and the module in ./puppet/modules/wikistats/.

Manifests

  1. role::wikistats - configures the host name and SSL certificates depending on labs vs. production, uses the main classes below, and is all that needs to be included on an instance or node
  2. wikistats - the init.pp of the module; sets up the user/group, installs the package (if we're using labsdebrepo), and uses the other classes
  3. wikistats::cronjob - defines a cron job to update a table
  4. wikistats::db - installs MariaDB and php-mysql
  5. wikistats::updates - installs php-cli, creates the log directory, and defines and configures the update cron jobs
  6. wikistats::web - does the Apache setup

(Currently the manifests do not install the package automatically yet; that is still done manually.)

How to build the Debian package

  1. git clone https://gerrit.wikimedia.org/r/operations/debs/wikistats
  2. cd wikistats
  3. "debuild" (signed) or "debuild -us -uc" (unsigned)
  4. cd ..
  5. optional: check which files would be installed by this: dpkg-deb -c wikistats_*_all.deb
  6. install: dpkg -i wikistats_*_all.deb

^ This is outdated. The code is still in this repo, but it is no longer shipped as an actual .deb. Just git pull the files or let the deploy-wikistats command do that for you.

How to deploy latest code

/usr/local/bin/wikistats# ./deploy-wikistats deploy

Optionally use "backup" to make a backup of the current code before deploying, or "restore" to restore from the last backup.

/usr/local/bin/wikistats# ./deploy-wikistats backup
/usr/local/bin/wikistats# ./deploy-wikistats restore

How to fix DB grants if they break after deploy

grep db_pass /etc/wikistats/config.php
mysql -u root -p wikistats
mysql> grant all privileges on wikistats.* to 'wikistatsuser'@'localhost' identified by '<password from config.php>';
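
A quick way to confirm the grant works again is to connect as the application user. The following is only a hypothetical sketch using PHP's mysqli (it is not part of wikistats); substitute the db_pass value from /etc/wikistats/config.php.

<?php
// Hypothetical check: try to connect with the application credentials
// and report whether the grant is usable again.
mysqli_report(MYSQLI_REPORT_OFF); // report via connect_error instead of exceptions
$db = new mysqli('localhost', 'wikistatsuser', '<password from config.php>', 'wikistats');
echo $db->connect_error ? "still broken: {$db->connect_error}\n" : "grants OK\n";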

How to add a new wiki

A common maintenance task is to add newly created wikis to the statistics tables in SQL. This means running an INSERT statement in the DB shell, followed by running the update.php script to fetch data for the first time. The query needed varies slightly depending on what type of wiki it is. Each project (Wikipedia, Wiktionary, etc.) has its own table in the database.

First, SSH to the current instance in the "wikistats" project, which you can see in Horizon. As of 12 September, this is wikistats-bookworm.wikistats.eqiad1.wikimedia.cloud. In Horizon, under Proxies, you can determine which instance is the backend for https://wikistats.wmcloud.org.

Once connected, get a mysql shell with:

mysql -u root wikistats
MariaDB [wikistats]> 

Wikipedia

An example of adding a Wikipedia in a new language that no other project has used before:

MariaDB [wikistats]> insert into wikipedias (prefix, lang, loclang, method) values ("fat", "Fante", "Mfantse", 8);


The minimal values that you need to provide are:

  • prefix (the language code or subdomain, like the "en" in en.wikipedia.org)
  • lang (the name of the language in English)
  • loclang (the name of the language in the language itself)
  • method (this exists for historical reasons and should nowadays always be set to 8; method 8 means fetching data from the API instead of the old scraping methods)

In this example no other project exists in the language "Fante".

If the local language name contains non-ASCII characters, you have to convert them to HTML entities, for example with https://onlineutf8tools.com/convert-utf8-to-html-entities, and store the HTML version in the database, because the tables do not use utf8.
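
If you have PHP at hand (wikistats itself is written in PHP), the same conversion can be done with a short script like the one below. This is only a minimal sketch; the language name is just an example value, not a pending wiki.

<?php
// Convert every non-ASCII code point in a UTF-8 string into a numeric
// HTML entity, like the online converter linked above (needs mbstring).
$loclang = 'Ænglisc'; // example value only
$convmap = [0x80, 0x10FFFF, 0, 0x10FFFF];
echo mb_encode_numericentity($loclang, $convmap, 'UTF-8'), "\n"; // prints: &#198;nglisc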

Each wiki creation ticket should link to a URL on Meta where the new language was requested (example: https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Ghanaian_Pidgin). That page is the source of truth for the correct language name strings and lets you confirm that the new prefix is a valid ISO 639 language code (example, linked from there: https://iso639-3.sil.org/code/gpe).

Wiktionary

In this example a new Wiktionary has been created, but the language is not entirely new as a project: a Wikipedia in the same language already exists, so we can use an "insert ... select" query to copy the values from the wikipedias table for the same prefix.

MariaDB [wikistats]> insert into wiktionaries (prefix, lang, loclang, method) select prefix,lang,loclang,method from wikipedias where prefix="ckb";

Mediawikis

This is for non-WMF wikis, the general "all other MediaWikis" table. Here you need to provide the full URL to the wiki's api.php, together with method 8.

MariaDB [wikistats]> insert into mediawikis (method,statsurl) values (8,'https://www.qiuwenbaike.cn/api.php');

Manually updating data for a wiki

After adding a new wiki you can either just wait for the timers to update the table or run an update manually. Example:

/usr/lib/wikistats/update.php wp prefix fat

Here "wp" means "from the wikipedias table" and "prefix fat", so this is for "fat.wikipedia.org". You can find the short names of all the other tables inside update.php in a large switch statement.

You can also update wikis by ID, for example a specific wiki from the mediawikis table:

/usr/lib/wikistats/update.php mw id 20247

To update an entire table you can also manually start the corresponding systemd services. Find them with:

systemctl list-units | grep wikistats

Server admin log

2024-04-01

  • 17:23 mutante: - deleting instance bwplanet

2024-02-23

  • 20:20 RhinosF1: testing

2024-02-08

  • 18:24 mutante: running '/usr/lib/wikistats/update.php mw autofixit' which tries to magically fix stats URLs that are 301 and return a new location .. but it throws an error because then there are duplicate entries which the DB doesn't allow

2023-12-30