Talk:Wiki replicas

From Wikitech
Jump to navigation Jump to search

Optimizing query, storing temporary results

Hi, previously, I had two steps to obtain autopatrolled candidates. However, I noticed that users now only have read-only rights:

DROP DATABASE IF EXISTS `s51857__c_tmp`;
CREATE DATABASE `s51857__c_tmp`;
CREATE TABLE `s51857__c_tmp`.`tmp_apc_com` ENGINE=MEMORY (
	SELECT REPLACE(`user_name`, ' ', '_') AS 'uname',
		`user_id`,
		`user_name`,
		`user_editcount`,
		`user_registration`
	FROM `user`
		/* there are very few users with > 1000 edits compared to the overall number of users */
		WHERE `user`.`user_editcount` > 1000
		/* DB should be prepared for joining these two (MediaWiki does this all the time); */
		/* user_groups is supposedly a relatively small table */
		AND `user`.`user_id` NOT IN ( SELECT `ug_user` FROM `user_groups` )
		/* do not include brand new users who just experimented with Cat-A-Lot */
		AND (DATEDIFF( NOW(), `user`.`user_registration` ) > 30 OR `user`.`user_registration` IS NULL)
		/* and of course these users should not be blocked currently */
		AND `user`.`user_id` NOT IN ( SELECT `ipb_user` FROM `ipblocks` )
		/* and have an edit or other contribution within the last 30 days */
		/* this makes the query notably slower but should be almost constant */
		/* even if we get a lot new users or operate this wiki another 10 years */
		/* while user_daily_contribs will grow over time and might consume even more */
		/* resources in the future */
		AND `user`.`user_id` IN ( SELECT `rc_user` FROM `recentchanges_userindex` ));

SELECT `user_id`,
	CONVERT(`user_name` USING 'utf8') AS 'name',
	CONVERT(`block_reasons` USING 'utf8') AS 'reason',
	`user_editcount` AS 'editcount',
	DATE_FORMAT(`user_registration`, '%Y-%m-%d') AS 'regdate'
FROM `s51857__c_tmp`.`tmp_apc_com`
LEFT JOIN (SELECT GROUP_CONCAT(`log_comment` SEPARATOR ' // ') AS 'block_reasons', `log_title`
	FROM `logging_userindex`
	WHERE `log_type`='block' AND `log_title` IN
		(SELECT `uname` FROM `s51857__c_tmp`.`tmp_apc_com`)
		GROUP BY `log_title`)
	AS `user_selection`
ON `user_selection`.`log_title`=`s51857__c_tmp`.`tmp_apc_com`.`uname`
ORDER BY `user_editcount` DESC;
DROP DATABASE IF EXISTS `s51857__c_tmp`;

Therefore I concatenated both queries ending in:

  SELECT * FROM (
  SELECT uname, GROUP_CONCAT(`comment_text` SEPARATOR ' // ') reason, user_id, name, editcount FROM
    (SELECT
    REPLACE (`user_name`, ' ', '_') AS 'uname',
        `user_id`,
        CONVERT(`user_name` USING 'utf8') AS 'name',
        `user_editcount` AS 'editcount',
        DATE_FORMAT(`user_registration`, '%Y-%m-%d') AS 'regdate'
    FROM
        `user`
    WHERE
        `user`.`user_editcount` > 1000 AND `user`.`user_id` NOT IN(
        SELECT
            `ug_user`
        FROM
            `user_groups`
        ) AND (DATEDIFF(NOW(), `user`.`user_registration`) > 30 OR `user`.`user_registration` IS NULL)
        AND `user`.`user_id` NOT IN(
          SELECT
              `ipb_user`
          FROM
              `ipblocks`
          )
        AND `user`.`user_id` IN(
          SELECT
              `actor_user`
          FROM
              `recentchanges_userindex`
          JOIN actor_recentchanges ON rc_actor = actor_id)
    ) AS q1
    LEFT JOIN `actor_logging` ON `user_id` = `actor_user`
    LEFT JOIN `logging_userindex` ON `log_title` = `uname`
    LEFT JOIN `comment_logging` ON `log_comment_id` = `comment_id`
    WHERE `log_type`='block' OR `log_type` IS NULL
    GROUP BY `uname`, `user_id`, `name`, `editcount`) AS q2
  ORDER BY `editcount` DESC;

This query, however, never finishes/ runs much too long (ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query). So I thought, let's have a look what MariaDB does and prefixed the query with EXPLAIN. This resulted in ERROR 1345 (HY000) at line 1: ANALYZE/EXPLAIN/SHOW can not be issued; lacking privileges for underlying table.

  1. Is it still possible to store temporary results somehow?
  2. How can one optimize queries without knowing the Query Execution Plan?

-- Rillke (talk) 08:42, 28 July 2019 (UTC)

Please see Help:MySQL_queries#Optimizing_queries. -- Rillke (talk) 08:55, 28 July 2019 (UTC)
After testing 5 or 6 queries, https://tools.wmflabs.org/sql-optimizer returns only 502 Bad Gateway -- seems like this is not really a robust solution. -- Rillke (talk) 10:02, 28 July 2019 (UTC)

1. Is it still possible to store temporary results somehow?

Not on the database servers, no. The ability to create tables on the Wiki Replica servers was removed 2018-01-17 after being announced 2017-09-25. This former feature was certainly useful for some complicated SQL-only analytics, but it also prevented us from load balancing the Wiki Replica databases and caused unexpected failures for tools using the feature when database server maintenance was performed.

2. How can one optimize queries without knowing the Query Execution Plan?

Open 2 SQL sessions. In session 1: SELECT CONNECTION_ID() AS conid;. Note the number returned. Run the query to be explained.
In session 2: Use the number noted above for <conid>: SHOW EXPLAIN FOR <conid>;
--BryanDavis (talk) 02:59, 29 July 2019 (UTC)