Obsolete:Querybane

From Wikitech
(Redirected from Querybane)
This page contains historical information. It may be outdated or unreliable. 2004

NOTE: This is historic information about how things were in 2004 when MediaWiki was nowhere near as well tuned as it is today. Do not use this as guidance on how to manage MediaWiki installations today or as information on current Wikipedia practices. Querybane hasn't been used since 2005/6 when the necessary performance tuning and hardware buying work to make it unnecessary had been completed. The python tool is no longer available, though you might find this way of constructing rules useful - it worked for me when I needed it even though it was only a stopgap until the application could be tuned. --Jamesday 03:00, 11 June 2008 (UTC)

Querybane is an automated query killer for mysql. it takes a configuration file containing a set of rules and processes them in order, killing any queries that match the rules. a sample configuration might look like:

query_killer_rules = [
       # long-running sleeping threads
       {
               'server': [dbserver] + dbslaves,
               'min_threads': 1,
               'min_last_threads': 1,
               'lowest_position': 1,
               'users': ['wikiuser'],
               'commands': ['Sleep'],
               'min_run_time': 900,
               'query_types': [] # any
       }
]

Justification

One of the recurring problems with the database servers is unusually long-running queries. Often, a query will be fine and fast at low load times but problematic at high load times, so it's not a case of automatically blocking the query but of only blocking it if the load at the time it is run is high enough for it to be a problem. In the past, hard limits on the total number of queries which can be run were used. That approach has several problems:

  • It doesn't work. Even one, but more often up to five, queries are sufficient to cause the problem.
  • It blocks the cheap and fast routine queries, unnecessarily giving database errors to many people doing harmless things. Terminating just the slow queries harms fewer people.

The solution: a configurable query killer which will act based on the server load, the type of query being run and the account running it.

Theory

Rules

  • server: sometimes limits will be different. server string must be in this string.
  • minimum active thread threshold: how bad is the load now?
  • previous active thread threshold: is it getting worse? Allow for normal fluctuations.
  • lowest position: looking from longest running to slowest running threads, how far down does this rule apply?
  • user: less tolerance for regular users than admin, replication or backup tasks. user string must be in this string.
  • cmd: generally want to limit only queries, not inserts
  • run time: how long a job has taken is the guide to how high its load is
  • query: identifies the specific queries the rule applies to. Leading part of the query only.

In general, the rules try to catch only unusual cases at first, when load is low, then move on to allow killing more and more as load is higher.

Emergency operations

Measures to catch anything the more routine steps haven't dealt with. These generally operate on any type of query, while the general grooming is more refined.

Denial of service attack limitation

Denial of service attacks can come only from end users but may involve very rapid queries. Use mass kills to try to maintain as much service as possible. Queries which end up waiting for significant disk activity will be killed but most queries will be served successfully from the query and general caches.

Denial of service 1

Service essentially unavailable, get it back

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 950 (at least 95% of max connections for the server)
  • previous active thread threshold: 300 (very rapid load growth)
  • lowest position: 900 (leave a few at the end, which may be legitimate)
  • user: wikiuser
  • cmd: Query
  • run time: 10
  • query: SELECT
Denial of service 2

Service very stressed, get it back

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 900 (at least 90% of max connections for the server)
  • previous active thread threshold: 600 (rapid load growth)
  • lowest position: 400
  • user: wikiuser
  • cmd: Query
  • run time: 10
  • query: SELECT

Long threads at high load

This starts shedding queries of any type once connections reach a high level. It's the fallback "kill anything that's slow so the server can catch up" job. Doesn't matter who's running it, it needs to be gone. Exceptions: backups, replication etc.

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 900 (at least 90% of max connections for the server)
  • previous active thread threshold: 800
  • lowest position: 500
  • user: wikiuser, wikiadmin, root
  • cmd: Query
  • run time: 20
  • query: SELECT

Long-running threads at high load, less important

Normal users start getting their queries dropped before the special case users: we assume that the others know what they are doing and will kill their own queries if they are causing trouble. If that's wrong and load continues to rise, the catchall rule will kill them soon...

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 700 (at least 80% of max connections limit for the server)
  • previous active thread threshold: 600
  • lowest position: 100
  • user: wikuser
  • cmd: Query
  • run time: 20
  • query: SELECT

High load, increasing

Load is high and getting worse, start eliminating likely causes

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 600 (at least 70% of max connections limit for the server)
  • previous active thread threshold: 500
  • lowest position: 30
  • user: wikuser
  • cmd: Query
  • run time: 30
  • query: SELECT

High load 400

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 400
  • previous active thread threshold: 400
  • lowest position: 50
  • user: wikuser
  • cmd: Query
  • run time: 30
  • query: SELECT

High load 350

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 350
  • previous active thread threshold: 350
  • lowest position: 20
  • user: wikuser
  • cmd: Query
  • run time: 30
  • query: SELECT

Normal grooming conditions

Search

Search and very many queries

Lots of load, get rid of most searches.

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 300
  • previous active thread threshold: 50
  • lowest position: 250
  • user: wikuser
  • cmd: Query
  • run time: 20
  • query: SELECT cur_id,cur_namespace,cur_title,cur_text FROM 'cur','searchindex' WHERE
Search and many queries

Kill many search threads before load reaches the point where other things get killed in large numbers.

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 150
  • previous active thread threshold: 40
  • lowest position: 50
  • user: wikuser
  • cmd: Query
  • run time: 30
  • query: SELECT cur_id,cur_namespace,cur_title,cur_text FROM 'cur','searchindex' WHERE
Search queries

High search load, catchall

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 30
  • previous active thread threshold: 30
  • lowest position: 10
  • user: wikuser
  • cmd: Query
  • run time: 30
  • query: SELECT cur_id,cur_namespace,cur_title,cur_text FROM 'cur','searchindex' WHERE
high load search
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 25
  • previous active thread threshold: 10
  • lowest position: 3
  • user: wikuser
  • cmd: Query
  • run time: 40
  • query: SELECT cur_id,cur_namespace,cur_title,cur_text FROM 'cur','searchindex' WHEREInsert non-formatted text here
Single expensive searches
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 10
  • previous active thread threshold: 10
  • lowest position: 1
  • user: wikuser
  • cmd: Query
  • run time: 40
  • query: SELECT cur_id,cur_namespace,cur_title,cur_text FROM 'cur','searchindex' WHERE
Single very expensive searches
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 10
  • previous active thread threshold: 10
  • lowest position: 5
  • user: wikuser
  • cmd: Query
  • run time: 80
  • query: SELECT cur_id,cur_namespace,cur_title,cur_text FROM 'cur','searchindex' WHERE

History

Long history checks are a problem. Rules need to cover the slow cases if load is too high.

History with many queries
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 150
  • previous active thread threshold: 40
  • lowest position: 120
  • user: wikuser
  • cmd: Query
  • run time: 10
  • query: /* PageHistory::history */ SELECT old_id,old_user,old_comment,old_user_text,old_timestamp,
History
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 50
  • previous active thread threshold: 50
  • lowest position: 1 (only one at a time)
  • user: wikuser
  • cmd: Query
  • run time: 100
  • query: /* PageHistory::history */ SELECT old_id,old_user,old_comment,old_user_text,old_timestamp,

User contributions

Long user contributions checks are a problem. Rules need to cover the slow cases if load is too high. Crawlers quite often hit this.

User contributions with many queries
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 50
  • previous active thread threshold: 40
  • lowest position: 40
  • user: wikuser
  • cmd: Query
  • run time: 10
  • query: /* wfSpecialContributions */ SELECT old_namespace,old_title,old_timestamp,old_comment,
User contributions
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 25
  • previous active thread threshold: 25
  • lowest position: 1 (only one at a time)
  • user: wikuser
  • cmd: Query
  • run time: 40
  • query: /* wfSpecialContributions */ SELECT old_namespace,old_title,old_timestamp,old_comment,

Article author list

These sometimes use the wrong index. Kill any which appear to be doing that, in case the other changes don't stop it from happening.

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 10
  • previous active thread threshold: 5
  • lowest position: 400
  • user: wikuser
  • cmd: Query
  • run time: 5
  • query: SELECT old_user, old_user_text, user_real_name, MAX(old_timestamp) as timestamp FROM old LEFT

Category list

The Special:categories category list query is currently very inefficient, performing a select distinct and limit operation of hundreds of thousands of records. Crawlers quite often hit this.

Category list/crawler
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 200
  • previous active thread threshold: 100
  • lowest position: 200
  • user: wikuser
  • cmd: Query
  • run time: 10
  • query: SELECT DISTINCT 'Categories' as type,

The query is multi-line and only the first line is checked by Querybane. Mytop shows all lines combined.

Category list with many queries
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 50
  • previous active thread threshold: 40
  • lowest position: 40
  • user: wikuser
  • cmd: Query
  • run time: 20
  • query: /* categoriespage::doQuery */ SELECT DISTINCT 'Categories' as type,
Category list
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 25
  • previous active thread threshold: 25
  • lowest position: 3
  • user: wikuser
  • cmd: Query
  • run time: 50
  • query: /* categoriespage::doQuery */ SELECT DISTINCT 'Categories' as type,

Newpages list

This sometime shows up as a problem. Crawlers often hit this.

Newpages list/crawler
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 200
  • previous active thread threshold: 100
  • lowest position: 200
  • user: wikuser
  • cmd: Query
  • run time: 10
  • query: SELECT 'Newpages' as type,
Newpages list with many queries
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 50
  • previous active thread threshold: 40
  • lowest position: 40
  • user: wikuser
  • cmd: Query
  • run time: 20
  • query: SELECT 'Newpages' as type,
Newpages list
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 25
  • previous active thread threshold: 25
  • lowest position: 3
  • user: wikuser
  • cmd: Query
  • run time: 50
  • query: SELECT 'Newpages' as type,

Allpages list

The allpages sub-pages are a potential crawler target so it's included here to deal with unruly crawlers. It tends to hit pages which aren't cached.

Allpages subpage/crawler

Kill further down the list if there are many queries, it may be a crawler.

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 200
  • previous active thread threshold: 100
  • lowest position: 200
  • user: wikuser
  • cmd: Query
  • run time: 10
  • query: SELECT cur_title FROM cur WHERE cur_namespace=0 AND cur_title >= ' (trailing ' for letter of the alphabet)
Allpages subpage with many queries
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 50
  • previous active thread threshold: 40
  • lowest position: 40
  • user: wikuser
  • cmd: Query
  • run time: 10
  • query: SELECT cur_title FROM cur WHERE cur_namespace=0 AND cur_title >= ' (trailing ' for letter of the alphabet)
Allpages subpage
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 25
  • previous active thread threshold: 25
  • lowest position: 1 (only one at a time)
  • user: wikuser
  • cmd: Query
  • run time: 40
  • query: SELECT cur_title FROM cur WHERE cur_namespace=0 AND cur_title >= ' (trailing ' for letter of the alphabet)

Special pages in MediaWiki 1.4

Mediawiki 1.4 doesn't respect the miser mode switch properly and block these - kill them on sight unless they finish in 5 seconds or less.

Allpages top level
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 5
  • previous active thread threshold: 1
  • lowest position: 900
  • user: wikuser
  • cmd: Query
  • run time: 5
  • query: /* indexShowToplevel */
Ancientpages
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 5
  • previous active thread threshold: 1
  • lowest position: 900
  • user: wikuser
  • cmd: Query
  • run time: 5
  • query: /* ancientpagespage::doQuery */

Counting cur

Page moves and some other things count the number of cur entries. That's slow in InnoDB, so get rid of most of them when the system is busy.

wfSpecialStatistics Count cur with many queries
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 50
  • previous active thread threshold: 40
  • lowest position: 40
  • user: wikuser
  • cmd: Query
  • run time: 10
  • query: /* wfSpecialStatistics */ SELECT COUNT(cur_id) AS total FROM 'cur'
wfSpecialStatistics Count cur
  • server: ariel, suda, probably all others
  • minimum active thread threshold: 25
  • previous active thread threshold: 25
  • lowest position: 1 (only one at a time)
  • user: wikuser
  • cmd: Query
  • run time: 40
  • query: /* wfSpecialStatistics */ SELECT COUNT(cur_id) AS total FROM 'cur'

Stub threshold

When there are a lot of queries outstanding these are killed because it's usually a small number of people generating a significant amount of load and it's an effective way to help to quickly relieve the situation.

  • server: ariel, suda, probably all others
  • minimum active thread threshold: 200
  • previous active thread threshold: 200
  • lowest position: 200
  • user: wikuser
  • cmd: Query
  • run time: 10
  • query: SELECT LENGTH(cur_text) AS x, cur_namespace, cur_is_redirect FROM cur