Obsolete talk:Querybane

Expert opinion

Apparently, a query killer application/script should not exist in the production environment.

There are several, quite obvious, reasons:

a) In most cases, the root cause of query slowness is operating system performance or resource scheduling, database server configuration, database design, or the data access path (missing or unused indexes).

b) In many cases it is impossible to determine how important a particular "select" statement is in a large-scale system. The results of a select statement are frequently used in the "where" clause of update/delete statements, or in insert statements. There is no guarantee that the application that requested the data handles empty record sets correctly, so killing the query might cause data loss or corruption.

c) In general, heavy load on a database server should not be considered a problem if the system is scalable.

d) Long-running queries, especially "unusually" long-running ones, should not exist in any database, including MySQL. Such queries should be detected, rewritten, and optimized in the early stages of development.

a) Not in our case. b) Not in our case. c) We don't have infinite money. d) In some systems (e.g. ours), it's easier to write a query killer to temporarily fix performance problems due to unexpected long-running queries than to find and fix the cause of said queries. The running time and impact of a given query depend on many things, such as available buffer space and the properties of the queries running concurrently. These things are impossible to profile accurately in the early stages of development. Post-deployment optimisation or simply increasing hardware capacity can remove the need for query killing, but it remains a useful stopgap, especially for sites such as Wikimedia which are in continuous development. -- Tim 07:30, 29 March 2006 (PST)
Sorry, that's contrary to our objectives:
  • Our core objective is donation-based, cost-effective performance delivery, not perfect performance delivery. That means we do not build out to the 10 to 100 times load capacity that it takes to respond well to extreme temporary load, nor to all denial of service conditions. If we were a business with sales depending on consistently fast performance, the cost-benefit situation would differ significantly.
  • We can determine the importance of a query. We have a largely single-application system and can and do work out what's significant when deciding what we can safely eliminate, degrading less important activities first.
  • The core of the application is extremely well tuned. Demand for it isn't, and has historically always grown to use all of the performance on offer. Querybane is a tool that can shed extreme load when the push-pull of demand and supply is temporarily out of step. It resulted in a major improvement in service and in availability under overload.
  • Querybane allows us to accept possibly problematic queries at all times and reject them only when the instantaneous system load makes them unacceptable. That increases service availability and equipment utilisation level, without unacceptable performance degradation for the core features.
It's definitely not the traditional or perfect bank design, deployment and growth philosophy! It is a pretty typical Web 2.0 design and grow-out approach. Don't do it this way if you're a major bank with the resources to get maximum load provisioning on day one and a two-year development schedule before first release. Tim and I are experts also - the best solution depends on the application and its context. It's not always perfect reliability. Jamesday 02:59, 20 July 2006 (PDT)
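
For readers unfamiliar with the idea, the load-dependent rejection described in the bullets above can be sketched roughly as follows. This is only an illustration of the policy, not Querybane's actual code: the `RunningQuery` type, the `important` flag, the `queries_to_kill` function and both thresholds are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningQuery:
    """Hypothetical snapshot of one running query."""
    query_id: int
    seconds_running: float
    important: bool  # assumed flag: True for core features that must not be killed


def queries_to_kill(queries, active_threads, load_threshold=50, max_seconds=30.0):
    """Return the ids of queries to kill, longest-running first.

    Nothing is killed while the load is acceptable; under overload, only
    unimportant queries that have exceeded max_seconds are selected.
    The threshold values are illustrative assumptions.
    """
    if active_threads <= load_threshold:
        # Load is fine: accept every query, however slow it is.
        return []
    victims = [
        q for q in queries
        if not q.important and q.seconds_running > max_seconds
    ]
    # Shed the longest-running queries first.
    victims.sort(key=lambda q: q.seconds_running, reverse=True)
    return [q.query_id for q in victims]


if __name__ == "__main__":
    snapshot = [
        RunningQuery(1, 120.0, False),
        RunningQuery(2, 45.0, False),
        RunningQuery(3, 300.0, True),
        RunningQuery(4, 5.0, False),
    ]
    print(queries_to_kill(snapshot, active_threads=10))
    print(queries_to_kill(snapshot, active_threads=80))
```

On MySQL, the snapshot could plausibly be built from `SHOW PROCESSLIST` and each returned id passed to `KILL QUERY`, but that wiring is left out here so the policy itself stays easy to read and test.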