Obsolete:Performance/Query profiling in MediaWiki
For current information about query optimisation, profiling and caching, see Database optimization on mediawiki.org.
This page is oriented toward feature developers who are writing code that introduces new data access patterns. Experienced operations and platform engineers will likely find its contents a little old-hat.
MediaWiki provides a comprehensive and performant abstraction layer on top of the database, which encapsulates a broad range of queries. Before venturing too far into query optimization, make sure that you are not overlooking an existing implementation in core that has been refined, optimized, and battle-tested in production. A good trick is to check the class list for a class representing the type of object you want to query and then look at the various methods it implements.
Every so often, you'll need to construct a query that is different from the set commonly running in production. In such cases, it is important to carefully evaluate the performance implications of your code and to communicate them broadly.
Prod / dev gap
Test your query against production slaves prior to full deployment. Other databases (such as the ones used for development and research) are queried much less frequently (several times a day vs. thousands per second) and therefore exhibit very different caching behavior and performance characteristics. The production databases are far more subject to both buffer pool churn and concurrent disk reads, so worst-case performance is actually more likely there.
Cache invalidation and avoiding cache stampedes
If you are relying on caching to buttress an expensive query, it is worth having an idea of the stampede size on invalidation, that is, how many requests would try to regenerate the value at the same time. If it is high (or you just don't want a bunch of users tied up because of this), you can use a cache key with a TTL of 0, store the timestamp in the cache value, and handle invalidation on the application side. This lets you use tricks such as add() / delete() to make sure only one process regenerates the cache at any given time (other contenders can use the stale value, or do nothing if there isn't a current value).
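The approach described above can be sketched roughly as follows. This is an illustration in Python rather than MediaWiki code: `FakeCache` is a hypothetical in-memory stand-in for a memcached-style client, and `get_with_soft_expiry`, `max_age` and the `:lock` key suffix are names invented for this example. It assumes `add()` succeeds only when the key is absent, which is what lets it act as a lock.

```python
import time


class FakeCache:
    """Hypothetical in-memory stand-in for a memcached-style client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        # No expiry is applied here, mirroring a 0 TTL key.
        self._data[key] = value

    def add(self, key, value):
        # Succeeds only if the key does not exist yet.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


def get_with_soft_expiry(cache, key, regenerate, max_age, now=time.time):
    """Return a cached value, letting only one caller regenerate it."""
    entry = cache.get(key)  # stored as (timestamp, value)
    if entry is not None and now() - entry[0] < max_age:
        return entry[1]  # still fresh by the application's own rule

    lock_key = key + ":lock"
    if cache.add(lock_key, 1):
        # This caller won the lock and runs the expensive query.
        # Assumption: a real deployment would give the lock a short TTL
        # so that a crashed process cannot hold it forever.
        try:
            value = regenerate()
            cache.set(key, (now(), value))
            return value
        finally:
            cache.delete(lock_key)

    # Someone else is regenerating: use the stale value, or nothing.
    return entry[1] if entry is not None else None
```

A caller would pass the expensive query as `regenerate`, for example `get_with_soft_expiry(cache, "report", run_query, max_age=300)`, where `run_query` is whatever function produces the value. Callers that lose the race get the stale value or `None` and must be prepared to handle either.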