WMDE/Wikidata/Caching

Varnish Cache

For full details of Varnish caching, see Varnish and mw:Manual:Varnish caching.

Purging a single page (URL)

echo 'https://www.wikidata.org/wiki/Special:EntityData/L1.rdf' | mwscript purgeList.php --wiki=aawiki

(The --wiki option is ignored here.)

Purging a list of pages

See mw:Manual:PurgeList.php.

Purging all URLs in a file:

cat /home/addshore/listofURLs | mwscript purgeList.php --wiki=aawiki

Purging all pages in a namespace (here 146, the Lexeme namespace):

mwscript purgeList.php --wiki wikidatawiki --namespace 146

Parser Cache

mw:Manual:$wgCacheEpoch is used by many things other than the parser cache, so don't touch it.

Purging a selection of numeric IDs

printf 'Lexeme:L%d\n' {1..1000} | mwscript purgePage.php --wiki wikidatawiki --skip-exists-check

Selectively reject mass parser cache values

If you want to selectively reject parser cache entries you can use the RejectParserCacheValue hook.

This can be added to CommonSettings.php, for example https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/468301/1/wmf-config/CommonSettings.php.

// T203888: Purge Wikidata Lexeme parser cache for senses deployment - Addshore
if ( $wgDBname === 'wikidatawiki' ) {
	/** @var WikiPage $wikiPage */
	$wgHooks['RejectParserCacheValue'][] = function ( $value, $wikiPage, $popts ) {
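		// Reject entries in the Lexeme namespace (146) that are expired
		// relative to the given timestamp. Returning false rejects the entry.
		if (
			$wikiPage->getTitle()->getNamespace() === 146 &&
			$value->expired( '20181018105500' )
		) {
			return false;
		}
		// Everything else is served from the parser cache as usual.
		return true;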
	};
}

You can monitor the rejection rate for the whole cluster on https://grafana.wikimedia.org/d/000000106/parser-cache

If you are going to reject a large number of entries, or entries covering a long period of time, ideally ping the DBAs before doing so.

Actively remove parser cache entries

You can use mw:Manual:PurgeParserCache.php.

On the WMF cluster this already runs nightly.

Memcached

For the main memcached docs, see Memcached.

There is a 1MB limit on the size of the value stored under a memcached key in production; we are not near that.

Currently you can get the data from a key by running the following on a MediaWiki host in production:

echo "get wikidatawiki:wikibase-PropertyOrderProvider" | nc localhost 11213 -q 2 > test
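The file written by the command above holds the raw reply, not just the stored value. As a minimal sketch, assuming the endpoint answers in the standard memcached text protocol (a "VALUE <key> <flags> <bytes>" header line, the data block, then "END"), the reply could be unpacked like this to check how large a value is. The function name parse_get_response is made up for this illustration.

```python
def parse_get_response(raw: bytes) -> dict:
    """Parse the reply to a memcached text-protocol `get` command.

    Returns a dict mapping each returned key to a (flags, data) tuple.
    Assumes the standard reply layout:
        VALUE <key> <flags> <bytes>\r\n<data>\r\n ... END\r\n
    """
    items = {}
    pos = 0
    while pos < len(raw):
        line_end = raw.find(b"\r\n", pos)
        if line_end == -1:
            break
        header = raw[pos:line_end].split()
        if not header or header[0] != b"VALUE":
            # "END" (no more items, or key not found) or an error line.
            break
        key = header[1].decode()
        flags = int(header[2])
        size = int(header[3])
        data_start = line_end + 2
        # Slice by the declared size, since the data may itself contain \r\n.
        items[key] = (flags, raw[data_start:data_start + size])
        # Skip the data block and the \r\n that terminates it.
        pos = data_start + size + 2
    return items
```

With the reply saved in the file test, reading it in binary mode and passing the bytes to parse_get_response gives the stored data, and len() of that data gives the value size in bytes.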

Shared Cache

The shared cache contains a bunch of data (mainly entity revisions) shared between all sites on the cluster.

Cache keys are rotated for each new MediaWiki version because of possible serialization changes. [citation needed]

During the HHVM -> php7 migration there will be 2 different keys for each thing stored, one with -hhvm in the key and one without, again because of possible differences in serialization. [citation needed]

Currently these cache keys exist with a TTL of 1 week.[citation needed]

From Wikibase/docs/options.wiki, the docs for the relevant $wgWBRepoSettings and $wgWBClientSettings keys:

sharedCacheKeyPrefix
Prefix to use for cache keys that should be shared among a wikibase repo and all its clients. The default is constructed from $wgDBname and WBL_VERSION. In order to share caches between clients (and the repo), set a prefix based on the repo's name and WBL_VERSION or a similar version ID.
Note: The default may change in order to use the repo's database name automatically.
sharedCacheDuration
The default duration of entries in the shared object cache, in seconds. Default is 3600 seconds (1 hour).
sharedCacheType
The type of cache to use for the shared object cache. Defaults to $wgMainCacheType. Use CACHE_XXX constants.

The shared cache key in production is set in wmf-config/Wikibase.php and will look something like "wikibase_shared/1_31_0-wmf_2-wikidatawiki".
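To illustrate the shape of that prefix, here is a minimal sketch. It assumes the version part is simply the MediaWiki version string with dots replaced by underscores; that is inferred from the example key above, not taken from wmf-config/Wikibase.php. The function name is hypothetical.

```python
def shared_cache_key_prefix(mw_version: str, repo_db_name: str) -> str:
    """Build a prefix shaped like the production example.

    Assumption: dots in the version become underscores, and the repo's
    database name is appended after a hyphen.
    """
    version_part = mw_version.replace(".", "_")
    return f"wikibase_shared/{version_part}-{repo_db_name}"


print(shared_cache_key_prefix("1.31.0-wmf.2", "wikidatawiki"))
```

Because the version is part of the prefix, every new MediaWiki version starts with an empty set of shared keys.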

WikiPageEntityRevisionLookup

Entity revisions are cached in keys like:

wikibase_shared/1_33_0-wmf_20-wikidatawiki:WikiPageEntityRevisionLookup:Q64

Current sizes

On 13 March 2019:

  • Q30407191 (From Special:LongPages)
    • hhvm 112k
    • php7 182k
  • Q64 (regular item?)
    • hhvm 32k
    • php7 52k
  • Q3156846 (From Special:ShortPages)
    • hhvm 470 bytes
    • php7 359 bytes

CacheAwarePropertyInfoStore

This cache key stores a copy of the whole of the property info table.

Soon the CacheAwarePropertyInfoStore in Wikibase will have per-property caching to reduce the load on the main key.

Keys look like:

wikibase_shared/1_33_0-wmf_20-wikidatawiki-hhvm:CacheAwarePropertyInfoStore
wikibase_shared/1_33_0-wmf_20-wikidatawiki:CacheAwarePropertyInfoStore
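The two key variants differ only in the -hhvm suffix on the shared prefix. A minimal sketch of how the pair relates, assuming the suffix is appended to the prefix before the colon as in the keys shown above (the function name is hypothetical):

```python
def keys_for_both_runtimes(prefix: str, name: str) -> dict:
    """Return the HHVM and php7 variants of a shared cache key.

    Assumption: the HHVM variant appends "-hhvm" to the shared prefix,
    and the php7 variant uses the prefix unchanged.
    """
    return {
        "hhvm": f"{prefix}-hhvm:{name}",
        "php7": f"{prefix}:{name}",
    }


for runtime, key in keys_for_both_runtimes(
    "wikibase_shared/1_33_0-wmf_20-wikidatawiki",
    "CacheAwarePropertyInfoStore",
).items():
    print(runtime, key)
```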

Current size: hhvm 100k, php7 150k

Current issues: https://phabricator.wikimedia.org/T97368#5018549

Misc Cache

wikibase.repo.formatter.*

Although not in the "shared cache", this cache is still shared between all wikibase installs that use a given memcached cluster.

Keys look like:

wikibase.repo.formatter.<entityId>_<revisionId>_<languageCode>_<valueType(label/description)>

Resulting in:

wikibase.repo.formatter.Q64_880324577_en_label
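As a minimal sketch of the key layout described above (not the Wikibase implementation; the function name and the validation of the value type are assumptions made for illustration):

```python
def formatter_cache_key(entity_id: str, revision_id: int,
                        language_code: str, value_type: str) -> str:
    """Build a key in the wikibase.repo.formatter.* layout."""
    # The layout above only lists label and description as value types.
    if value_type not in ("label", "description"):
        raise ValueError(f"unexpected value type: {value_type}")
    return (
        "wikibase.repo.formatter."
        f"{entity_id}_{revision_id}_{language_code}_{value_type}"
    )


print(formatter_cache_key("Q64", 880324577, "en", "label"))
```

Since the revision ID is part of the key, each new revision of an entity produces new keys, and the old ones are left behind until they expire or are evicted.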

Data in these keys is very small, but there can be lots of them.

This cache has the potential to store #ofEntities * #someLowRevCount * #languages * #types keys, which is currently something like 50000000*1.2*400*2, but in reality fewer keys than this exist.
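Working that product through, using only the rough figures quoted above (the variable names are illustrative):

```python
entities = 50_000_000        # approximate number of entities
revisions_per_entity = 1.2   # "some low revision count"
languages = 400
value_types = 2              # label and description

# Theoretical upper bound on the number of formatter cache keys.
upper_bound = round(entities * revisions_per_entity * languages * value_types)
print(f"{upper_bound:,}")
```

That is roughly 48 billion possible keys, which is why the real number of keys is bounded by TTL and eviction, not by this product.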

These keys are never removed by Wikibase; instead, LRU eviction is relied upon. These keys currently have a 1 week TTL (March 2019)[citation needed].

Grafana dashboard: https://grafana.wikimedia.org/d/u5wAugyik/wikibase-formattercache

wikidatawiki:wikibase-PropertyOrderProvider

This key exists for each Wikibase repo install (so there is one for Commons too).

As of March 2019, the size of this key for Wikidata is 9k.