WMDE/Wikidata/Scaling

Current & Future

WDQS

Needed for: Increased edit rate. Increased WDQS queries

Right now "Disaster mitigations" are underway.

The long-term plan looks like an evaluation of new services: https://phabricator.wikimedia.org/T206560

See also: Wikidata_query_service/ScalingStrategy

Abuse filter

Needed for: Larger entity sizes.

Various things have been worked on in this area, including https://phabricator.wikimedia.org/T204109 and other tickets avoiding processing here.

Ultimately, AbuseFilter needs reworking to make it work better for the Wikibase use case, or other more specific solutions are needed.

Entity size / serialization

Needed for: Larger entity sizes.

As highlighted in https://addshore.com/2021/07/what-happens-in-wikibase-when-you-make-a-new-item/ there is quite some overhead when editing larger and larger items.

Revision table?

Needed for: More revisions

In 3 years' time we will likely have 2 billion rows in the MediaWiki revision table. Is this actually an issue? Do we just need some bigger hardware? Sharding?

See also T63111 for various 32-bit primary keys.
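
As a rough illustration of why 32-bit keys matter at this scale, the sketch below compares the estimated 2 billion rows with the maximum values of signed and unsigned 32-bit integers. Which columns are signed or unsigned is not stated here, so both limits are shown; the helper name `headroom` is only for illustration.

```python
# Upper bounds of 32-bit integer columns.
SIGNED_INT32_MAX = 2**31 - 1    # 2,147,483,647
UNSIGNED_INT32_MAX = 2**32 - 1  # 4,294,967,295


def headroom(rows: int, limit: int) -> float:
    """Return the fraction of the key space that is still unused."""
    return 1 - rows / limit


# Row count estimate taken from the text above.
projected_rows = 2_000_000_000

signed_headroom = headroom(projected_rows, SIGNED_INT32_MAX)
unsigned_headroom = headroom(projected_rows, UNSIGNED_INT32_MAX)

print(f"signed 32-bit headroom:   {signed_headroom:.1%}")
print(f"unsigned 32-bit headroom: {unsigned_headroom:.1%}")
```

With these numbers a signed 32-bit key would have only a few percent of its range left, while an unsigned one would still have roughly half.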

Other things come into play here, for example changing Wikidata editing patterns so that fewer revisions happen.

Dump generation

Weekly dump generation: https://phabricator.wikimedia.org/T206535

All revisions: https://phabricator.wikimedia.org/T221504

Recent Changes

MediaWiki is not optimized for having so many changes, nor for having so many changes in a single namespace and few in the others.

On the social side, RC is also quite unwieldy.

Past

wb_terms

Removed: https://phabricator.wikimedia.org/T208425

CacheAwarePropertyInfoStore

Needed for

  • Increased property count. Increased request rate (or more app servers?)
  • Continued stability

Links

Timeline

Rate of entity Id acquisition

Needed for: Increased entity creation rate

fname:"Wikibase\\SqlIdGenerator::generateNewId" AND db_name:"wikidatawiki" AND channel:"DBQuery" AND error:"Lock wait timeout exceeded; try restarting transaction"
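
The query above matches lock wait timeouts raised from `SqlIdGenerator::generateNewId`. As a hedged illustration of why ID acquisition can limit the entity creation rate, the sketch below assumes new IDs are handed out by incrementing a single counter row inside a transaction, so concurrent callers have to wait for each other. This is not Wikibase code: the table `id_counter`, the function names, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3


def make_counter_db() -> sqlite3.Connection:
    """Create an in-memory database with one counter row (hypothetical schema)."""
    # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE id_counter (id_type TEXT PRIMARY KEY, id_value INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO id_counter VALUES ('item', 0)")
    return conn


def generate_new_id(conn: sqlite3.Connection, id_type: str) -> int:
    """Increment the shared counter and return the new value.

    The write lock is held from BEGIN IMMEDIATE until COMMIT, so every
    caller is serialized on the same row. Under a high creation rate this
    is where waiting (and eventually timeouts) would show up.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "UPDATE id_counter SET id_value = id_value + 1 WHERE id_type = ?",
            (id_type,),
        )
        (new_id,) = conn.execute(
            "SELECT id_value FROM id_counter WHERE id_type = ?", (id_type,)
        ).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return new_id


conn = make_counter_db()
ids = [generate_new_id(conn, "item") for _ in range(3)]
print(ids)
```

The point of the sketch is only that all creations funnel through one row; the shorter the transaction holding that row, the higher the achievable creation rate.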

Timeline

Change dispatching

Needed for: Edit rate

Change dispatching was slow and often backlogged. This limited the rate of editing that could happen on Wikidata.

Fix: https://phabricator.wikimedia.org/T205865

With the fix in place, we managed to reduce the number of dispatchers from 4 to 2. We went back up to 3 a while later when turning on data access for a number of Wiktionaries.