Cache servers

WMF currently runs around 100 physical servers in 5 data centres to cache HTTP traffic. The servers are split into logical clusters according to the type of content they cache. Each logical cluster corresponds to a puppet role, for example cache::text and cache::upload.

Move server to another cluster

The following procedure moves a cache server to another cluster. The example shows how to move cp3043.esams.wmnet from cache_text to cache_upload.

Depool and downtime

On the cluster::management node (neodymium.eqiad.wmnet at the time of writing):

sudo -i confctl select name=cp3043.esams.wmnet set/pooled=no

Make sure that depooling took place correctly, for instance by looking at the frontend and backend traffic instance breakdown dashboards.

On the alerting_host node (currently einsteinium.wikimedia.org):

sudo -i icinga-downtime -h cp3043 -d 7200 -r "move cp3043 to cache_upload --$USER"
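
The downtime command above takes the short host name, a duration in seconds and a free-text reason. As a minimal sketch, the command can be composed from the host's FQDN so that the reason always names the machine being moved. The helper name downtime_cmd is hypothetical (it is not existing tooling), and it only prints the command rather than running it; the icinga-downtime options are the ones already used above.

```shell
# downtime_cmd FQDN CLUSTER [SECONDS]
# Print (do not run) the icinga-downtime command for a host being moved.
# downtime_cmd is a hypothetical helper name, not existing tooling.
downtime_cmd() {
    short="${1%%.*}"        # strip the domain: cp3043.esams.wmnet -> cp3043
    seconds="${3:-7200}"    # default to the 7200 seconds used in this procedure
    printf 'sudo -i icinga-downtime -h %s -d %s -r "move %s to %s --%s"\n' \
        "$short" "$seconds" "$short" "$2" "${USER:-unknown}"
}

# Print the command for the example host in this procedure.
downtime_cmd cp3043.esams.wmnet cache_upload
```

Printing the command first makes it easy to review the host name and reason before pasting it on the alerting host.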

Remove host from its current cluster

Remove the host from the list of cache_text machines in conftool and hiera (commit example).

Puppet needs to be run on all other (non-cp3043) cache_text nodes to reflect the hiera changes. Conftool changes take effect automatically upon puppet-merge. You can double-check whether the node has been removed from the list of cache_text servers in esams.

Add host to new cluster

Disable puppet on all cache nodes belonging to the new cluster (cache_upload in this example).

Add the node to cache_upload in hiera, change the server role (example), and run wmf-auto-reimage.

Add node to conftool.

Run puppet on all cache_upload nodes.

Final verification and pooling

Ensure that the new node is working as expected (e.g. by running varnishtest -k /usr/share/varnish/tests/upload/*.vtc and by sending test requests locally to varnish-fe and varnish-be).
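
When several test files fail, it can help to see exactly which ones. The following is a rough sketch, not part of the documented procedure: it runs each .vtc file in a directory separately and lists the failures. The function name run_vtc_dir is hypothetical, and the optional runner argument exists only so the loop can be tried without varnishtest installed; by default it calls plain varnishtest on each file.

```shell
# run_vtc_dir DIR [RUNNER...]
# Run every *.vtc file in DIR through RUNNER (default: varnishtest),
# print the files that failed, and return non-zero if any did.
# run_vtc_dir is a hypothetical helper name, not existing tooling.
run_vtc_dir() {
    dir="$1"
    shift
    [ "$#" -gt 0 ] || set -- varnishtest
    failed=0
    for vtc in "$dir"/*.vtc; do
        [ -e "$vtc" ] || continue      # skip the literal pattern if nothing matches
        if ! "$@" "$vtc" >/dev/null 2>&1; then
            echo "FAILED: $vtc"
            failed=$((failed + 1))
        fi
    done
    echo "$failed test file(s) failed"
    [ "$failed" -eq 0 ]
}
```

On a cache_upload node this would be invoked as run_vtc_dir /usr/share/varnish/tests/upload, using the test directory named above.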

Repool.
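
The page does not spell out the repool command. Assuming that confctl accepts set/pooled=yes in the same way it accepts set/pooled=no in the depool step, repooling would mirror that step. The sketch below only prints the command for review; the helper name pooled_cmd is hypothetical, and restricting the state to yes or no is a choice made for this example.

```shell
# pooled_cmd FQDN STATE
# Print (do not run) the confctl command that sets the pooled state of a host.
# pooled_cmd is a hypothetical helper name; STATE is limited to yes/no here.
pooled_cmd() {
    case "$2" in
        yes|no) ;;
        *) echo "state must be yes or no" >&2; return 1 ;;
    esac
    printf 'sudo -i confctl select name=%s set/pooled=%s\n' "$1" "$2"
}

# Assumed repool command for the example host, mirroring the depool step.
pooled_cmd cp3043.esams.wmnet yes
```

As with depooling, the traffic dashboards can be used to confirm that the host is serving requests again.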