Cache servers

WMF currently runs around 100 physical servers in 5 data centres to cache HTTP traffic. The servers are split into logical clusters according to the type of content they cache. Each logical cluster has a corresponding puppet role, for example cache::text and cache::upload.

Move server to another cluster

The following procedure moves a cache server to another cluster. The example moves cp3043.esams.wmnet from cache_text to cache_upload.

Depool and downtime

On the cluster::management node (neodymium.eqiad.wmnet at the time of this writing):

sudo -i confctl select name=cp3043.esams.wmnet set/pooled=no

Make sure that depooling took place correctly, for instance by looking at the frontend and backend traffic instance breakdown dashboards.
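
The pooled state can also be read back from conftool itself. The sketch below assumes that `confctl select ... get` prints one JSON object per matching service, each containing a `pooled` field; that output shape and the helper name `all_depooled` are assumptions for illustration, not part of the documented procedure.

```shell
# Hypothetical helper: read confctl output on stdin and succeed only if
# every non-empty line reports "pooled": "no".
# Assumption: one JSON object per line with a "pooled" field.
all_depooled() {
    local line seen=0
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        seen=1
        printf '%s\n' "$line" | grep -Eq '"pooled": *"no"' || return 1
    done
    # Fail if there was no output at all, so an empty result is not
    # mistaken for a successful depool.
    [ "$seen" -eq 1 ]
}

# Intended use on the cluster management node (not executed here):
#   sudo -i confctl select name=cp3043.esams.wmnet get | all_depooled && echo "depooled"
```

The dashboards remain the authoritative check that traffic has actually drained; this only confirms what conftool has recorded.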

On the alerting_host node (currently einsteinium.wikimedia.org):

sudo -i icinga-downtime -h cp3043 -d 7200 -r "move cp3043 to cache_upload --$USER"
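
As a rough illustration of how the downtime arguments fit together, the sketch below builds the command from a host, a duration in hours, and a reason. It assumes that `-d` takes a duration in seconds (so 7200 corresponds to two hours); `downtime_cmd` is a hypothetical helper that only prints the command and does not run it.

```shell
# Hypothetical helper: print (do not run) an icinga-downtime command.
# Assumption: -d is a duration in seconds.
# Note: a reason containing single quotes would need extra escaping.
downtime_cmd() {
    local host=$1 hours=$2 reason=$3
    local seconds=$(( hours * 3600 ))
    printf "sudo -i icinga-downtime -h %s -d %d -r '%s'\n" "$host" "$seconds" "$reason"
}

# Two hours of downtime for the host being moved.
downtime_cmd cp3043 2 "move cp3043 to cache_upload --${USER:-unknown}"
```

Pick a duration that comfortably covers the reimage and the puppet runs that follow.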

Remove host from its current cluster

Remove the host from the list of cache_text machines in conftool and hiera (commit example).

Puppet needs to be run on all other (non-cp3043) cache_text nodes to reflect the hiera changes. Conftool changes take effect automatically upon puppet-merge. You can double-check whether the node has been removed from the list of cache_text servers in esams.
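
A minimal sketch of the puppet step, assuming the remaining cache_text hosts are available as a plain list. The helper `other_hosts` and the host names other than cp3043 are placeholders, and the loop is a dry run that only prints the commands it would issue.

```shell
# Hypothetical helper: print every host in the list except the one being moved.
other_hosts() {
    local exclude=$1 host
    shift
    for host in "$@"; do
        [ "$host" = "$exclude" ] || printf '%s\n' "$host"
    done
}

# Dry run: print the puppet command for each remaining cache_text node.
# cp3040 to cp3042 are placeholder names, not the real esams host list.
for host in $(other_hosts cp3043.esams.wmnet \
        cp3040.esams.wmnet cp3041.esams.wmnet \
        cp3042.esams.wmnet cp3043.esams.wmnet); do
    echo "ssh $host sudo puppet agent --test"
done
```

In practice the host list would come from whatever inventory or orchestration tool is in use rather than being typed by hand.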

Add host to new cluster

Disable puppet on all cache nodes belonging to the new cluster (cache_upload in this example).

Add the node to cache_upload in hiera, change the server's role (example) and run wmf-auto-reimage.

Add the node to conftool.

Re-enable and run puppet on all cache_upload nodes.
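
The ordering of these steps matters: puppet stays disabled on the existing cache_upload nodes until the new node is fully configured. The sketch below prints that sequence as a dry-run plan. `plan_add_to_cluster` and the host names passed to it are hypothetical, and the reimage and conftool steps appear only as a comment line because their exact invocations are not given here.

```shell
# Hypothetical helper: print the order of operations for adding a node
# to a cluster. Nothing is executed; the output is a plan to review.
plan_add_to_cluster() {
    local node=$1
    shift
    local host
    # 1. Disable puppet on every existing node of the target cluster.
    for host in "$@"; do
        echo "ssh $host sudo puppet agent --disable 'adding $node to cluster'"
    done
    # 2. Manual steps, not expanded here.
    echo "# merge hiera and role change, reimage $node, add $node to conftool"
    # 3. Re-enable puppet and run it on every node of the target cluster.
    for host in "$@"; do
        echo "ssh $host sudo puppet agent --enable"
        echo "ssh $host sudo puppet agent --test"
    done
}

# Placeholder cache_upload hosts for illustration only.
plan_add_to_cluster cp3043.esams.wmnet cp3034.esams.wmnet cp3035.esams.wmnet
```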

Final verification and pooling

Ensure that the new node is working as expected (e.g. varnishtest -k /usr/share/varnish/tests/upload/*.vtc, and test requests made locally against varnish-fe and varnish-be).
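
As an alternative to passing all test files to a single varnishtest invocation, the sketch below runs each .vtc file separately and reports a per-file result, which can make a failing test easier to spot. `run_vtc_dir` is a hypothetical helper, and the `VARNISHTEST` variable is an assumption added so the loop can be exercised on a machine without Varnish installed.

```shell
# Hypothetical helper: run every .vtc file in a directory one at a time.
# Succeeds only if all of them pass. VARNISHTEST overrides the binary.
run_vtc_dir() {
    local dir=$1 failed=0 f
    for f in "$dir"/*.vtc; do
        [ -e "$f" ] || continue    # the directory had no .vtc files
        if ${VARNISHTEST:-varnishtest} "$f" >/dev/null 2>&1; then
            echo "PASS $f"
        else
            echo "FAIL $f"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Intended use on the new node (not executed here):
#   run_vtc_dir /usr/share/varnish/tests/upload && echo "all tests passed"
```

Only repool once every test passes and local requests against both Varnish layers behave as expected.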

Repool.