Dumps/Dump servers

XML Dump servers

Hardware

We have two hosts:

  • labstore1007 in eqiad, production, web server, rsync to public mirrors:
    Hardware/OS: HP ProLiant DL380 Gen9, Debian 8 (jessie), one HPE D3600 array, 128GB RAM, 2 quad-core Xeon E5-2623 v4 CPUs
    Disks: 2 internal 1TB drives for the OS in RAID 1; 12 internal 6TB disks plus 12 6TB disks in the HPE D3600 array, in two RAID 10 volumes; no swap
  • labstore1006 in eqiad, production, NFS server to WMF Cloud and stats hosts:
    Hardware/OS: HP ProLiant DL380 Gen9, Debian 8 (jessie), one HPE D3600 array, 128GB RAM, 2 quad-core Xeon E5-2623 v4 CPUs
    Disks: 2 internal 1TB drives for the OS in RAID 1; 12 internal 6TB disks plus 12 6TB disks in the HPE D3600 array, in two RAID 10 volumes; no swap

Note that these hosts also serve other public datasets, such as some POTY (Picture of the Year) files, the pagecount stats, and so on.

Services

The web server host (labstore1007) serves dump files and other public datasets to the public, using nginx. It also serves as an rsync server to our mirrors and to labs.
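
For illustration, the rsync side might expose a module along these lines; the module name and path here are assumptions, not the actual configuration:

  $ cat /etc/rsyncd.conf    # hypothetical contents, for illustration only
  [dumps]
      path = /data/xmldatadumps/public
      comment = public XML dumps, for mirrors and labs
      read only = yes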

Deploying a new host

You'll need to set up the RAID arrays by hand. We typically have two arrays, so set up two RAID 10 arrays and combine them with LVM into one giant 64TB ext4 volume.
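
A minimal sketch of that layout, assuming software RAID via mdadm (the actual hosts may use the hardware controller instead; device and volume names are placeholders):

  # two 12-disk RAID 10 arrays (drive letters are placeholders)
  mdadm --create /dev/md0 --level=10 --raid-devices=12 /dev/sd[b-m]
  mdadm --create /dev/md1 --level=10 --raid-devices=12 /dev/sd[n-y]
  # join them with LVM into a single large logical volume, then format ext4
  pvcreate /dev/md0 /dev/md1
  vgcreate data /dev/md0 /dev/md1
  lvcreate -l 100%FREE -n dumps data
  mkfs.ext4 /dev/data/dumps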

Install in the usual way: add the host to puppet, copying a pre-existing production labstorexxx host stanza, set up everything for PXE boot, and go. Depending on what the new box is going to do, choose the appropriate role (web/rsync, or nfs work), or combine profiles to create a new role.
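
A hedged sketch of the final steps, once the site.pp stanza is in place and the PXE install has finished (the hostname is a placeholder and the exact workflow may differ):

  # on the puppetmaster: sign the new host's certificate
  sudo puppet cert sign labstore10xx.eqiad.wmnet
  # on the new host: run the agent to apply the chosen role
  sudo puppet agent --test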

Space issues

If we run low on space, we can keep fewer rounds of XML dumps; this is controlled by /etc/dumps/xml_keeps.conf on each host. This file is generated by puppet. The hosts to which dumps are written as they are generated keep only a few rounds, while the web servers and the like keep many more.

The class dumps::web::cleanups::xmldumps generates one list of how many dumps to keep for hosts that are 'replicas', i.e. the web servers, with larger keep numbers, and another list for the generating hosts (the NFS servers where dumps are written during each run). The list $keep_replicas is the one you want to tweak; the number of dumps to keep can be adjusted separately for the huge wikis (enwiki, wikidatawiki), the big wikis (dewiki, commonswiki, etc.), and the rest.
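
For a concrete picture, the generated file on a replica might look roughly like this; the format, key names, and numbers are assumptions for illustration, not the real values:

  $ cat /etc/dumps/xml_keeps.conf   # hypothetical contents
  hugewikis=7
  bigwikis=8
  default=10

On a generating host the same keys would carry much smaller numbers.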