Zookeeper

Zookeeper is used by Analytics systems like Hadoop and Druid, and for Kafka.

Clusters

Wikimedia runs several Zookeeper clusters:

  • main-eqiad (conf100[7-9]) - used by all Kafka clusters in eqiad and Burrow (Kafka monitoring)
  • main-codfw (conf200[4-6]) - used by the Kafka cluster in codfw and Burrow (Kafka monitoring)
  • analytics-eqiad (an-conf100[1-3]) - used by the Hadoop clusters
  • druid-*-eqiad (druid* nodes) - clusters co-located within the Druid hosts and supporting Druid only.
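
The bracketed host ranges in the list above are shorthand for consecutive hostnames. As a minimal illustration, assuming each pattern holds at most one single-digit `[a-b]` range, the following Python sketch expands a pattern into individual host names. The function name `expand_hosts` is hypothetical and not part of any Wikimedia tooling.

```python
import re


def expand_hosts(pattern):
    """Expand a pattern such as 'conf100[7-9]' into a list of host names.

    Assumes at most one '[a-b]' range of single digits; a pattern without
    a range is returned unchanged as a one-element list.
    """
    match = re.fullmatch(r"(.*)\[(\d)-(\d)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


for name in expand_hosts("an-conf100[1-3]"):
    print(name)
```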

Packaging

Zookeeper .deb packages are available from both Cloudera and Debian. We prefer Debian packages where possible, so the Zookeeper servers and our Kafka packaging use Debian's version. However, the Cloudera and Debian packages cannot be installed on the same node, because most other Cloudera Hadoop packages depend on Cloudera's version of Zookeeper. Hadoop nodes therefore have the Cloudera version of Zookeeper installed.

The Cloudera version of Zookeeper is slightly newer and has at least one convenient feature: /usr/lib/zookeeper/bin/zkCli.sh on Hadoop nodes has an 'rmr' command, which recursively deletes znodes should you ever need to. This has proved useful when tearing down test Kafka clusters and when manually deleting Kafka topics.
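
Zookeeper refuses to delete a znode that still has children, so a recursive delete has to remove the deepest znodes first. The Python sketch below shows that ordering. It assumes a client object exposing `get_children(path)` and `delete(path)`; the kazoo library's `KazooClient` offers methods with these names, but using it here is an assumption rather than something this page documents. `FakeZk` and the `/kafka-test` paths are hypothetical stand-ins so the example runs without a server.

```python
def rmr(client, path):
    """Recursively delete `path`: children first, then the znode itself."""
    for child in client.get_children(path):
        rmr(client, path.rstrip("/") + "/" + child)
    client.delete(path)


class FakeZk:
    """Hypothetical in-memory stand-in for a Zookeeper client."""

    def __init__(self, paths):
        self.paths = set(paths)
        self.deleted = []  # deletion order, kept for inspection

    def get_children(self, path):
        # Direct children only: names under `path` with no further '/'.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):] for p in self.paths
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def delete(self, path):
        # Mirror Zookeeper's rule: a znode with children cannot be deleted.
        if self.get_children(path):
            raise RuntimeError(f"{path} is not empty")
        self.paths.remove(path)
        self.deleted.append(path)


zk = FakeZk([
    "/kafka-test",
    "/kafka-test/brokers",
    "/kafka-test/brokers/ids",
    "/kafka-test/config",
])
rmr(zk, "/kafka-test")
print(zk.deleted)
```

The printed list shows each child being removed before its parent, which is the order a recursive delete must follow.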

Changing Zookeeper IP addresses

Zookeeper caches the IP addresses of its peers during startup. If you ever need to change the IP address of a Zookeeper node, you must also restart every Zookeeper server once the new address has been published in DNS.
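
Because peer addresses are resolved only at startup, a running server keeps using the old address until it is restarted. The Python sketch below illustrates one way to spot this from the outside: compare the addresses noted when the servers started with what DNS returns now. The function `stale_peers`, the recorded-address mapping, and the example hostnames and documentation-range IPs are all hypothetical; Zookeeper itself provides no such API.

```python
import socket


def stale_peers(recorded, resolve=socket.gethostbyname):
    """Return hosts whose current DNS address differs from the recorded one.

    `recorded` maps hostname -> IP address noted when the servers started.
    `resolve` is injectable so the check can be exercised without real DNS.
    """
    return sorted(host for host, ip in recorded.items() if resolve(host) != ip)


# Hypothetical data: zk2 was renumbered after the servers started.
recorded = {"zk1.example.org": "192.0.2.1", "zk2.example.org": "192.0.2.2"}
current_dns = {"zk1.example.org": "192.0.2.1", "zk2.example.org": "192.0.2.20"}

print(stale_peers(recorded, resolve=current_dns.__getitem__))
```

If any host is reported, every Zookeeper server in the cluster needs a restart, since each one cached the old address.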