From Wikitech

Zookeeper is used by the Analytics Cluster mainly for Kafka, but also by Hive for table locking and for Hadoop YARN ResourceManager high availability (HA).

If you make changes to the Zookeeper cluster (e.g. adding, moving or removing nodes), you will need to apply them on the Hadoop ResourceManager nodes (analytics1001 and analytics1002, as of June 2015) and on the Kafka brokers. These services must then be restarted to pick up the changes.
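Both Kafka and the ResourceManager locate Zookeeper through a comma-separated host:port connection string, which is why a change to the ensemble has to be repeated on each of them. The Python sketch below builds such a string from a host list. The hostnames and the "/kafka" chroot are hypothetical, and the property names are the upstream ones (Kafka's `zookeeper.connect`, Hadoop 2.x's `yarn.resourcemanager.zk-address`); our Puppet configuration may set them differently.

```python
def zk_connect_string(hosts, port=2181, chroot=""):
    """Build a ZooKeeper connection string such as 'h1:2181,h2:2181/kafka'."""
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    # An optional chroot path is appended once, after the last host:port pair.
    if chroot and not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    return ",".join(f"{h}:{port}" for h in hosts) + chroot


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    hosts = ["zk-a.example.org", "zk-b.example.org", "zk-c.example.org"]
    print("zookeeper.connect=" + zk_connect_string(hosts, chroot="/kafka"))
    print("yarn.resourcemanager.zk-address=" + zk_connect_string(hosts))
```

Adding or removing a Zookeeper node changes this string, so every consumer of it needs the new value and a restart.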

See Zookeeper/Administration for administration tips.


Zookeeper .deb packages are available from both Cloudera and Debian. We prefer Debian packages where possible, so the Zookeeper servers and our Kafka packaging use Debian's version. However, the Cloudera and Debian packages cannot be installed on the same node, and most other Cloudera Hadoop-related packages depend on Cloudera's version of Zookeeper. For that reason, Hadoop nodes have the Cloudera version of Zookeeper installed.

The Cloudera version of Zookeeper is slightly newer and has at least one convenient feature: /usr/lib/zookeeper/bin/zkCli.sh on Hadoop nodes has an 'rmr' command, which recursively deletes a znode and all of its children, should you ever need to. I have used this when tearing down test Kafka clusters and when manually deleting Kafka topics.
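A plain delete fails on a znode that still has children, so a recursive delete like 'rmr' has to remove the deepest znodes first. The Python sketch below models that post-order traversal against a hypothetical in-memory tree of paths rather than a real Zookeeper server; the class, function and znode paths are illustrative only.

```python
class FakeZnodeTree:
    """Hypothetical in-memory stand-in for a ZooKeeper namespace (paths only)."""

    def __init__(self, paths):
        self.nodes = set(paths)

    def children(self, path):
        # Direct children only: the remainder after the prefix has no '/'.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p for p in self.nodes
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def delete(self, path):
        # Mirror ZooKeeper: refuse to delete a znode that still has children.
        if self.children(path):
            raise RuntimeError(f"NotEmpty: {path}")
        self.nodes.remove(path)


def delete_recursive(tree, path):
    """Post-order delete: remove all children first, then the node itself."""
    deleted = []
    for child in tree.children(path):
        deleted.extend(delete_recursive(tree, child))
    tree.delete(path)
    deleted.append(path)
    return deleted


if __name__ == "__main__":
    tree = FakeZnodeTree([
        "/brokers", "/brokers/ids", "/brokers/topics",
        "/brokers/topics/test", "/brokers/topics/test/partitions",
    ])
    print(delete_recursive(tree, "/brokers/topics/test"))
```

Deleting "/brokers/topics/test" removes its "partitions" child before the topic znode itself, and leaves the rest of the tree untouched.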

Changing Zookeeper IP addresses

Zookeeper caches the IP addresses of its peers at startup. If you ever need to change the IP address of a Zookeeper node, you will also need to restart every Zookeeper server after the new IP has been updated in DNS.
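Restarting every server does not have to mean an outage, provided a majority of the ensemble stays up throughout. The Python sketch below shows the quorum arithmetic and one possible rolling order. It assumes servers are restarted one at a time, each rejoining before the next goes down, and that the current leader is already known. Restarting the leader last is a common convention to limit leader elections, not something this page prescribes. The server names are hypothetical.

```python
def quorum_size(ensemble_size):
    """Smallest majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def rolling_restart_order(servers, leader):
    """Followers one at a time, then the leader last (a common convention)."""
    if leader not in servers:
        raise ValueError(f"unknown leader: {leader}")
    # With one server down, the remaining servers must still form a majority.
    if len(servers) - 1 < quorum_size(len(servers)):
        raise ValueError("ensemble too small to keep quorum with one server down")
    return [s for s in servers if s != leader] + [leader]


if __name__ == "__main__":
    # Hypothetical server names, for illustration only.
    print(rolling_restart_order(["zk-a", "zk-b", "zk-c"], leader="zk-b"))
```

A three-node ensemble needs two servers for quorum, so it tolerates exactly one server being down at a time; one- and two-node ensembles cannot be restarted without losing quorum.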