Analytics/Systems/Cluster/Hadoop

The Analytics Hadoop Cluster consists of the following systems:

1 master node (NameNode, ResourceManager, etc.)
1 standby NameNode and ResourceManager
42 worker nodes (DataNode, NodeManager)

The hardware infrastructure page has the system description and configurations.

We run CDH5, Cloudera's Hadoop distribution.

Administration links

See Analytics/Cluster/Hadoop/Administration

For users

Hive is the most common way to access data on our Hadoop cluster, although some users work with Spark as well.