
The Analytics Hadoop Cluster consists of the following systems:

1 master node (NameNode, ResourceManager, etc.)
1 standby NameNode and ResourceManager
42 worker nodes (DataNode, NodeManager)

The hardware infrastructure page has the system description and configurations.

We run Cloudera's CDH5.

Administration links

See the Hue documentation for inspecting jobs running on Hadoop and hunting down their logs.

See the Administration page for servicing individual nodes or understanding the cluster better.

For users

Hive is the most frequently used way to access data on our Hadoop cluster, although some users work with Spark as well.
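
As a rough illustration of Hive access, the sketch below wraps the Hive command-line client in a small Python helper. It assumes the `hive` client is installed and configured on the host you are logged in to, which this page does not confirm. The helper names and `example_db.example_table` are hypothetical placeholders, not real objects on the cluster.

```python
import shlex
import subprocess


def build_hive_command(query):
    """Return the argv list for running one HiveQL statement with the Hive CLI."""
    # `hive -e <query>` executes the given query string and then exits.
    return ["hive", "-e", query]


def run_hive_query(query):
    """Run the query and return its standard output as text.

    Assumption: a configured `hive` client is available on this host.
    Raises subprocess.CalledProcessError if the client exits with an error.
    """
    result = subprocess.run(
        build_hive_command(query),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical table name, used only for illustration.
    query = "SELECT COUNT(*) FROM example_db.example_table"
    # Print the shell-quoted command instead of running it, so the sketch
    # is safe to try on a machine without a Hive client.
    print(shlex.join(build_hive_command(query)))
```

Passing the command as an argument list rather than a single shell string avoids quoting problems when a query contains spaces or quote characters.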