Obsolete:Hadoop

From Wikitech
This process no longer needs to run on the snapshot hosts. (Nov 1 2010)

Hadoop is being set up on snapshot1, snapshot2, and snapshot3 to test XML snapshots.

Setting up a node cluster

  1. wget http://www.trieuvan.com/apache/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
  2. apt-get install sun-java6-bin
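The two steps above could be scripted as follows; the install prefix /usr/local is an assumption, not something stated on this page:

```shell
# Fetch and unpack Hadoop 0.20.1 (mirror URL from the steps above).
wget http://www.trieuvan.com/apache/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
tar -xzf hadoop-0.20.1.tar.gz -C /usr/local   # install prefix is an assumption

# Install the Sun Java 6 runtime that Hadoop 0.20 expects.
apt-get install sun-java6-bin
```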

core-site.xml

<configuration>
 <property>
    <name>fs.default.name</name>
    <value>hdfs://208.80.152.139:9000</value>
  </property>
</configuration>
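One quick sanity check of this setting, once the daemons are running, is to point the generic -fs option at the same URI (a hypothetical check, not from the original page):

```shell
# List the HDFS root via the address configured in fs.default.name.
hadoop fs -fs hdfs://208.80.152.139:9000 -ls /
```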

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hdfs/name</value>
    <final>true</final>
   </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hdfs/data</value>
    <final>true</final>
   </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/var/hdfs/namesecondary</value>
    <final>true</final>
   </property>
</configuration>

Slaves

Add the slave hostnames, one per line, to conf/slaves
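A conf/slaves file for this cluster might look like the following, assuming snapshot1 acts as the namenode and the other two hosts run datanodes (the role split is an assumption):

```
snapshot2
snapshot3
```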

hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-6-sun/

Post Config

  1. mkdir /var/hdfs/name
  2. mkdir /var/hdfs/data
  3. chown -R hadoop:hadoop /var/hdfs

On namenode:

  1. mkdir /var/hdfs/namesecondary
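Taken together, the post-config steps plus a first start might look like this sketch; the format and start commands are standard Hadoop 0.20 usage rather than taken from this page:

```shell
# On every node: create the local HDFS directories.
mkdir -p /var/hdfs/name /var/hdfs/data
chown -R hadoop:hadoop /var/hdfs

# On the namenode only: checkpoint directory, then format and start HDFS.
mkdir -p /var/hdfs/namesecondary
chown -R hadoop:hadoop /var/hdfs/namesecondary
sudo -u hadoop bin/hadoop namenode -format   # run from the Hadoop install dir
sudo -u hadoop bin/start-dfs.sh
```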

Logs

  • namenode - logs which datanodes have registered and their connections
  • datanode - logs block activity (reads, writes, replication)
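The daemons write under logs/ in the Hadoop install directory; tailing them shows the activity described above (the paths assume the default log-file naming and the /usr/local install prefix used earlier, both assumptions):

```shell
# Namenode log: datanode registrations and connections.
tail -f /usr/local/hadoop-0.20.1/logs/hadoop-hadoop-namenode-*.log

# Datanode log: block reads, writes, replication.
tail -f /usr/local/hadoop-0.20.1/logs/hadoop-hadoop-datanode-*.log
```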

SVN

  • private-wmf/hadoop/conf