Obsolete:Hadoop

From Wikitech
This page contains historical information. It is probably no longer true.
This process no longer needs to run on the snapshot hosts. (Nov 1 2010)

Hadoop is being set up on snapshot1, snapshot2, and snapshot3 to test XML snapshots.

Setting up a node cluster

  1. wget http://www.trieuvan.com/apache/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
  2. apt-get install sun-java6-bin
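The two steps above can be sketched as a script. This is a hedged sketch, not the exact procedure used at the time: the mirror URL and version are taken verbatim from step 1, and the unpack location (`/usr/local`) is an assumption. The network-dependent commands are left commented out.

```shell
# Install sketch: fetch the Hadoop 0.20.1 release and the Sun Java 6 runtime.
HADOOP_VERSION=0.20.1
HADOOP_TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
HADOOP_URL="http://www.trieuvan.com/apache/hadoop/core/hadoop-${HADOOP_VERSION}/${HADOOP_TARBALL}"
# wget "$HADOOP_URL"                        # step 1: download the release tarball
# tar -xzf "$HADOOP_TARBALL" -C /usr/local  # unpack location is an assumption
# apt-get install sun-java6-bin             # step 2: Hadoop 0.20 requires a Sun JRE
echo "$HADOOP_URL"
```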

core-site.xml

<configuration>
 <property>
    <name>fs.default.name</name>
    <value>hdfs://208.80.152.139:9000</value>
  </property>
</configuration>
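`fs.default.name` points every Hadoop client at the namenode's HDFS endpoint (host 208.80.152.139, port 9000, presumably the namenode host). A quick sketch of writing this file and reading the value back (the `/tmp/hadoop-conf` path is a stand-in for the real conf directory):

```shell
# Write the core-site.xml above and extract the configured HDFS endpoint.
mkdir -p /tmp/hadoop-conf
cat > /tmp/hadoop-conf/core-site.xml <<'EOF'
<configuration>
 <property>
    <name>fs.default.name</name>
    <value>hdfs://208.80.152.139:9000</value>
  </property>
</configuration>
EOF
# Pull the <value> out; clients resolve all HDFS paths against this URI.
sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p' /tmp/hadoop-conf/core-site.xml
```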

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hdfs/name</value>
    <final>true</final>
   </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hdfs/data</value>
    <final>true</final>
   </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/var/hdfs/namesecondary</value>
    <final>true</final>
   </property>
</configuration>
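These three properties pin the namenode metadata, datanode block storage, and secondary-namenode checkpoint directories under /var/hdfs, and `<final>true</final>` prevents job-level overrides. A sketch of writing the file and sanity-checking it (again using `/tmp/hadoop-conf` as a stand-in conf directory):

```shell
# Write the hdfs-site.xml above and confirm all three properties are present.
mkdir -p /tmp/hadoop-conf
cat > /tmp/hadoop-conf/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hdfs/name</value>
    <final>true</final>
   </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hdfs/data</value>
    <final>true</final>
   </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/var/hdfs/namesecondary</value>
    <final>true</final>
   </property>
</configuration>
EOF
grep -c '<property>' /tmp/hadoop-conf/hdfs-site.xml
```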

Slaves

Add slave servers to conf/slaves
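A sketch of that step, assuming snapshot1 is the namenode and snapshot2/snapshot3 are the slave (datanode) hosts; the hostnames are taken from the cluster description above, and `/tmp/hadoop-conf` stands in for the real conf directory:

```shell
# conf/slaves lists one worker hostname per line; start-dfs.sh and
# start-mapred.sh ssh to each of these to launch datanode/tasktracker daemons.
mkdir -p /tmp/hadoop-conf
printf '%s\n' snapshot2 snapshot3 > /tmp/hadoop-conf/slaves
cat /tmp/hadoop-conf/slaves
```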

hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-6-sun/

Post Config

  1. mkdir /var/hdfs/name
  2. mkdir /var/hdfs/data
  3. chown -R hadoop:hadoop /var/hdfs

On namenode:

  1. mkdir /var/hdfs/namesecondary
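The post-config steps above can be sketched as one script. This uses `/tmp/hdfs` as a stand-in for `/var/hdfs` (creating the real path needs root), and the `chown` is commented out since it assumes a `hadoop` user exists on the host:

```shell
# Create the HDFS storage directories from the hdfs-site.xml values above.
HDFS_ROOT=/tmp/hdfs                      # stand-in for /var/hdfs
mkdir -p "$HDFS_ROOT/name"               # dfs.name.dir (namenode metadata)
mkdir -p "$HDFS_ROOT/data"               # dfs.data.dir (datanode blocks)
mkdir -p "$HDFS_ROOT/namesecondary"      # fs.checkpoint.dir (namenode only)
# chown -R hadoop:hadoop "$HDFS_ROOT"    # hand ownership to the hadoop user
ls "$HDFS_ROOT"
```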

Logs

  • namenode - shows which datanodes have registered and their connection activity
  • datanode - block activity

SVN

  • private-wmf/hadoop/conf