User:QChrisNonWMF/TestClusterSetup
How to set up your own test cluster with Hadoop, Hive, Oozie, and Pig.
When following the step-by-step instructions below, you'll end up with a cluster consisting of three nodes:
- foo-master.eqiad.wmflabs. The cluster's master, and main namenode.
- foo-worker1.eqiad.wmflabs. A worker node. You can use this to schedule Oozie jobs or run Hive queries.
- foo-worker2.eqiad.wmflabs. A worker node. You can use this to schedule Oozie jobs or run Hive queries.
Please replace foo with your login name when following these instructions!
Relevant Bugs
- bug 68161, because the puppet code as of 2014-07-17 does not allow bringing a cluster up.
Creating instances
- Create a master instance.
  - Go to the "Add instance" page.
  - Set Instance name to foo-master.
  - Set Instance type to m1.small [...].
  - (A bigger instance won't buy you anything. When testing, you'll most likely not run into disk space problems, but rather "number of filenames" limitations. So let's be nice to wikitech.)
  - Click "Submit".
- Create a first worker instance.
  - (Use the same settings as for the master instance, but set Instance name to foo-worker1.)
- Create a second worker instance.
  - (Use the same settings as for the master instance, but set Instance name to foo-worker2.)
- Wait for all of the instances' "Puppet status" to say "ok".
  - (This will take ~15 minutes. The page does not refresh on its own, so reload it from time to time.)
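Once every instance's "Puppet status" says "ok", you can quickly confirm that all three are reachable. A minimal sketch, assuming your SSH setup for labs instances is already in place (again, swap foo for your login name):

 # Check that all three instances answer over SSH.
 for host in foo-master foo-worker1 foo-worker2; do
   ssh "$host.eqiad.wmflabs" hostname
 done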
Setting up Hadoop
- Set up the Hadoop master.
  - Go to the "configure" page for the foo-master instance.
  - Select role::analytics::hadoop::master.
  - Set hadoop_cluster_name to analytics-hadoop.
  - Set hadoop_namenodes to foo-master.eqiad.wmflabs.
  - Click "Submit".
  - Log in to the foo-master instance.
  - Run sudo puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff.
  - (This will take ~7 minutes.)
- Set up the first worker instance.
  - Go to the "configure" page for the foo-worker1 instance.
  - Select role::analytics::hadoop::worker.
  - Set hadoop_cluster_name to analytics-hadoop.
  - Set hadoop_namenodes to foo-master.eqiad.wmflabs. (This is not a typo; foo-master.eqiad.wmflabs is the namenode.)
  - Click "Submit".
  - Log in to the foo-worker1 instance.
  - Run sudo puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff.
- Set up the second worker instance.
  - Repeat the steps from "Set up the first worker instance", but use foo-worker2 instead of foo-worker1.
- Check that the two workers are known to the namenode.
  - Log in to the foo-master instance.
  - Run sudo su - -c 'sudo -u hdfs hdfs dfsadmin -printTopology'.
  - If everything is working, the output will show a "default-rack" with a separate line for each worker node, like:
Rack: /default-rack
10.68.17.180:50010 (foo-worker1.eqiad.wmflabs)
10.68.17.181:50010 (foo-worker2.eqiad.wmflabs)
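As a further check, you can push a small file through HDFS. A minimal sketch to run on foo-master; the path /tmp/smoke-test.txt is just an arbitrary example:

 # Write, read back, and delete a tiny test file as the hdfs user.
 echo 'hello hdfs' > /tmp/smoke-test.txt
 sudo -u hdfs hdfs dfs -put /tmp/smoke-test.txt /tmp/smoke-test.txt
 sudo -u hdfs hdfs dfs -cat /tmp/smoke-test.txt
 sudo -u hdfs hdfs dfs -rm /tmp/smoke-test.txt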
Setting up Oozie and Hive
- Set up the master instance.
  - Go to the "configure" page for the foo-master instance.
  - Select role::analytics::hive::server.
  - Select role::analytics::oozie::server.
  - Click "Submit".
  - Log in to the foo-master instance.
  - Run sudo puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff.
- Set up the first worker instance.
  - Go to the "configure" page for the foo-worker1 instance.
  - Select role::analytics::hive::client.
  - Select role::analytics::oozie::client.
  - Select role::analytics::pig.
  - Click "Submit".
  - Log in to the foo-worker1 instance.
  - Run sudo puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff.
- Set up the second worker instance.
  - Repeat the steps from "Set up the first worker instance", but use foo-worker2 instead of foo-worker1.
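To verify that the clients work on a worker, a sketch like the following can help. It assumes Oozie listens on its default port 11000 on foo-master; adjust the URL if your setup differs:

 # On foo-worker1: check that the Hive, Oozie, and Pig clients respond.
 hive -e 'SHOW DATABASES;'
 oozie admin -status -oozie http://foo-master.eqiad.wmflabs:11000/oozie
 pig -version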
Bootstrapping content
Done. HDFS, Hive, Oozie, and Pig are now functional.
Sample data
You can use the scripts from the cluster-scripts repository to bootstrap some parts. For example:
- Log in to the foo-worker1 instance.
- In the cluster-scripts directory, run ./prepare_for_user.sh
- Log out.
- Log in to the foo-worker1 instance again.
- In the cluster-scripts directory, run ./hive_add_table_webrequest_sequence_stats.sh
- In the cluster-scripts directory, run ./hive_add_table_webrequest.sh
- In the cluster-scripts directory, run ./insert_webrequest_fixture_duplicate_monitoring.sh
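Afterwards, you can check that the tables showed up. A sketch, assuming the scripts above create their tables in the default Hive database; the table name webrequest is inferred from the script name, so treat it as an assumption:

 # List the tables the bootstrap scripts created.
 hive -e 'SHOW TABLES;'
 # Inspect one of them (table name assumed from the script name above).
 hive -e 'DESCRIBE webrequest;'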
Sample Deployment
- In the refinery directory, run bin/deploy --no-dry-run --verbose
Sample Oozie job
- In the refinery directory, run oozie job -config oozie/webrequest/compute_sequence_stats/job.properties -run -D hadoopMaster=foo-master.eqiad.wmflabs
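Once the job is submitted, the oozie CLI prints a job id that you can use to follow its progress. A sketch; <job-id> is a placeholder, and you may have to add -oozie http://foo-master.eqiad.wmflabs:11000/oozie to each call if OOZIE_URL is not set in your environment:

 # Show status and progress of the submitted job.
 oozie job -info <job-id>
 # List coordinator jobs known to the Oozie server.
 oozie jobs -jobtype coordinator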
Cluster Tuning
- Replication issues. (hdfs-site.xml) The default block size is rather big for labs, and it'll pretty quickly give you random errors about not being able to replicate blocks. You can work around them by setting dfs.namenode.fs-limits.min-block-size to 100, and dfs.blocksize to 128k (see the config sketch at the end of this list). That should basically kill the replication issue. If you run into it nonetheless, run hdfs dfs -expunge whenever the issue occurs.
- Oozie needing 5 minutes between job creations. (oozie-site.xml) Add the property oozie.service.CoordMaterializeTriggerService.lookup.interval with value 2 to get the interval down from the default 5 minutes to 2 seconds (see the config sketch at the end of this list). Restart the Oozie server afterwards.
- If YARN applications (e.g. Hive jobs, Oozie jobs, ...) do not finish in time (like >10 minutes on simple selects), check yarn application -list. If the jobs do not make progress there either, check mapred job -list (see the diagnostics sketch at the end of this list). If UsedMem < NeededMem, or RsvdMem > 0 for >5 minutes, you'll likely need more data nodes. Just add foo-worker3, foo-worker4, ... according to the description for foo-worker1 above.
- If the namenode switches into safe mode, it's typically because free disk space (in the plain filesystem) is running low on the namenode in /var. Look for big directories in /var/log (see the diagnostics sketch below). After some cleanup there, you can get the namenode out of safe mode by running sudo su - -c "sudo -u hdfs hdfs dfsadmin -safemode leave"
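The first two items above boil down to a few property overrides. A sketch of how they could look; the values are the ones described above, with dfs.blocksize given in bytes (131072 = 128k):

 <!-- hdfs-site.xml: allow small blocks, to avoid replication errors -->
 <property>
   <name>dfs.namenode.fs-limits.min-block-size</name>
   <value>100</value>
 </property>
 <property>
   <name>dfs.blocksize</name>
   <value>131072</value>
 </property>

 <!-- oozie-site.xml: check for coordinator materialization every 2 seconds -->
 <property>
   <name>oozie.service.CoordMaterializeTriggerService.lookup.interval</name>
   <value>2</value>
 </property>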
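For the last two items, the diagnostics mentioned above, collected in one place. A sketch; run each command on the node it refers to:

 # On any cluster node: see whether applications are stuck.
 yarn application -list
 mapred job -list
 # On foo-master: check free space in /var and the namenode's safe mode state.
 df -h /var
 sudo du -sh /var/log/* | sort -h | tail
 sudo su - -c 'sudo -u hdfs hdfs dfsadmin -safemode get'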