General documentation for Hue can be found on our instance at https://hue.wikimedia.org/help/. Hue is our graphical interface to the Hadoop cluster and everything running on it, so it is complex. This page describes a few common tasks.
To access Hue, you'll need general analytics data access.
If you have that, you can log in using your shell username and your Wikimedia developer account (Wikitech) password. If you already have cluster access but can't log in to Hue, your account probably needs to be synced manually. Ask one of the Analytics Opsen, ottomata (aotto@wikimedia.org) or elukey (ltoscano@wikimedia.org), or file a Phabricator task for help. The instructions they will need to follow are at Analytics/Cluster/Hue/Administration.
Testing an Oozie job that runs a Spark job
- start your job, overriding properties such as start_time and stop_time (see Analytics/Cluster/Oozie for details). Coordinators should have example submit commands at the top of their files.
- look at the running coordinators in Hue. When you started your job you got an Oozie job id that you can use to open it directly, but your job is usually at the top of the Running or Completed queue here.
- in the coordinator view, on the Calendar tab, you should see only one instance running if you passed the start_time/stop_time overrides correctly. Click on it.
- you are now in the workflow view. On the Actions tab, you'll see a small three-layer stack icon in the Logs column. Click on it.
- These are the logs of the Oozie job. You probably want the logs of the Spark job's application master instead. While the Spark job runs normally, the Oozie job logs lines like these:
2019-06-18 14:25:52,359 [main] INFO org.apache.spark.deploy.yarn.Client - Application report for application_1560620285026_8417 (state: RUNNING)
2019-06-18 14:25:53,360 [main] INFO org.apache.spark.deploy.yarn.Client - Application report for application_1560620285026_8417 (state: RUNNING)
- Your Spark job's logs will be in YARN under the application id application_1560620285026_8417. To find them, either look around at https://yarn.wikimedia.org/cluster/scheduler or go directly to https://yarn.wikimedia.org/cluster/app/application_1560620285026_8417. If you know the cluster isn't too busy, you can go straight to the scheduler: your job may be easy to find there without going through Hue at all.
- Tricky: by default, our settings retry jobs 6 times. You won't notice this if you only look at Hue, because the Oozie job doesn't fail when the application master fails. Instead, the job is retried up to 6 times, failing and restarting each time. If this is happening, you'll see more than one Application Master listed at the direct link above. In that case you probably want to kill your job: it would not have restarted unless something went wrong.
- Important: your job has to finish before its logs are available. It runs on YARN, and YARN aggregates the logs and makes them available only after the job completes.
- Copy your application id and look at the YARN logs. As the user that started the job (analytics, hdfs, your own user, etc.), run:
yarn logs -applicationId application_1560620285026_8417 | grep ERROR
- If you need more detail, you can adjust the grep, but keep in mind that YARN logs are extremely verbose. There's a wrapper script that greps for some common things. Run it like this, as the user that started the job (otherwise the output will only confuse you):
$ export PYTHONPATH=/srv/deployment/analytics/refinery/python
$ cd /srv/deployment/analytics/refinery/bin
$ ./yarn-logs application_1560620285026_8417
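If you copy the Oozie job log text out of Hue into a file, you can pull out the YARN application id and its most recent reported state without scanning by eye. This is a minimal sketch, assuming the log contains Spark client lines shaped like the "Application report for application_... (state: RUNNING)" lines shown above. The function names and the file name oozie-job.log are hypothetical.

```shell
# Sketch: extract YARN application ids and states from Oozie job log text.
# Assumes Spark client lines of the form
#   "... Application report for application_<ts>_<n> (state: <STATE>)".

# Print each distinct application id found on stdin, in first-seen order.
app_ids_from_oozie_log() {
  grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | awk '!seen[$0]++'
}

# Print the last reported state for the application id given as $1.
last_state_for_app() {
  grep -F "Application report for $1 " |
    sed -n 's/.*(state: \([A-Z_]*\)).*/\1/p' |
    tail -n 1
}
```

For example, after sourcing the functions, `app_ids_from_oozie_log < oozie-job.log` prints each application id once, and `last_state_for_app application_1560620285026_8417 < oozie-job.log` prints its latest state.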
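To catch the silent retries described in the "Tricky" note without opening the YARN web UI, you can count the application attempts from the command line. This sketch assumes that `yarn applicationattempt -list <application id>` prints one row per attempt, each starting with an attempt id of the form appattempt_<cluster timestamp>_<app number>_<attempt number>, and it ignores any header lines. The helper names are hypothetical.

```shell
# Sketch: detect an application master that keeps restarting.
# Assumes `yarn applicationattempt -list <appId>` prints one row per attempt,
# each beginning with an "appattempt_..." id.

# Count attempt rows in the listing read on stdin.
count_app_attempts() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the count anyway.
  grep -c '^[[:space:]]*appattempt_' || true
}

# Return 1 (and warn on stderr) when more than one attempt was made.
warn_if_retried() {
  attempts=$(count_app_attempts)
  if [ "$attempts" -gt 1 ]; then
    echo "WARNING: $attempts attempts; the application master keeps restarting" >&2
    return 1
  fi
  return 0
}
```

Usage: `yarn applicationattempt -list application_1560620285026_8417 | warn_if_retried`. A non-zero exit status suggests the job is failing and retrying, and is probably worth killing.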
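Because the logs are aggregated only once the job completes, a small wrapper can wait for the application to reach a terminal state before fetching them. This is a sketch, assuming that `yarn application -status <application id>` prints a report containing a line like "State : RUNNING" (separate from its "Final-State" line), and treating FINISHED, FAILED and KILLED as terminal. The helper names, the 60-second poll interval and the 120-poll limit are arbitrary choices, not cluster settings.

```shell
# Sketch: wait for a YARN application to reach a terminal state, then grep
# its aggregated logs. Assumes `yarn application -status` prints "State : X".

# Print the application state from `yarn application -status` output on stdin.
app_state_from_status() {
  # Anchoring "State" at the start of the line skips the "Final-State" line.
  sed -n 's/^[[:space:]]*State[[:space:]]*:[[:space:]]*\([A-Z_]*\).*/\1/p' | head -n 1
}

# Succeed only if the status output on stdin shows a terminal state.
app_is_done() {
  case "$(app_state_from_status)" in
    FINISHED | FAILED | KILLED) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll every 60 s (at most 120 times), then grep the logs.
wait_then_grep_errors() {
  app="$1"
  tries=0
  until yarn application -status "$app" 2>/dev/null | app_is_done; do
    tries=$((tries + 1))
    if [ "$tries" -ge 120 ]; then
      echo "gave up waiting for $app" >&2
      return 1
    fi
    sleep 60
  done
  yarn logs -applicationId "$app" | grep ERROR
}
```

Run `wait_then_grep_errors application_1560620285026_8417` as the user that started the job, for the same reason given for `yarn logs` above.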
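When grepping the YARN logs for ERROR, it helps to know which container each line came from, since the application master and each executor get their own section of the output. This is a minimal sketch, not the refinery yarn-logs wrapper: it assumes each container's section in the `yarn logs` output starts with a header line beginning "Container: <container id>", and the function names are hypothetical.

```shell
# Sketch: tag each ERROR line in `yarn logs` output with its container id.
# Assumes each container section starts with "Container: <container id> ...".

errors_by_container() {
  awk '
    /^Container: / { container = $2; next }   # remember the current section
    /ERROR/ {
      c = (container == "") ? "unknown" : container
      print c ": " $0
    }
  '
}

# Summarize: number of ERROR lines per container, sorted by container id.
error_counts_by_container() {
  errors_by_container |
    awk -F': ' '{ n[$1]++ } END { for (c in n) print n[c], c }' |
    sort -k2
}
```

Usage: `yarn logs -applicationId application_1560620285026_8417 | error_counts_by_container` gives a quick overview, and `errors_by_container` shows the tagged lines themselves.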