Analytics/Cluster/Hue


Hue is a web interface for the Hadoop cluster, most notably for running Hive queries and checking the status of Oozie jobs. Our Hue instance is at https://hue.wikimedia.org.

General documentation for Hue can be found on our instance at https://hue.wikimedia.org/help/. Hue is our graphical interface into the Hadoop cluster and everything happening on it, so it's complex. On this page we will detail a few common tasks.

Access

To access Hue, you'll need general analytics data access.

If you have that, you can log in using your shell username and your Wikimedia developer account (Wikitech) password. If you already have cluster access but can't log into Hue, your account likely needs to be manually synced. Ask an Analytics Opsen – ottomata (aotto at wikimedia.org) or elukey (ltoscano at wikimedia.org) – or file a Phabricator task for help (the instructions they will need to follow are at Analytics/Cluster/Hue/Administration).

Testing an Oozie job that runs a Spark job

  • Start your job, overriding properties like start_time (see details at Analytics/Cluster/Oozie). Coordinators should have example submit commands at the top; a hedged sketch of one appears at the end of this section.
  • Look at the running coordinators in Hue. When you started your job you got an Oozie ID you can use directly, but your job is usually near the top of the Running or Completed queue here.
  • In the coordinator view, on the Calendar tab, you should see just one instance running if you passed the start_time/stop_time overrides correctly. Click on it.
  • Now in the workflow view, on the Actions tab, you'll see a little 3-stack icon in the Logs column. Click on that.
  • These are the logs of the Oozie job. You probably want the logs of the Spark job's application master. When running normally, the Oozie job logs lines like these:
2019-06-18 14:25:52,359 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1560620285026_8417 (state: RUNNING)
2019-06-18 14:25:53,360 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1560620285026_8417 (state: RUNNING)
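The YARN application ID you need is the "application_..." string in those lines. As a minimal sketch, you can also pull it out of the launcher log with the Oozie CLI; the workflow ID below is hypothetical (use the one Hue shows for your action), and this assumes OOZIE_URL is set in your environment or passed via -oozie:

# List the YARN application IDs mentioned in a workflow's launcher log.
$ oozie job -log 0000123-190618123456789-oozie-oozi-W | grep -oE 'application_[0-9]+_[0-9]+' | sort -u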
  • Your Spark job logs will be in YARN under the application ID "application_1560620285026_8417". To find it, either look around https://yarn.wikimedia.org/cluster/scheduler or go directly to https://yarn.wikimedia.org/cluster/app/application_1560620285026_8417. If you know the cluster isn't too busy, you can go straight to the scheduler; it may be easy to find your job there without looking through Hue.
  • Tricky: by default, our settings retry failed jobs up to 6 times. You won't see this if you're just looking at Hue, because the Oozie job won't fail when the application master fails; it will retry up to 6 times, failing and restarting each time. If this is happening, you'll see more than one application master listed at the direct link above. In that case, you probably want to kill your job, since it won't have restarted for any good reason.
  • Important: your job has to finish before its logs are available, because it runs through YARN, and logs are aggregated and published only after the job completes.
  • Copy your application id and look at yarn logs. As the user that started the job (analytics, hdfs, your own user, etc.), run yarn logs -applicationId application_1560620285026_8417 | grep ERROR.
  • If you need more detail, you can play with the grep, but keep in mind that YARN logs are incredibly verbose. There's a wrapper script that helps grep for some common things; run it like this (as the user that started the job – otherwise all you get is confusion):
$ export PYTHONPATH=/srv/deployment/analytics/refinery/python
$ cd /srv/deployment/analytics/refinery/bin
$ ./yarn-logs application_1560620285026_8417
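For reference, the coordinator submission with time overrides mentioned in the first step typically looks something like the sketch below. The properties-file path and the times here are illustrative, not real; the authoritative example command is at the top of each coordinator file, and the full walkthrough is at Analytics/Cluster/Oozie. This also assumes OOZIE_URL is set in your environment (otherwise pass -oozie explicitly).

# As the user that should own the job (your own user, or analytics via sudo).
# The coordinator.properties path below is an illustrative placeholder.
$ oozie job \
    -Dstart_time=2019-06-18T00:00Z \
    -Dstop_time=2019-06-18T01:00Z \
    -config /srv/deployment/analytics/refinery/oozie/example_job/coordinator.properties \
    -run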