To access Hive, you need an account with production shell access in either the analytics-privatedata-users or the analytics-users user group. For more details, see Analytics/Data access.
Some of the data in Hive, such as the webrequest logs, is private, so only members of analytics-privatedata-users can access it. If you are requesting access to Hive, you probably want to be in this group.
Once you have the credentials, see Analytics/Systems/Cluster/Access for instructions on using the web UI and SSH tunneling.
- Analytics/Cluster/Hive/Queries (includes a FAQ about common tasks and problems)
Hive supports a dialect of SQL (HiveQL), but it differs from standard SQL in places; see the Hive Language Manual for details.
(see also Analytics/Data)
- Webrequest (raw and refined)
- EventLogging data, in the
- The wmf_raw and wmf databases contain Hive tables maintained by Ops. You can create your own tables in Hive, but please be sure to create them in a different database, preferably one named after your shell username.
- Hive can map tables onto almost any data format. Because the webrequest logs are JSON, their Hive tables must be configured with a JSON SerDe (serializer/deserializer) to read and write them. We use the JsonSerDe included with Hive-HCatalog.
- The HCatalog .jar is added to the Hive client's auxpath automatically, so you shouldn't need any extra setup to use it.
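Keeping personal tables out of wmf_raw and wmf comes down to two HiveQL statements: create a database named after your shell username, then switch to it. The minimal Python sketch below only generates those statements for you to paste into a Hive session. The helper name `personal_db_statements` is hypothetical, and the sketch assumes your local login name matches your shell username. It accepts only names Hive can use as unquoted identifiers (letters, digits, underscores, not starting with an underscore).

```python
import getpass
import re
from typing import List

# Databases maintained by Ops; personal tables should not be created here.
RESERVED_DATABASES = {"wmf_raw", "wmf"}

# Unquoted Hive identifiers: a letter or digit, then letters, digits or underscores.
_IDENTIFIER = re.compile(r"[a-z0-9][a-z0-9_]*")


def personal_db_statements(username: str) -> List[str]:
    """Return HiveQL statements that create and select a per-user database.

    Hypothetical helper for illustration; `username` is assumed to be your
    shell username. Hive database names are case-insensitive, so lowercase it.
    """
    name = username.lower()
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not usable as an unquoted Hive database name: {username!r}")
    if name in RESERVED_DATABASES:
        raise ValueError(f"{name!r} is reserved for Ops-maintained tables")
    return [
        f"CREATE DATABASE IF NOT EXISTS {name};",
        f"USE {name};",
    ]


if __name__ == "__main__":
    # Assumes the local login name is the same as your shell username.
    for stmt in personal_db_statements(getpass.getuser()):
        print(stmt)
```

`IF NOT EXISTS` makes the statement safe to rerun; any tables you create after `USE` land in your own database rather than in the shared ones.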
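To make the role of a JSON SerDe more concrete, here is a conceptual sketch in plain Python of its two directions: deserializing turns one JSON line into a row ordered by the table's columns, and serializing turns a row back into JSON. This is not the HCatalog JsonSerDe implementation, which also handles type conversion and nested types. The column list and the sample record are made up for illustration, and mapping absent keys to None (NULL) is this sketch's own behavior.

```python
import json
from typing import Any, List, Optional, Sequence

# Hypothetical table schema: column names in table order.
COLUMNS = ("hostname", "sequence", "dt", "uri_path")


def deserialize(line: str, columns: Sequence[str] = COLUMNS) -> List[Optional[Any]]:
    """Turn one JSON log line into a row ordered by the table's columns.

    Keys missing from the record become None (NULL); extra keys are ignored.
    """
    record = json.loads(line)
    return [record.get(col) for col in columns]


def serialize(row: Sequence[Optional[Any]], columns: Sequence[str] = COLUMNS) -> str:
    """Turn a row back into a JSON object keyed by column name."""
    return json.dumps(dict(zip(columns, row)))


if __name__ == "__main__":
    # Made-up record: "dt" is missing and "extra" is not a table column.
    line = '{"hostname": "cp1001", "sequence": 42, "uri_path": "/wiki/Main_Page", "extra": 1}'
    row = deserialize(line)
    print(row)             # ['cp1001', 42, None, '/wiki/Main_Page']
    print(serialize(row))  # {"hostname": "cp1001", "sequence": 42, "dt": null, "uri_path": "/wiki/Main_Page"}
```

The point of the sketch is that the table definition, not the data file, decides which fields become columns; the SerDe is the piece that translates between the two.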
For common tasks and problems, see the FAQ on Analytics/Cluster/Hive/Queries.