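Introduction to the Hue Framework: Installation and Deployment

Hi everyone, I'm a guy from Inner Mongolia who is currently studying big data in Beijing. I want to share what I learn here so that we can all learn together.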

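Hue's full name: HUE = Hadoop User Experience

It is a web framework provided by Cloudera that integrates with other big data frameworks and gives them a visual interface.

Hue architecture

1. Hue UI: the visual web interface that Hue provides

2. Hue server: Hue's server, which exposes the web interface to the outside

3. Hue DB: stores the information about the integrated frameworks

1. Introduction to Hue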

HUE=Hadoop User Experience

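Hue is an open-source UI system for Apache Hadoop. It evolved from Cloudera Desktop, and Cloudera later released it as open source for the Hadoop community under the Apache License. It is built on the Python web framework Django.

With Hue, we can interact with a Hadoop cluster from a web console in the browser to analyze and process data: for example, working with data on HDFS, running MapReduce jobs, executing Hive SQL statements, and browsing HBase databases.

Hue links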

Site:  http://gethue.com/
GitHub:  https://github.com/cloudera/hue
Reviews:  https://review.cloudera.org/

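Hue architecture

Core features

SQL editors, with support for Hive, Impala, MySQL, Oracle, PostgreSQL, SparkSQL, Solr SQL, Phoenix...

Charts of all kinds for the Solr search engine

Friendly interfaces for Spark and Hadoop

Support for the Apache Oozie scheduler, including editing and viewing workflows

These features are friendlier than the interfaces that the individual Hadoop ecosystem components provide, but in some debugging scenarios you may still need the native system to dig deeper into the cause of an error.

When viewing an Oozie workflow in Hue, you could also conveniently see the DAG of the whole workflow. The latest versions have removed the DAG view, however, and show only the list of actions in the workflow and the transitions between them. If you want the DAG, you can still use Oozie's native web UI.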

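1. Accessing HDFS and browsing files

2. Developing and debugging Hive through the web, and displaying query results

3. Querying Solr, displaying results, and generating reports

4. Developing and debugging Impala interactive SQL queries through the web

5. Developing and debugging Spark

7. Developing and monitoring Oozie jobs, and coordinating and scheduling workflows

8. Querying, modifying, and displaying HBase data

9. Querying the Hive metadata (metastore)

10. Viewing MapReduce job progress and tracing logs

11. Creating and submitting MapReduce, Streaming, and Java jobs

12. Developing and debugging Sqoop2

13. Browsing and editing ZooKeeper

14. Querying and displaying databases (MySQL, PostgreSQL, SQLite, Oracle)

In one sentence: Hue is a friendly interface-integration framework. It can integrate the frameworks we have already learned and those we have yet to learn, so that a single interface is enough to inspect and run all of them.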

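2. Installing Hue

Hue can be installed in several ways, including from rpm packages, from a tar.gz archive, and through Cloudera Manager. Here we install from the tar.gz archive.

Step 1: Download the Hue archive, upload it to Linux, and extract it

Download location for the Hue archive:

http://archive.cloudera.com/cdh5/cdh/5/

We use the build that matches CDH 5.14.0, that is, hue-3.9.0-cdh5.14.0.tar.gz.

Download it, upload it to the Linux machine, and then extract it: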

cd /export/softwares/
tar -zxvf hue-3.9.0-cdh5.14.0.tar.gz -C ../servers/

Step 2: Compile, install, and start

2.1 Install the dependency packages on Linux

Install the required dependency packages over the network:

yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel

2.2 Configure Hue

cd /export/servers/hue-3.9.0-cdh5.14.0/desktop/conf
vim hue.ini
# General settings
[desktop]
# The secret key can be any string, as long as it is unique
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
http_host=node03.hadoop.com
is_hue_4=true
time_zone=Asia/Shanghai
server_user=root
server_group=root
default_user=root
default_hdfs_superuser=root
# Use MySQL as Hue's storage database; this is around line 587 of hue.ini
[[database]]
engine=mysql
host=node03.hadoop.com
port=3306
user=root
password=123456
name=hue
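
Since the secret key only has to be unique, one option is to generate a random one rather than reuse the sample value. The following is a minimal sketch, not something Hue itself provides: gen_secret_key is a hypothetical helper name, and the choice of 32 random bytes printed as hex is an assumption made to keep the value free of characters such as # or quotes.

```shell
# Hypothetical helper (not part of Hue): print N random bytes as 2N hex characters.
gen_secret_key() {
  # od dumps the bytes as hex; tr strips the spaces and newlines od inserts
  head -c "${1:-32}" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Print a line that can be pasted into the [desktop] section of hue.ini
echo "secret_key=$(gen_secret_key 32)"
```

Every run prints a different key, so generate it once and keep the same value in hue.ini afterwards.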

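2.3 Create the MySQL database

Create the hue database:

create database hue default character set utf8 default collate utf8_general_ci;

Note: create a matching user for the hue database and grant it privileges. The [[database]] settings above connect as root; to use this dedicated account instead, set user and password there to match.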

grant all on hue.* to 'hue'@'%' identified by 'hue';

2.4 Compile

cd /export/servers/hue-3.9.0-cdh5.14.0
make apps

2.5 Add a regular Linux user named hue

useradd hue
passwd hue

2.6 Start the Hue process

cd /export/servers/hue-3.9.0-cdh5.14.0/
build/env/bin/supervisor

2.7 Open the web page

http://node03:8888/

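On the first visit, you need to set the administrator username and password.

Try to keep the administrator username and password the same as those of the user that installed Hadoop.

In our case Hadoop was installed as root with the password 123456,

so for the first login use the user root with the password 123456.

After logging in, the Hue page reports an error. It is caused mainly by Hive: Hue's integration with Hive has failed, so we need to configure that integration. Next, let's look at how to integrate Hue with Hive and with Hadoop.

3. Integrating Hue with other frameworks

3.1 Integrating Hue with Hadoop's HDFS and YARN

Step 1: Edit core-site.xml on all Hadoop nodes

Remember to restart the HDFS and YARN clusters after changing core-site.xml.

Edit core-site.xml on all three machines: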

<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>

Step 2: Edit hdfs-site.xml on all Hadoop nodes

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

Step 3: Restart the Hadoop cluster

Run the following commands on node01:

cd /export/servers/hadoop-2.6.0-cdh5.14.0
sbin/stop-dfs.sh
sbin/start-dfs.sh
sbin/stop-yarn.sh
sbin/start-yarn.sh

Step 4: Stop the Hue service and continue configuring hue.ini

cd /export/servers/hue-3.9.0-cdh5.14.0/desktop/conf
vim hue.ini

Configure the integration between Hue and HDFS (in the [hadoop] section):

[[hdfs_clusters]]
[[[default]]]
fs_defaultfs=hdfs://node01.hadoop.com:8020
webhdfs_url=http://node01.hadoop.com:50070/webhdfs/v1
hadoop_hdfs_home=/export/servers/hadoop-2.6.0-cdh5.14.0
hadoop_bin=/export/servers/hadoop-2.6.0-cdh5.14.0/bin
hadoop_conf_dir=/export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop
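
Before restarting Hue, it can help to confirm that the webhdfs_url value actually answers requests. The sketch below only builds a WebHDFS REST URL from the values used in this article; webhdfs_url here is a hypothetical shell helper, not a Hue or Hadoop command, and LISTSTATUS and user.name are the standard WebHDFS operation and query parameter.

```shell
# Hypothetical helper: build a WebHDFS REST URL of the form
#   <base>/<path>?op=<OPERATION>&user.name=<user>
webhdfs_url() {
  base=${1%/}   # the webhdfs_url value from hue.ini, without a trailing slash
  path=${2#/}   # the HDFS path, without its leading slash
  printf '%s/%s?op=%s&user.name=%s\n' "$base" "$path" "$3" "$4"
}
```

For example, curl -s "$(webhdfs_url http://node01.hadoop.com:50070/webhdfs/v1 /tmp LISTSTATUS root)" should return a JSON listing of /tmp once dfs.webhdfs.enabled is in effect. Appending &doas=admin to the URL asks root to act on behalf of another user, which exercises the hadoop.proxyuser.root.* settings from step 1.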

Configure the integration between Hue and YARN:

[[yarn_clusters]]
[[[default]]]
resourcemanager_host=node01
resourcemanager_port=8032
submit_to=True
resourcemanager_api_url=http://node01:8088
history_server_api_url=http://node01:19888
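
The two API URLs above can be checked the same way. As a rough sketch, the hypothetical helper yarn_info_urls prints the status endpoints of the ResourceManager and the MapReduce JobHistory server REST APIs (/ws/v1/cluster/info and /ws/v1/history/info) for the base URLs it is given; it assumes both services are running on the hosts and ports configured here.

```shell
# Hypothetical helper: print the REST "info" endpoints for the two base URLs.
#   $1 = resourcemanager_api_url, $2 = history_server_api_url
yarn_info_urls() {
  printf '%s/ws/v1/cluster/info\n' "${1%/}"   # ResourceManager cluster info
  printf '%s/ws/v1/history/info\n' "${2%/}"   # JobHistory server info
}
```

Running yarn_info_urls http://node01:8088 http://node01:19888 | xargs -n1 curl -s fetches both endpoints; a connection error on either one points to a service that is not running or a wrong port.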

3.2 Integrating Hue with Hive

To integrate Hue with Hive, we need to start Hive's metastore service and its hiveserver2 service (Impala needs the Hive metastore service; Hue needs Hive's hiveserver2 service).

Edit hue.ini:

[beeswax]
hive_server_host=node03.hadoop.com
hive_server_port=10000
hive_conf_dir=/export/servers/hive-1.1.0-cdh5.14.0/conf
server_conn_timeout=120
auth_username=root
auth_password=123456
[metastore]
  # Allow creating databases, tables, and so on through Hive
enable_new_create_table=true

Start the Hive services

On node03, start Hive's metastore and hiveserver2 services:

cd /export/servers/hive-1.1.0-cdh5.14.0
nohup bin/hive --service metastore &
nohup bin/hive --service hiveserver2 &
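
Because both services are started in the background, hiveserver2 may need some time before it accepts connections on port 10000. The following sketch waits for the port before Hue is restarted. wait_for_port is a hypothetical helper, it relies on bash's /dev/tcp feature (so it needs bash rather than a plain POSIX sh), and the default of 30 attempts is an arbitrary assumption.

```shell
# Hypothetical helper (bash only): wait until host:port accepts TCP connections.
#   $1 = host, $2 = port, $3 = maximum number of attempts (default 30)
wait_for_port() {
  i=0
  while [ "$i" -lt "${3:-30}" ]; do
    # Opening /dev/tcp/<host>/<port> in a subshell succeeds only if the connection is accepted
    if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_port node03.hadoop.com 10000 60 && echo "hiveserver2 is listening". After that, bin/beeline -u jdbc:hive2://node03.hadoop.com:10000 -n root -p 123456 -e 'show databases;' is a quick way to confirm that the credentials in [beeswax] work.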

Restart Hue, and you can then work with Hive from the browser.

3.3 Integrating Hue with Impala

Stop the Hue service process.

Edit the hue.ini configuration file:

[impala]
server_host=node03
server_port=21050
impala_conf_dir=/etc/impala/conf

3.4 Integrating Hue with MySQL

Find the databases section, uncomment the mysql entry under it, and fill in the MySQL settings. This is around line 1547.

[[[mysql]]]
nice_name="My SQL DB"
engine=mysql
host=node03.hadoop.com
port=3306
user=root
password=123456

3.5 Restart the Hue service

cd /export/servers/hue-3.9.0-cdh5.14.0/
build/env/bin/supervisor

3.6 Fixing insufficient permissions when running Hive and Impala queries

Any query run in Hive that needs to launch a MapReduce job fails with a permission error. The details are as follows:

INFO  : Compiling command(queryId=root_20180625191616_d02efd23-2322-4f3d-9cb3-fc3a06ff4ce0): select count(1) from mystu
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=root_20180625191616_d02efd23-2322-4f3d-9cb3-fc3a06ff4ce0); Time taken: 0.065 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20180625191616_d02efd23-2322-4f3d-9cb3-fc3a06ff4ce0): select count(1) from mystu
INFO  : Query ID = root_20180625191616_d02efd23-2322-4f3d-9cb3-fc3a06ff4ce0
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
ERROR : Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied: user=admin, access=EXECUTE, inode="/tmp":root:supergroup:drwxrwx---

We only need to grant execute permission on a few directories in HDFS:

hdfs  dfs  -chmod o+x /tmp
hdfs  dfs  -chmod o+x  /tmp/hadoop-yarn
hdfs  dfs  -chmod o+x  /tmp/hadoop-yarn/staging

Alternatively, we can run:

hdfs dfs -chmod -R o+x /tmp

This grants the permission on every file and directory under /tmp.

After that, Hive jobs run without this error.
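
The error message itself shows why the chmod is needed: /tmp is owned by root:supergroup with mode drwxrwx---, and the user admin is neither the owner nor, presumably, a member of supergroup, so the "other" permission bits apply and they carry no execute bit. As an illustrative sketch, the hypothetical helper other_can_execute checks that bit in an ls-style mode string:

```shell
# Hypothetical helper: succeed if the "other" class has the execute bit
# in an ls-style mode string such as "drwxrwx---".
other_can_execute() {
  # Character 10 is the execute position for "other";
  # "t" means execute plus the sticky bit.
  case "$(printf '%s' "$1" | cut -c10)" in
    x|t) return 0 ;;
    *)   return 1 ;;
  esac
}
```

To check the live directory, pass it the first column of hdfs dfs -ls -d /tmp, for example other_can_execute "$(hdfs dfs -ls -d /tmp | awk '{print $1}')" && echo ok. The same check can be repeated for /tmp/hadoop-yarn and /tmp/hadoop-yarn/staging.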

Source: http://www.bubuko.com/infodetail-3098674.html
