jng · 2019-03-02 20:29:41
Tags: data storage & databases, hdfs, hadoop, configuration, cluster, xml, machine, file
Abstract: A first hands-on HDFS deployment. No complex features involved; it is simply used as an automated file-backup tool.
Contents
Overview
Important configuration parameters and choices
Deployment practice: record of parameter changes
- local machine, NameNode
- local machine, DataNode
- 192.168.1.101, DataNode
Starting the HDFS cluster
Verifying startup
Creating a file via the hdfs shell
Problems and solutions
Shutting down the HDFS cluster
Conclusions
Overview
Download the Hadoop distribution
There are three packages (the plain tarball is the binary distribution, -src is the source code, -site is the generated documentation site):
- hadoop-x.y.z-site.tar.gz
- hadoop-x.y.z-src.tar.gz
- hadoop-x.y.z.tar.gz
Hadoop is made up of several components, each component has its own daemons, and each daemon is a separate Java process; daemon startup parameters are configured through environment variables (see the sketch after the component list below).
HDFS
Configured in etc/hadoop/hadoop-env.sh:
- NameNode daemon: HDFS_NAMENODE_OPTS
- DataNode daemon: HDFS_DATANODE_OPTS
- Secondary NameNode daemon: HDFS_SECONDARYNAMENODE_OPTS
YARN
Configured in etc/hadoop/yarn-env.sh:
- ResourceManager daemon: YARN_RESOURCEMANAGER_OPTS
- NodeManager daemon: YARN_NODEMANAGER_OPTS
- webAppProxy daemon: YARN_PROXYSERVER_OPTS
MapReduce
Configured in etc/hadoop/mapred-env.sh:
- MapReduce Job History Server daemon: MAPRED_HISTORYSERVER_OPTS
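For example, a minimal sketch of per-daemon JVM settings in etc/hadoop/hadoop-env.sh (the heap sizes are illustrative values, not taken from this deployment):
- # etc/hadoop/hadoop-env.sh -- illustrative values only
- # each *_OPTS variable is appended to that daemon's JVM command line
- export HDFS_NAMENODE_OPTS="-Xmx2g"
- export HDFS_DATANODE_OPTS="-Xmx1g"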
Hadoop global settings are configured in the shell environment (e.g. ~/.bashrc); a sketch follows this list.
HADOOP_HOME: the home directory of the Hadoop distribution; at a minimum this one must be set. Other commonly set variables:
- HADOOP_PID_DIR
- HADOOP_LOG_DIR
- HADOOP_HEAPSIZE_MAX
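A minimal ~/.bashrc sketch, assuming an installation layout like the one used in the deployment below (the heap value and the PATH addition are illustrative conveniences, not from the original article):
- export HADOOP_HOME="$HOME/installed/hadoop/hadoop-3.2.0"
- export HADOOP_PID_DIR="$HOME/installed/hadoop/hadoop_pid_dir"
- export HADOOP_LOG_DIR="$HOME/installed/hadoop/hadoop_log_dir"
- export HADOOP_HEAPSIZE_MAX=2g
- export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"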
Important configuration parameters and choices
To be configured on all nodes
etc/hadoop/core-site.xml
Reference defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
fs.defaultFS
The URI of the NameNode in HDFS
io.file.buffer.size
To be configured on the NameNode
etc/hadoop/hdfs-site.xml
Reference defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
- dfs.namenode.name.dir
- dfs.hosts / dfs.hosts.exclude
- dfs.blocksize
- dfs.namenode.handler.count
To be configured on DataNodes
- etc/hadoop/hdfs-site.xml
- dfs.datanode.data.dir
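Not covered in the original article, but a handy way to check which value a parameter actually resolves to on a node is hdfs getconf, e.g. for the NameNode parameters listed above:
- $ $HADOOP_HOME/bin/hdfs getconf -confKey dfs.blocksize
- $ $HADOOP_HOME/bin/hdfs getconf -confKey dfs.namenode.handler.count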
Deployment practice: record of parameter changes
local machine, NameNode
System environment variables (~/.bashrc)
- export HADOOP_HOME="/home/jng/installed/hadoop/hadoop-3.2.0"
- export HADOOP_PID_DIR="/home/jng/installed/hadoop/hadoop_pid_dir"
- export HADOOP_LOG_DIR="/home/jng/installed/hadoop/hadoop_log_dir"
- etc/hadoop/core-site.xml
- fs.defaultFS
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://195.90.3.212:9988/</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
- io.file.buffer.size
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
  <description>The size of buffer for use in sequence files.
  The size of this buffer should probably be a multiple of hardware
  page size (4096 on Intel x86), and it determines how much data is
  buffered during read and write operations.</description>
</property>
- etc/hadoop/hdfs-site.xml
- dfs.namenode.name.dir
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/jng/installed/hadoop/dfs_namenode_name_dir</value>
  <description>Determines where on the local filesystem the DFS name node
  should store the name table(fsimage). If this is a comma-delimited list
  of directories then the name table is replicated in all of the
  directories, for redundancy.</description>
</property>
- local machine, DataNode
- etc/hadoop/hdfs-site.xml
- dfs.datanode.data.dir
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/jng/installed/hadoop/dfs_datanode_data_dir</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks. If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices. The directories should be tagged
  with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
  storage policies. The default storage type will be DISK if the directory does
  not have a storage type tagged explicitly. Directories that do not exist will
  be created if local filesystem permission allows.</description>
</property>
- 192.168.1.101, DataNode
System environment variables (~/.bashrc)
- export HADOOP_HOME="/home/mhb/installed/hadoop/hadoop-3.2.0"
- export HADOOP_PID_DIR="/home/mhb/installed/hadoop/hadoop_pid_dir"
- export HADOOP_LOG_DIR="/home/mhb/installed/hadoop/hadoop_log_dir"
- etc/hadoop/core-site.xml
- fs.defaultFS
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://195.90.3.212:9988/</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
- io.file.buffer.size
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
  <description>The size of buffer for use in sequence files.
  The size of this buffer should probably be a multiple of hardware
  page size (4096 on Intel x86), and it determines how much data is
  buffered during read and write operations.</description>
</property>
- etc/hadoop/hdfs-site.xml
- dfs.datanode.data.dir
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/mhb/installed/hadoop/dfs_datanode_data_dir</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks. If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices. The directories should be tagged
  with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
  storage policies. The default storage type will be DISK if the directory does
  not have a storage type tagged explicitly. Directories that do not exist will
  be created if local filesystem permission allows.</description>
</property>
Starting the HDFS cluster
The NameNode must be formatted before the cluster is started for the first time
- # Run on the NameNode machine
- $ $HADOOP_HOME/bin/hdfs namenode -format <cluster_name>
Start the NameNode
- # Run on the NameNode machine
- $ $HADOOP_HOME/bin/hdfs --daemon start namenode
Start the DataNode
- # Run on each DataNode machine
$ $HADOOP_HOME/bin/hdfs --daemon start datanode
Optional: start everything with a single command
- # Prerequisites: 1) etc/hadoop/workers is configured correctly; 2) passwordless SSH from the NameNode machine to the DataNode machines is set up (see the sketch below)
- $ $HADOOP_HOME/sbin/start-dfs.sh
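A sketch of those two prerequisites for this particular cluster (host list taken from the deployment above; the remote user is assumed to be the mhb account visible in the paths):
- # etc/hadoop/workers: one DataNode host per line
- $ printf 'localhost\n192.168.1.101\n' > $HADOOP_HOME/etc/hadoop/workers
- # passwordless SSH from the NameNode machine to each worker
- $ ssh-copy-id mhb@192.168.1.101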
Verifying startup
Check the NameNode web UI at http://<NameNode-ip>:<port>; the default port is 9870
Check the DataNode web UI at http://<DataNode-ip>:<port>; the default port is 9864
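Besides the web UIs, a command-line check (not in the original article) is dfsadmin, which should report both DataNodes as live:
- # Run on the NameNode machine; look for "Live datanodes (2)"
- $ $HADOOP_HOME/bin/hdfs dfsadmin -report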
Creating a file via the hdfs shell
Copy a large file from the local machine into HDFS and watch how the size of the NameNode and DataNode storage directories changes
Before copying the file
dfs.namenode.name.dir path on the local machine (NameNode)
- [j@j dfs_namenode_name_dir]$ pwd
- /home/jng/installed/hadoop/dfs_namenode_name_dir
- [j@j dfs_namenode_name_dir]$ du -hs
- 2.1M .
- [j@j dfs_namenode_name_dir]$
dfs.datanode.data.dir path on the local machine (DataNode)
- [j@j dfs_datanode_data_dir]$ pwd
- /home/jng/installed/hadoop/dfs_datanode_data_dir
- [j@j dfs_datanode_data_dir]$ du -hs
- 44K .
- [j@j dfs_datanode_data_dir]$
dfs.datanode.data.dir path on 192.168.1.101 (DataNode)
- m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
- /home/mhb/installed/hadoop/dfs_datanode_data_dir
- m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
- 44K .
- m@m:~/installed/hadoop/dfs_datanode_data_dir$
Copying the file (the command below uses moveFromLocal, which removes the local source after the upload)
- # Run on the NameNode
- [j@j hadoop-3.2.0]$ pwd
- /home/jng/installed/hadoop/hadoop-3.2.0
- [j@j hadoop-3.2.0]$ ls -lh ~/software/hadoop/hadoop-3.2.0.tar.gz
- -rw-r--r-- 1 jng jng 330M Feb 25 14:21 /home/jng/software/hadoop/hadoop-3.2.0.tar.gz
- [j@j hadoop-3.2.0]$ ./bin/hdfs dfs -moveFromLocal ~/software/hadoop/hadoop-3.2.0.tar.gz /tmp/
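To confirm the upload (a step not shown in the original), list the target path with human-readable sizes:
- $ ./bin/hdfs dfs -ls -h /tmp/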
After copying the file
dfs.namenode.name.dir path on the local machine (NameNode)
- [j@j dfs_namenode_name_dir]$ pwd
- /home/jng/installed/hadoop/dfs_namenode_name_dir
- [j@j dfs_namenode_name_dir]$ du -hs
- 2.1M .
- [j@j dfs_namenode_name_dir]$
dfs.datanode.data.dir path on the local machine (DataNode)
- [j@j dfs_datanode_data_dir]$ pwd
- /home/jng/installed/hadoop/dfs_datanode_data_dir
- [j@j dfs_datanode_data_dir]$ du -hs
- 333M .
- [j@j dfs_datanode_data_dir]$
dfs.datanode.data.dir path on 192.168.1.101 (DataNode)
- m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
- /home/mhb/installed/hadoop/dfs_datanode_data_dir
- m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
- 333M .
- m@m:~/installed/hadoop/dfs_datanode_data_dir$
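Both DataNodes grew by roughly the size of the file: with the default dfs.replication of 3 but only two DataNodes available, each DataNode ends up holding a replica of every block (and the file is reported as under-replicated). One way to inspect this, not in the original article, is fsck:
- # Run on the NameNode machine
- $ $HADOOP_HOME/bin/hdfs fsck /tmp/hadoop-3.2.0.tar.gz -files -blocks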
Problems and solutions
In the namenode log (viewable from the NameNode web UI) you may see a WARN such as: "WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=192.168.1.101, hostname=192.168.1.101)"
ref: <https://blog.csdn.net/qqpy789/article/details/78189335>
Fix: in etc/hadoop/hdfs-site.xml, set dfs.namenode.datanode.registration.ip-hostname-check to false
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
  <description>
    If true (the default), then the namenode requires that a connecting
    datanode's address must be resolved to a hostname. If necessary, a reverse
    DNS lookup is performed. All attempts to register a datanode from an
    unresolvable address are rejected.
    It is recommended that this setting be left on to prevent accidental
    registration of datanodes listed by hostname in the excludes file during a
    DNS outage. Only set this to false in environments where there is no
    infrastructure to support reverse DNS lookup.
  </description>
</property>
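The change is only picked up after the NameNode is restarted; using the same daemon commands as above:
- $ $HADOOP_HOME/bin/hdfs --daemon stop namenode
- $ $HADOOP_HOME/bin/hdfs --daemon start namenode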
Shutting down the HDFS cluster
Stop the NameNode
- # Run on the NameNode machine
- $ $HADOOP_HOME/bin/hdfs --daemon stop namenode
Stop the DataNode
- # Run on each DataNode machine
$ $HADOOP_HOME/bin/hdfs --daemon stop datanode
Conclusions
HDFS can exist and run independently of YARN.
That is, HDFS works fine without starting YARN, at least as far as the HDFS shell is concerned.
The machine running the HDFS NameNode can simultaneously run a DataNode.
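A quick way to see which daemons are actually running is the JDK's jps tool; illustrative output (the process IDs are made up) on the local machine, which hosts both the NameNode and a DataNode:
- $ jps
- 12345 NameNode
- 23456 DataNode
- 34567 Jps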
Source: https://yq.aliyun.com/articles/692077