jng · 2019-03-02 20:29:41
Tags: data storage & databases, hdfs, hadoop, configuration, cluster, xml, machine, file
Abstract: A first hands-on HDFS deployment. No complex features involved; it is simply used as an automated file-backup tool.
Contents
Overview
Important configuration parameters and choices
Deployment practice: record of parameter changes
- local machine, NameNode
- local machine, DataNode
- 192.168.1.101, DataNode
Starting the HDFS cluster
Verifying startup
Creating a file via the hdfs shell
Problems and solutions
Shutting down the HDFS cluster
Conclusions
Overview
Download the Hadoop distribution
There are three packages (the plain tarball is the binary distribution, -src is the source code, -site is the generated documentation site):
- hadoop-x.y.z-site.tar.gz
- hadoop-x.y.z-src.tar.gz
- hadoop-x.y.z.tar.gz
Hadoop is made up of several components, each component has its own daemons, and each daemon is a separate Java process; daemon startup parameters are configured through environment variables (see the sketch after the component list below).
HDFS
Configured in etc/hadoop/hadoop-env.sh:
- NameNode daemon: HDFS_NAMENODE_OPTS
- DataNode daemon: HDFS_DATANODE_OPTS
- Secondary NameNode daemon: HDFS_SECONDARYNAMENODE_OPTS
YARN
Configured in etc/hadoop/yarn-env.sh:
- ResourceManager daemon: YARN_RESOURCEMANAGER_OPTS
- NodeManager daemon: YARN_NODEMANAGER_OPTS
- webAppProxy daemon: YARN_PROXYSERVER_OPTS
MapReduce
Configured in etc/hadoop/mapred-env.sh:
- MapReduce Job History Server daemon: MAPRED_HISTORYSERVER_OPTS
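For example, a minimal sketch of per-daemon JVM settings in etc/hadoop/hadoop-env.sh (the heap sizes are illustrative values, not taken from this deployment):
- # etc/hadoop/hadoop-env.sh -- illustrative values only
- # each *_OPTS variable is appended to that daemon's JVM command line
- export HDFS_NAMENODE_OPTS="-Xmx2g"
- export HDFS_DATANODE_OPTS="-Xmx1g"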
Hadoop global settings are configured in the shell environment (e.g. ~/.bashrc); a sketch follows this list.
HADOOP_HOME: the home directory of the Hadoop distribution; at a minimum this one must be set. Other commonly set variables:
- HADOOP_PID_DIR
- HADOOP_LOG_DIR
- HADOOP_HEAPSIZE_MAX
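A minimal ~/.bashrc sketch, assuming an installation layout like the one used in the deployment below (the heap value and the PATH addition are illustrative conveniences, not from the original article):
- export HADOOP_HOME="$HOME/installed/hadoop/hadoop-3.2.0"
- export HADOOP_PID_DIR="$HOME/installed/hadoop/hadoop_pid_dir"
- export HADOOP_LOG_DIR="$HOME/installed/hadoop/hadoop_log_dir"
- export HADOOP_HEAPSIZE_MAX=2g
- export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"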
Important configuration parameters and choices
To be configured on all nodes
etc/hadoop/core-site.xml
Reference defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
fs.defaultFS
The URI of the NameNode in HDFS
io.file.buffer.size
To be configured on the NameNode
etc/hadoop/hdfs-site.xml
Reference defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
- dfs.namenode.name.dir
- dfs.hosts / dfs.hosts.exclude
- dfs.blocksize
- dfs.namenode.handler.count
To be configured on DataNodes
- etc/hadoop/hdfs-site.xml
- dfs.datanode.data.dir
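Not covered in the original article, but a handy way to check which value a parameter actually resolves to on a node is hdfs getconf, e.g. for the NameNode parameters listed above:
- $ $HADOOP_HOME/bin/hdfs getconf -confKey dfs.blocksize
- $ $HADOOP_HOME/bin/hdfs getconf -confKey dfs.namenode.handler.count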
Deployment practice: record of parameter changes
local machine, NameNode
System environment variables (~/.bashrc)
- export HADOOP_HOME="/home/jng/installed/hadoop/hadoop-3.2.0"
- export HADOOP_PID_DIR="/home/jng/installed/hadoop/hadoop_pid_dir"
- export HADOOP_LOG_DIR="/home/jng/installed/hadoop/hadoop_log_dir"
- etc/hadoop/core-site.xml
- fs.defaultFS
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://195.90.3.212:9988/</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
- io.file.buffer.size
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
  <description>The size of buffer for use in sequence files.
  The size of this buffer should probably be a multiple of hardware
  page size (4096 on Intel x86), and it determines how much data is
  buffered during read and write operations.</description>
</property>
- etc/hadoop/hdfs-site.xml
- dfs.namenode.name.dir
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/jng/installed/hadoop/dfs_namenode_name_dir</value>
  <description>Determines where on the local filesystem the DFS name node
  should store the name table(fsimage). If this is a comma-delimited list
  of directories then the name table is replicated in all of the
  directories, for redundancy.</description>
</property>
- local machine, DataNode
- etc/hadoop/hdfs-site.xml
- dfs.datanode.data.dir
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/jng/installed/hadoop/dfs_datanode_data_dir</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks. If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices. The directories should be tagged
  with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
  storage policies. The default storage type will be DISK if the directory does
  not have a storage type tagged explicitly. Directories that do not exist will
  be created if local filesystem permission allows.</description>
</property>
- 192.168.1.101, DataNode
System environment variables (~/.bashrc)
- export HADOOP_HOME="/home/mhb/installed/hadoop/hadoop-3.2.0"
- export HADOOP_PID_DIR="/home/mhb/installed/hadoop/hadoop_pid_dir"
- export HADOOP_LOG_DIR="/home/mhb/installed/hadoop/hadoop_log_dir"
- etc/hadoop/core-site.xml
- fs.defaultFS
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://195.90.3.212:9988/</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
- io.file.buffer.size
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
  <description>The size of buffer for use in sequence files.
  The size of this buffer should probably be a multiple of hardware
  page size (4096 on Intel x86), and it determines how much data is
  buffered during read and write operations.</description>
</property>
- etc/hadoop/hdfs-site.xml
- dfs.datanode.data.dir
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/mhb/installed/hadoop/dfs_datanode_data_dir</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks. If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices. The directories should be tagged
  with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
  storage policies. The default storage type will be DISK if the directory does
  not have a storage type tagged explicitly. Directories that do not exist will
  be created if local filesystem permission allows.</description>
</property>
Starting the HDFS cluster
The NameNode must be formatted before the cluster is started for the first time
- # Run on the NameNode machine
- $ $HADOOP_HOME/bin/hdfs namenode -format <cluster_name>
Start the NameNode
- # Run on the NameNode machine
- $ $HADOOP_HOME/bin/hdfs --daemon start namenode
Start the DataNode
- # Run on each DataNode machine
$ $HADOOP_HOME/bin/hdfs --daemon start datanode
Optional: start everything with a single command
- # Prerequisites: 1) etc/hadoop/workers is configured correctly; 2) passwordless SSH from the NameNode machine to the DataNode machines is set up (see the sketch below)
- $ $HADOOP_HOME/sbin/start-dfs.sh
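A sketch of those two prerequisites for this particular cluster (host list taken from the deployment above; the remote user is assumed to be the mhb account visible in the paths):
- # etc/hadoop/workers: one DataNode host per line
- $ printf 'localhost\n192.168.1.101\n' > $HADOOP_HOME/etc/hadoop/workers
- # passwordless SSH from the NameNode machine to each worker
- $ ssh-copy-id mhb@192.168.1.101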
Verifying startup
Check the NameNode web UI at http://<NameNode-ip>:<port>; the default port is 9870
Check the DataNode web UI at http://<DataNode-ip>:<port>; the default port is 9864
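Besides the web UIs, a command-line check (not in the original article) is dfsadmin, which should report both DataNodes as live:
- # Run on the NameNode machine; look for "Live datanodes (2)"
- $ $HADOOP_HOME/bin/hdfs dfsadmin -report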
Creating a file via the hdfs shell
Copy a large file from the local machine into HDFS and watch how the size of the NameNode and DataNode storage directories changes
Before copying the file
dfs.namenode.name.dir path on the local machine (NameNode)
- [j@j dfs_namenode_name_dir]$ pwd
- /home/jng/installed/hadoop/dfs_namenode_name_dir
- [j@j dfs_namenode_name_dir]$ du -hs
- 2.1M .
- [j@j dfs_namenode_name_dir]$
dfs.datanode.data.dir path on the local machine (DataNode)
- [j@j dfs_datanode_data_dir]$ pwd
- /home/jng/installed/hadoop/dfs_datanode_data_dir
- [j@j dfs_datanode_data_dir]$ du -hs
- 44K .
- [j@j dfs_datanode_data_dir]$
dfs.datanode.data.dir path on 192.168.1.101 (DataNode)
- m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
- /home/mhb/installed/hadoop/dfs_datanode_data_dir
- m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
- 44K .
- m@m:~/installed/hadoop/dfs_datanode_data_dir$
Copying the file (the command below uses moveFromLocal, which removes the local source after the upload)
- # Run on the NameNode
- [j@j hadoop-3.2.0]$ pwd
- /home/jng/installed/hadoop/hadoop-3.2.0
- [j@j hadoop-3.2.0]$ ls -lh ~/software/hadoop/hadoop-3.2.0.tar.gz
- -rw-r--r-- 1 jng jng 330M Feb 25 14:21 /home/jng/software/hadoop/hadoop-3.2.0.tar.gz
- [j@j hadoop-3.2.0]$ ./bin/hdfs dfs -moveFromLocal ~/software/hadoop/hadoop-3.2.0.tar.gz /tmp/
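To confirm the upload (a step not shown in the original), list the target path with human-readable sizes:
- $ ./bin/hdfs dfs -ls -h /tmp/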
After copying the file
dfs.namenode.name.dir path on the local machine (NameNode)
- [j@j dfs_namenode_name_dir]$ pwd
- /home/jng/installed/hadoop/dfs_namenode_name_dir
- [j@j dfs_namenode_name_dir]$ du -hs
- 2.1M .
- [j@j dfs_namenode_name_dir]$
dfs.datanode.data.dir path on the local machine (DataNode)
- [j@j dfs_datanode_data_dir]$ pwd
- /home/jng/installed/hadoop/dfs_datanode_data_dir
- [j@j dfs_datanode_data_dir]$ du -hs
- 333M .
- [j@j dfs_datanode_data_dir]$
dfs.datanode.data.dir path on 192.168.1.101 (DataNode)
- m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
- /home/mhb/installed/hadoop/dfs_datanode_data_dir
- m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
- 333M .
- m@m:~/installed/hadoop/dfs_datanode_data_dir$
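Both DataNodes grew by roughly the size of the file: with the default dfs.replication of 3 but only two DataNodes available, each DataNode ends up holding a replica of every block (and the file is reported as under-replicated). One way to inspect this, not in the original article, is fsck:
- # Run on the NameNode machine
- $ $HADOOP_HOME/bin/hdfs fsck /tmp/hadoop-3.2.0.tar.gz -files -blocks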
Problems and solutions
In the namenode log (viewable from the NameNode web UI) you may see a WARN such as: "WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=192.168.1.101, hostname=192.168.1.101)"
ref: <https://blog.csdn.net/qqpy789/article/details/78189335>
Fix: in etc/hadoop/hdfs-site.xml, set dfs.namenode.datanode.registration.ip-hostname-check to false
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
  <description>
    If true (the default), then the namenode requires that a connecting
    datanode's address must be resolved to a hostname. If necessary, a reverse
    DNS lookup is performed. All attempts to register a datanode from an
    unresolvable address are rejected.
    It is recommended that this setting be left on to prevent accidental
    registration of datanodes listed by hostname in the excludes file during a
    DNS outage. Only set this to false in environments where there is no
    infrastructure to support reverse DNS lookup.
  </description>
</property>
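The change is only picked up after the NameNode is restarted; using the same daemon commands as above:
- $ $HADOOP_HOME/bin/hdfs --daemon stop namenode
- $ $HADOOP_HOME/bin/hdfs --daemon start namenode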
Shutting down the HDFS cluster
Stop the NameNode
- # Run on the NameNode machine
- $ $HADOOP_HOME/bin/hdfs --daemon stop namenode
Stop the DataNode
- # Run on each DataNode machine
$ $HADOOP_HOME/bin/hdfs --daemon stop datanode
Conclusions
HDFS can exist and run independently of YARN.
That is, HDFS works fine without starting YARN, at least as far as the HDFS shell is concerned.
The machine running the HDFS NameNode can simultaneously run a DataNode.
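A quick way to see which daemons are actually running is the JDK's jps tool; illustrative output (the process IDs are made up) on the local machine, which hosts both the NameNode and a DataNode:
- $ jps
- 12345 NameNode
- 23456 DataNode
- 34567 Jps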
Source: https://yq.aliyun.com/articles/692077