1. Test hosts: four Alibaba Cloud instances running CentOS 6.9
node1: master (namenode)
node2: slave (datanode)
node3: slave (datanode)
node4: slave (datanode)
Add name resolution for these four hosts to /etc/hosts on every node.
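A minimal /etc/hosts sketch; the IP addresses below are placeholders, substitute your instances' private IPs:

192.168.0.11 node1
192.168.0.12 node2
192.168.0.13 node3
192.168.0.14 node4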
Set up passwordless SSH between all four hosts (root is used here for simplicity; in production a dedicated application user should be used). Generate a key pair with the command ssh-keygen -t rsa.
Append the contents of each host's public key file ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on every host.
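One way to distribute the keys, assuming the ssh-copy-id utility is available (run this on each of the four hosts; the hostnames are the ones defined above):

# generate the key pair, then push the public key to every node
ssh-keygen -t rsa
for h in node1 node2 node3 node4; do ssh-copy-id root@$h; done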
2. Download the installation media
jdk-8u141-linux-x64.tar.gz: download from the Oracle website (in my test, Hadoop 2.7.4 did not work with jdk1.7.0_80)
hadoop-2.7.4.tar.gz: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz
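For example, the Hadoop tarball can be fetched directly on each host with wget (assuming outbound internet access from the instances):

wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz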
Extract both archives to /usr/local/:
tar -zxf jdk-8u141-linux-x64.tar.gz -C /usr/local/
tar -zxf hadoop-2.7.4.tar.gz -C /usr/local/
3. Set environment variables
Edit /etc/profile and append the following (JAVA_HOME must be defined for the PATH entry to resolve; the JDK was extracted to /usr/local/jdk1.8.0_141 above):
export JAVA_HOME=/usr/local/jdk1.8.0_141
export PATH=$PATH:$JAVA_HOME/bin/:/usr/local/hadoop-2.7.4/bin/:/usr/local/hadoop-2.7.4/sbin/
source /etc/profile
Run the hadoop command to verify that the environment variables take effect:
[root@node1 ~]# hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
  credential           interact with credential providers
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
[root@node1 ~]#
4. Configure and start Hadoop
JAVA_HOME must be set in hadoop-env.sh:
export JAVA_HOME=/usr/local/jdk1.8.0_141/
The following four files configure the different Hadoop components; the corresponding defaults are documented in xxx-default.xml, see the official documentation for details. In Hadoop 2.7.4 they live under etc/hadoop/ in the installation directory:
etc/hadoop/core-site.xml
etc/hadoop/hdfs-site.xml
etc/hadoop/yarn-site.xml
etc/hadoop/mapred-site.xml
For this experiment, core-site.xml configures the master (namenode) URL and hadoop.tmp.dir:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop</value>
    </property>
</configuration>
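The other site files can stay at their defaults for a minimal HDFS-only cluster. If you want to pin the replication factor explicitly, a hdfs-site.xml sketch might look like the following (the value 3 is an assumption chosen to match the three datanodes; it is also the default):

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>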
On the master, run hadoop-daemon.sh start namenode.
On each slave, run hadoop-daemon.sh start datanode.
Before the first start, the namenode must be formatted: run hdfs namenode -format on the master.
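Putting these steps together, the first-start sequence is roughly:

# on node1 (master), once, before the first start
hdfs namenode -format
# on node1 (master)
hadoop-daemon.sh start namenode
# on node2, node3 and node4 (slaves)
hadoop-daemon.sh start datanode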
Centralized management
Starting daemons separately on the master and on each slave is tedious. Instead, the cluster can be managed centrally from the master: add the slave hostnames to the slaves file (etc/hadoop/slaves), and make sure the master has passwordless SSH access to the slaves so no passwords need to be typed.
[root@node1 hadoop]# cat slaves
node2
node3
node4
[root@node1 hadoop]#
With that in place, the whole HDFS cluster can be started and stopped with start-dfs.sh and stop-dfs.sh. Starting this way also automatically launches a secondarynamenode on the master.
Processes started on the master:
[root@node1 hadoop]# jps
9930 Jps
9471 SecondaryNameNode
9283 NameNode
[root@node1 hadoop]#
Processes started on a slave:
[root@node2 subdir0]# jps
8816 Jps
8677 DataNode
[root@node2 subdir0]#
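Besides jps, you can also verify from the master that all three datanodes have registered with the namenode, for example:

hdfs dfsadmin -report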
5. Working with HDFS: creating, deleting, modifying and listing files
hadoop fs -ls /
hadoop fs -put "/your/file/path" /
hadoop fs -mkdir /dirname
hadoop fs -text /filename
hadoop fs -rm /filename
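A quick round trip using these commands (the paths /test and /etc/hosts are just illustrative choices):

hadoop fs -mkdir /test
hadoop fs -put /etc/hosts /test/
hadoop fs -ls /test
hadoop fs -text /test/hosts
hadoop fs -rm /test/hosts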