Hadoop distributions
- Apache
- Cloudera
- Hortonworks
This article uses the Cloudera distribution of Hadoop.
Download CDH 5.3.6
Download URL: http://archive.cloudera.com/cdh5/cdh/5/
All components must come from the same CDH release:
- cdh5.3.6-snappy-lib-natirve.tar.gz
- hadoop-2.5.0-cdh5.3.6.tar.gz
- hive-0.13.1-cdh5.3.6.tar.gz
- sqoop-1.4.5-cdh5.3.6.tar.gz
Installation and configuration
- Set up the JDK.
- Upload the archives to /opt/software/cdh on the Ubuntu machine.
- tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /opt/cdh-5.3.6
- tar -zxvf hive-0.13.1-cdh5.3.6.tar.gz -C /opt/cdh-5.3.6
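The -C option of tar expects the target directory to exist already, so it may help to create /opt/cdh-5.3.6 before extracting. A minimal sketch, assuming the archives were uploaded as described above; the function name extract_to is hypothetical:

```shell
# Hypothetical helper: create the target directory, then extract
# each archive given on the command line into it.
extract_to() {
    dest_dir=$1
    shift
    mkdir -p "$dest_dir" || return 1
    for archive in "$@"; do
        tar -zxf "$archive" -C "$dest_dir" || return 1
    done
}

# Example, run from /opt/software/cdh:
# extract_to /opt/cdh-5.3.6 hadoop-2.5.0-cdh5.3.6.tar.gz hive-0.13.1-cdh5.3.6.tar.gz
```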
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hp-expert.tianpo.com:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/data/tmp</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hp-expert.tianpo.com:50090</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hp-expert.tianpo.com:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hp-expert.tianpo.com:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hp-expert.tianpo.com:19888</value>
</property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hp-expert.tianpo.com</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>640800</value>
</property>
</configuration>
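yarn.log-aggregation.retain-seconds is expressed in seconds, so 640800 keeps aggregated logs for roughly 7.4 days. If exactly seven days was intended, the value would be 604800. The conversion can be sketched as follows (days_to_seconds is a hypothetical helper name):

```shell
# Convert a whole number of days into seconds.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

# days_to_seconds 7   prints 604800
```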
etc/hadoop/slaves:
hp-expert.tianpo.com
Format the NameNode
Start the services
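The commands for these two steps are not listed here. With a stock Hadoop 2.5.0 layout they would typically look like the sketch below. It assumes the install path used in this article, and the wrapper function names format_namenode and start_hadoop are hypothetical. Formatting wipes existing HDFS metadata, so it should be run only once, on first setup.

```shell
# Assumed install path, taken from the tar commands in this article.
HADOOP_HOME=${HADOOP_HOME:-/opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6}

# One-time step: initialise the NameNode metadata directory.
format_namenode() {
    "$HADOOP_HOME/bin/hdfs" namenode -format
}

# Start the HDFS, YARN and job history daemons one by one,
# stopping at the first script that fails.
start_hadoop() {
    "$HADOOP_HOME/sbin/hadoop-daemon.sh" start namenode &&
    "$HADOOP_HOME/sbin/hadoop-daemon.sh" start datanode &&
    "$HADOOP_HOME/sbin/yarn-daemon.sh" start resourcemanager &&
    "$HADOOP_HOME/sbin/yarn-daemon.sh" start nodemanager &&
    "$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh" start historyserver
}

# Usage on first setup:
# format_namenode && start_hadoop
```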
Check the running processes with jps:
- 1905 NameNode
- 2354 NodeManager
- 2499 JobHistoryServer
- 2084 ResourceManager
- 1991 DataNode
- 2538 Jps
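Open http://hp-expert.tianpo.com:50070/ in a browser. If the page does not load, check whether anything is listening on the port: netstat -ant | grep 50070
Also check the hosts configuration in /etc/hosts. The hostname must not be mapped to 127.0.0.1, and each entry has the form: IP hostname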
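One way to check the mapping is to look up which address the hosts file assigns to the hostname. A minimal sketch; the function names lookup_host_ip and check_hosts are hypothetical, and 192.168.1.10 in the comment is a placeholder for the machine's real address:

```shell
# Print every IP address that a hosts-format file maps to a hostname.
lookup_host_ip() {
    hosts_file=$1
    host=$2
    awk -v h="$host" '
        {
            for (i = 1; i <= NF; i++) {
                if (substr($i, 1, 1) == "#") break   # rest of line is a comment
                if (i > 1 && $i == h) { print $1; break }
            }
        }' "$hosts_file"
}

# Report "ok", "missing", or "loopback" for the given hostname.
check_hosts() {
    ips=$(lookup_host_ip "$1" "$2")
    if [ -z "$ips" ]; then
        echo "missing"
        return 1
    fi
    for ip in $ips; do
        case $ip in
            127.*|::1) echo "loopback"; return 1 ;;
        esac
    done
    echo "ok"
}

# Expected kind of entry in /etc/hosts (address is a placeholder):
# 192.168.1.10 hp-expert.tianpo.com
# check_hosts /etc/hosts hp-expert.tianpo.com
```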
Configure Hive
conf/hive-env.sh:
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/conf
conf/hive-log4j.properties:
hive.log.threshold=ALL
hive.root.logger=WARN,DRFA
hive.log.dir=/opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs
hive.log.file=hive.log
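Hive ships hive-env.sh and hive-log4j.properties only as .template files in its conf directory, so these two files are usually created by copying the templates before editing them. A minimal sketch; the function name prepare_hive_conf is hypothetical:

```shell
# Hypothetical helper: create the two config files from their
# templates, leaving any file that already exists untouched.
prepare_hive_conf() {
    conf_dir=$1
    for name in hive-env.sh hive-log4j.properties; do
        if [ ! -e "$conf_dir/$name" ]; then
            cp "$conf_dir/$name.template" "$conf_dir/$name" || return 1
        fi
    done
}

# prepare_hive_conf /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/conf
```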
conf/hive-site.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://host:3306/metadata?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>***</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>***</value>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
<property>
<name>hive.fetch.task.conversion</name>
<value>more</value>
</property>
</configuration>
Upload the MySQL JDBC driver (mysql-connector-java-5.1.27.jar) to Hive's lib directory, and make sure the driver version matches your MySQL server.
bin/hdfs dfs -mkdir -p /user/hive/warehouse
bin/hdfs dfs -chmod g+w /user/hive/warehouse
create table student(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data local inpath '/opt/datas/student.txt' into table student;
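The table expects one record per line, with the two columns separated by a tab. A minimal sketch for producing such a file; the function name make_student_file and the sample rows are made up for illustration:

```shell
# Hypothetical helper: write a small tab-separated sample file.
make_student_file() {
    out=$1
    mkdir -p "$(dirname "$out")" || return 1
    # printf reuses the format for each id/name pair; \t is the
    # field delimiter declared in the CREATE TABLE statement.
    printf '%s\t%s\n' \
        1001 zhangsan \
        1002 lisi \
        1003 wangwu > "$out"
}

# make_student_file /opt/datas/student.txt
```

After the load, running select * from student; in the Hive CLI should list the rows.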
Web UIs
- http://hp-expert.tianpo.com:50070 (HDFS NameNode)
- http://hp-expert.tianpo.com:8088/cluster (YARN ResourceManager)