Setting up Hadoop with Spark on macOS
Instructions
- Prepare the environment
If you do not have brew, google how to install Homebrew first.
First uninstall any old version of Hadoop, then update the Homebrew formulae.
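With Homebrew that is typically:
brew uninstall hadoop
brew update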
Check the version information.
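For example, to confirm that Java and Homebrew are in place (an assumption about which tools this step refers to):
java -version
brew -v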
If any of the programs above are missing, install them with
brew install <app>

Install the environment
Install Hadoop and install Spark, as shown below.
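With Homebrew, the two formulae are hadoop and apache-spark:
brew install hadoop
brew install apache-spark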
Set the environment variables
Edit ~/.bash_profile with vim and paste the following at the end:

# set environment variables
export JAVA_HOME=$(/usr/libexec/java_home)
export HADOOP_HOME=/usr/local/Cellar/hadoop/2.5.1
export HADOOP_CONF_DIR=$HADOOP_HOME/libexec/etc/hadoop
export SPARK_HOME=/usr/local/Cellar/apache-spark/1.1.0

# set path variables
export PATH=$PATH:$HADOOP_HOME/bin:$SPARK_HOME/bin

# set alias start & stop scripts
alias hstart="$HADOOP_HOME/sbin/start-dfs.sh;$HADOOP_HOME/sbin/start-yarn.sh"
alias hstop="$HADOOP_HOME/sbin/stop-dfs.sh;$HADOOP_HOME/sbin/stop-yarn.sh"
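Reload the profile so the new variables and aliases take effect in the current shell:
source ~/.bash_profile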
Hadoop requires SSH to be working, so set up SSH.
- Configuration file path:
- Generate a key pair:
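The transcript below was produced by running ssh-keygen as root, presumably something like:
sudo ssh-keygen -t rsa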
Generating public/private rsa key pair.
Enter file in which to save the key (/var/root/.ssh/id_rsa): [enter /var/root/.ssh/id_rsa]
Enter passphrase (empty for no passphrase): [press Enter]
Enter same passphrase again: [press Enter]
Your identification has been saved in /var/root/.ssh/id_rsa.
Your public key has been saved in /var/root/.ssh/id_rsa.pub.
The key fingerprint is:
97:e9:5a:5e:91:52:30:63:9e:34:1a:6f:24:64:75:af root@cuican.local
The key's randomart image is:
+--[ RSA 2048]----+
| .=.X . |
| . X B . |
| . = . . |
| . + o |
| S = E |
| o . . |
| o . |
| + . |
| . . |
+-----------------+
- Edit the configuration file:
Port 22
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::

# The default requires explicit activation of protocol 1
Protocol 2

# HostKey for protocol version 1
#HostKey /etc/ssh/ssh_host_key

# HostKeys for protocol version 2
#HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_dsa_key
#HostKey /etc/ssh/ssh_host_ecdsa_key
HostKey /var/root/.ssh/id_rsa

# Lifetime and size of ephemeral version 1 server key
KeyRegenerationInterval 1h
ServerKeyBits 1024

# Logging
# obsoletes QuietMode and FascistLogging
SyslogFacility AUTHPRIV
#LogLevel INFO

# Authentication:
LoginGraceTime 2m
PermitRootLogin yes
StrictModes yes
#MaxAuthTries 6
#MaxSessions 10

RSAAuthentication yes
PubkeyAuthentication yes
- Start the SSH service
On a Mac, sshd is located at
/usr/sbin/sshd
Type sudo /usr/sbin/sshd in a terminal to start the sshd service.
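To verify that the service is up, try:
ssh localhost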
Configure Hadoop
Go to the Hadoop installation path (with the Homebrew install above, the config files live under /usr/local/Cellar/hadoop/2.5.1/libexec/etc/hadoop) and edit
etc/hadoop/hadoop-env.sh
# this fixes the "scdynamicstore" warning
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Edit
etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Edit
etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Edit
etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
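Note: a stock Hadoop 2.5.x distribution may not ship a mapred-site.xml; if it is missing, copy it from the bundled template first:
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml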
Edit
etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Start Hadoop
Move to the Hadoop root directory and format the Hadoop HDFS.
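With the Homebrew layout assumed above, that likely means:
cd /usr/local/Cellar/hadoop/2.5.1
./bin/hdfs namenode -format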
Start the NameNode and DataNode daemons.
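Presumably via the bundled script (the hstart alias defined earlier runs this plus the YARN script):
./sbin/start-dfs.sh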
Check it in the browser.
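The NameNode web UI in Hadoop 2.x listens on port 50070 by default:
http://localhost:50070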
Start the ResourceManager and NodeManager daemons.
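Again via the bundled script:
./sbin/start-yarn.sh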
Check that all of the daemons are running.
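The JDK's jps tool lists running JVMs; you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:
jps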
Check the ResourceManager in the browser.
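The ResourceManager web UI listens on port 8088 by default:
http://localhost:8088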
Create the HDFS directories.
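Following the standard single-node setup, something like the following (substitute your own user name for the placeholder):
./bin/hdfs dfs -mkdir /user
./bin/hdfs dfs -mkdir /user/<username>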
Run a MapReduce example.
# calculate pi
./bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar pi 10 100
Start Spark
Go to the Spark installation directory.
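For the Homebrew install assumed above:
cd /usr/local/Cellar/apache-spark/1.1.0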
Run a Spark example.
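Spark ships a run-example launcher; SparkPi makes a convenient smoke test (the trailing argument is the number of slices):
./bin/run-example SparkPi 10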
View the Spark job in the browser.
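While an application is running, its web UI is served on port 4040 by default (jobs submitted in yarn-cluster mode appear in the ResourceManager UI on port 8088 instead):
http://localhost:4040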
You can also use
spark-submit
to submit jobs.

# pattern to launch an application in yarn-cluster mode
./bin/spark-submit --class <path.to.class> --master yarn-cluster [options] <app.jar> [options]

# run example application (calculate pi)
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster libexec/lib/spark-examples-*.jar
Done.