Problem Description
I have configured Eclipse for Scala, created a Maven project, and written a simple word-count Spark job on Windows. Now my Spark + Hadoop installation is on a Linux server. How can I launch my Spark code from Eclipse onto the Spark cluster (which is on Linux)?
Any suggestions?
Recommended Answer
Actually, this answer is not as simple as you might expect.
I will make several assumptions: first, that you use sbt; second, that you are working on a Linux-based machine; third, that you have two classes in your project, say RunMe and Globals; and last, that you want to set up the configuration inside the program. Thus, somewhere in your runnable code you must have something like this:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object RunMe {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      // If you use Mesos, and if your network resolves the hostname "master" to its IP.
      .setMaster("mesos://master:5050")
      .setAppName("my-app")
      .set("spark.executor.memory", "10g")
    val sc = new SparkContext(conf)
    // SQLContext is constructed on top of an existing SparkContext.
    val sqlContext = new SQLContext(sc)
    // your code comes here
  }
}
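Since the question is about a word-count job, here is a minimal sketch of what could replace the comment above. The HDFS paths are hypothetical placeholders; the input file must be reachable from the cluster:

// Classic word count, reusing the sc defined above.
// Replace the hypothetical HDFS paths with your own.
val counts = sc.textFile("hdfs://master:9000/input.txt")
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://master:9000/output")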
The steps you have to follow are:
- Compile the project, in the root of it, by using the command below (see the build.sbt sketch after these steps for the assumed assembly setup):

$ sbt assembly
- Send the job to the master node; this is the interesting part (assuming your project has the structure target/scala/, and inside it a .jar file, which corresponds to the compiled project):

$ spark-submit --class RunMe target/scala/app.jar
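For the sbt assembly step above to work, the build needs the sbt-assembly plugin. A minimal sketch, assuming Spark 1.6 and Scala 2.10 (adjust both versions to match your cluster):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt
name := "my-app"
version := "0.1"
scalaVersion := "2.10.6"

// "provided" keeps Spark out of the fat jar; the cluster supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0" % "provided"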
Notice that, because I assumed the project has two or more classes, you have to identify which class you want to run. Furthermore, I bet that both approaches, for Yarn and Mesos, are very similar.
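For illustration, the spark-submit invocations for the two cluster managers might look like this (the jar path follows the layout assumed above, and the YARN variant assumes HADOOP_CONF_DIR points at your cluster configuration):

# Mesos, matching the setMaster call in the code above:
$ spark-submit --class RunMe --master mesos://master:5050 target/scala/app.jar

# YARN:
$ spark-submit --class RunMe --master yarn --deploy-mode cluster target/scala/app.jar

Note that properties set programmatically on SparkConf, such as the setMaster call above, take precedence over spark-submit flags, so to pick the master from the command line you would drop that call from the code.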