Problem Description
There is a somewhat recent (Spring 2015) feature apparently intended to allow submitting a Spark job programmatically.
Here is the JIRA: https://issues.apache.org/jira/browse/SPARK-4924
However, there is uncertainty (count me in as well) about how to actually use this feature. Here are the last comments in the JIRA:
When the actual author of this work was asked to explain it further, the answer was "look in the API docs".
The author did not provide further details and apparently feels the whole issue is self-explanatory. If anyone can connect the dots here (specifically, where in the API docs this newer Spark Submit capability is described), it would be appreciated.
Here is some of the info I am looking for; pointers to the following:
- Which capabilities have been added to the Spark API
- How do we use them
- Any examples / other relevant documentation and/or code
Update: The SparkLauncher referred to in the accepted answer does launch a simple app under trivial (master=local[*]) conditions. It remains to be seen how usable it will be on an actual cluster. After adding a print statement to the linked code:
println("launched.. and waiting..") spark.waitFor()
we do indeed see the "launched.. and waiting.." message printed.
Well, this is probably a small step forward. I will update this question as I move towards a real clustered environment.
Recommended Answer
Looking at the details of the pull request, it seems that the functionality is provided by the SparkLauncher class, described in the API docs here.
Launcher for Spark applications.
Use this class to start Spark applications programmatically. The class uses a builder pattern to allow clients to configure the Spark application and launch it as a child process.
The API docs are rather minimal, but I found a blog post that gives a worked example (code also available in a GitHub repo). I have copied a simplified version of the example below (untested) in case the links go stale:
import org.apache.spark.launcher.SparkLauncher

object Launcher extends App {
  // Configure the Spark application via the builder pattern,
  // then launch it as a child spark-submit process.
  val spark = new SparkLauncher()
    .setSparkHome("/home/user/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/user/example-assembly-1.0.jar")
    .setMainClass("MySparkApp")
    .setMaster("local[*]")
    .launch()

  // Block until the launched child process exits.
  spark.waitFor()
}
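For a non-trivial setup, the same builder can point at a real cluster and pass Spark configuration. Since launch() returns a plain java.lang.Process, its output streams should also be drained. The following is an untested sketch; the paths, master URL, and memory setting are assumptions for illustration:

import java.io.{BufferedReader, InputStreamReader}
import org.apache.spark.launcher.SparkLauncher

object ClusterLauncher extends App {
  // Hypothetical paths, master URL, and memory setting;
  // adjust these for your own environment.
  val process = new SparkLauncher()
    .setSparkHome("/home/user/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/user/example-assembly-1.0.jar")
    .setMainClass("MySparkApp")
    .setMaster("yarn-cluster")
    .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
    .launch()

  // launch() hands back a java.lang.Process; drain its stdout so the
  // child does not block on a full pipe, then wait for it to finish.
  val reader = new BufferedReader(new InputStreamReader(process.getInputStream))
  Iterator.continually(reader.readLine()).takeWhile(_ != null).foreach(println)

  val exitCode = process.waitFor()
  println(s"spark-submit exited with code $exitCode")
}

In practice spark-submit writes much of its logging to stderr, so process.getErrorStream may need draining as well (e.g. on a separate thread).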
See also:
- Another tutorial blog post / review of the feature
- A book chapter on the topic