Problem Description
I am trying to run a Spark application (written in Scala) on a local server for debugging. YARN seems to be the default in the Spark version (2.2.1) I pull in through my sbt build definition, and according to an error I keep getting, there is no Spark/YARN server listening:
Client:920 - Failed to connect to server: 0.0.0.0/0.0.0.0:8032: retries get failed due to exceeded maximum allowed retries number
Indeed, according to netstat there is no port 8032 in a listening state on my local server.
How would I typically run my Spark application locally in a way that bypasses this problem? I only need the application to process a small amount of data for debugging, so I would like to run it locally without relying on a specific Spark/YARN installation and setup on the local server; that would be the ideal debug setup.
Is that possible?
My sbt definitions already bring in all the necessary spark and spark.yarn jars. The problem also reproduces when running the same project from sbt, outside of IntelliJ.
Recommended Answer
You can submit your Spark application in local mode with .master("local[*]") if you only have to test the pipeline with a small amount of data.
Full code:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("myapp")
  .master("local[*]")  // run Spark in-process, using all available cores
  .getOrCreate()
For spark-submit, use --master local[*] as one of the arguments. Refer to this: https://spark.apache.org/docs/latest/submitting-applications.html
Note: do not hard-code the master in your codebase; always supply it from the command line. This keeps the application reusable for local/test/Mesos/Kubernetes/YARN/anything else, as sketched below.
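For example, here is a minimal sketch of that idea, assuming the master is supplied externally (via spark-submit --master or a -Dspark.master=... JVM option); the spark.master property check, the MyApp object, and the local[*] fallback are illustrative assumptions, not part of the original answer:

import org.apache.spark.sql.SparkSession

object MyApp {
  def main(args: Array[String]): Unit = {
    val builder = SparkSession.builder.appName("myapp")
    // Use the externally supplied master when one is present (e.g. from
    // spark-submit --master yarn, or -Dspark.master=local[*]);
    // fall back to local[*] only for quick IDE/sbt debug runs.
    val spark =
      if (sys.props.contains("spark.master")) builder.getOrCreate()
      else builder.master("local[*]").getOrCreate()

    // ... your pipeline on a small debug dataset ...

    spark.stop()
  }
}

With this setup, the same jar can be submitted with spark-submit --master yarn in production, while running it directly from IntelliJ or sbt with nothing set falls back to local mode for debugging.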