Problem Description
I am trying to run a Spark application (written in Scala) on a local server for debugging. YARN seems to be the default in the Spark version (2.2.1) I pull in through my sbt build definition, and according to an error I keep getting, there is no Spark/YARN server listening:
Client:920 - Failed to connect to server: 0.0.0.0/0.0.0.0:8032: retries get failed due to exceeded maximum allowed retries number
Indeed, according to netstat there is no port 8032 in a listening state on my local server.
How would I typically run my Spark application locally in a way that bypasses this problem? I only need the application to process a small amount of data for debugging, so I would like to run it locally without relying on a specific Spark/YARN installation and setup on the local server; that would be the ideal debug setup.
Is that possible?
My sbt definitions already bring in all the necessary spark and spark.yarn jars. The problem also reproduces when running the same project from sbt, outside of IntelliJ.
Recommended Answer
You can submit your Spark application in local mode with .master("local[*]") if you only have to test the pipeline with a small amount of data.
Full code:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("myapp")
  .master("local[*]")  // run Spark in-process, using all available cores
  .getOrCreate()
For spark-submit, use --master local[*] as one of the arguments. Refer to this: https://spark.apache.org/docs/latest/submitting-applications.html
Note: do not hard-code the master in your codebase; always supply it from the command line. This keeps the application reusable for local/test/Mesos/Kubernetes/YARN/anything else, as sketched below.
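For example, here is a minimal sketch of that idea, assuming the master is supplied externally (via spark-submit --master or a -Dspark.master=... JVM option); the spark.master property check, the MyApp object, and the local[*] fallback are illustrative assumptions, not part of the original answer:

import org.apache.spark.sql.SparkSession

object MyApp {
  def main(args: Array[String]): Unit = {
    val builder = SparkSession.builder.appName("myapp")
    // Use the externally supplied master when one is present (e.g. from
    // spark-submit --master yarn, or -Dspark.master=local[*]);
    // fall back to local[*] only for quick IDE/sbt debug runs.
    val spark =
      if (sys.props.contains("spark.master")) builder.getOrCreate()
      else builder.master("local[*]").getOrCreate()

    // ... your pipeline on a small debug dataset ...

    spark.stop()
  }
}

With this setup, the same jar can be submitted with spark-submit --master yarn in production, while running it directly from IntelliJ or sbt with nothing set falls back to local mode for debugging.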