Problem description
I have followed the steps to set up pyspark in IntelliJ from this question: Write and run pyspark in IntelliJ IDEA.
Here is the simple code I attempted to run:
#!/usr/bin/env python
from pyspark import SparkContext, SparkConf
import numpy as np

def p(msg): print("%s\n" % repr(msg))

a = np.array([[1, 2, 3], [4, 5, 6]])
p(a)

sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))
ardd = sc.parallelize(a)   # distribute the numpy rows as an RDD
p(ardd.collect())
Here is the result of submitting the code:
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
  File "/git/misc/python/ptest.py", line 14, in <module>
    sc = SparkContext("local","ptest",SparkConf().setAppName("x"))
  File "/shared/spark16/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/shared/spark16/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/shared/spark16/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
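For context, this error pattern makes sense once you see what launch_gateway() does. The sketch below is my paraphrase of the general mechanism in the Spark 1.x Python sources, not the verbatim implementation: pyspark starts the JVM by shelling out to bin/spark-submit with the contents of the PYSPARK_SUBMIT_ARGS environment variable, so if those args name no primary resource, spark-submit exits immediately and the gateway never reports its port.

# Simplified sketch of pyspark's launch_gateway() (my paraphrase of the
# Spark 1.x mechanism, not the actual source).
import os
import subprocess

def launch_gateway_sketch():
    spark_home = os.environ["SPARK_HOME"]
    # PYSPARK_SUBMIT_ARGS becomes the spark-submit command line.
    submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    command = [os.path.join(spark_home, "bin", "spark-submit")] + submit_args.split()
    # Without "pyspark-shell" (or a script) among the args, spark-submit
    # prints "Must specify a primary resource" and exits, so the Python side
    # never receives the gateway port -- exactly the exception shown above.
    return subprocess.Popen(command)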
However I really do not understand how this could be expected to work: in order to run in Spark, the code needs to be bundled up and submitted via spark-submit.
So I doubt that the other question actually addressed submitting pyspark code through IntelliJ to Spark.
Is there a way to submit pyspark code to pyspark? It would actually be

spark-submit myPysparkCode.py

The pyspark executable itself has been deprecated since Spark 1.0. Anyone have this working?
Recommended answer
In my case the variable settings from the other Q&A (Write and run pyspark in IntelliJ IDEA) covered most but not all of the required settings. I tried them many times.
Only after adding:

PYSPARK_SUBMIT_ARGS = pyspark-shell

to the run configuration did pyspark finally quiet down and succeed.
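Concretely, in IntelliJ that means adding PYSPARK_SUBMIT_ARGS=pyspark-shell as an environment variable of the run configuration (typically under Run > Edit Configurations). If you would rather not touch the run configuration, setting the variable in the script itself before the SparkContext is created has the same effect; a minimal sketch (my addition, not part of the original answer):

# Minimal sketch: set PYSPARK_SUBMIT_ARGS in-process, before pyspark
# launches its JVM gateway (alternative to the run-configuration setting;
# assumes pyspark is already on PYTHONPATH as in the linked Q&A).
import os
os.environ["PYSPARK_SUBMIT_ARGS"] = "pyspark-shell"

from pyspark import SparkConf, SparkContext

sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))
print(sc.parallelize([1, 2, 3]).collect())  # -> [1, 2, 3]
sc.stop()

This works because the environment is read at the moment the gateway is launched, which happens lazily when the first SparkContext is constructed.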