This article walks through the problem "Dataproc: Jupyter pyspark notebook unable to import graphframes package" and a recommended workaround.
Problem description
In a Dataproc Spark cluster, the graphframes package is available in spark-shell but not in the Jupyter PySpark notebook.
PySpark kernel config:
PACKAGES_ARG='--packages graphframes:graphframes:0.2.0-spark2.0-s_2.11'
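For context, the Jupyter kernel hands these arguments to PySpark through the PYSPARK_SUBMIT_ARGS environment variable, which must be set before the SparkContext is created. A minimal sketch of the equivalent setup done by hand (how the init action actually wires PACKAGES_ARG in is an assumption here):

import os

# Assumption: this mirrors what the kernel config does with PACKAGES_ARG.
# PySpark's launcher requires the trailing 'pyspark-shell' token.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages graphframes:graphframes:0.2.0-spark2.0-s_2.11 pyspark-shell'
)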
Following is the command used to initialize the cluster:
gcloud dataproc clusters create my-dataproc-cluster \
    --properties spark.jars.packages=com.databricks:graphframes:graphframes:0.2.0-spark2.0-s_2.11 \
    --metadata "JUPYTER_PORT=8124,INIT_ACTIONS_REPO=https://github.com/{xyz}/dataproc-initialization-actions.git" \
    --initialization-actions gs://dataproc-initialization-actions/jupyter/jupyter.sh \
    --num-workers 2 \
    --properties spark:spark.executorEnv.PYTHONHASHSEED=0,spark:spark.yarn.am.memory=1024m \
    --worker-machine-type=n1-standard-4 \
    --master-machine-type=n1-standard-4
Recommended answer
This is an old bug with Spark shells and YARN that I thought was fixed in SPARK-15782, but apparently this case was missed.
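In yarn-client mode, --packages resolves the jar into the driver's local ivy cache, but the notebook's Python path never picks it up, so the Java side sees graphframes while the Python import fails. A quick way to observe this (the cache location below is Ivy's default and an assumption):

import glob, os

# The jar was fetched by --packages, so it exists in the local cache ...
print(glob.glob(os.path.expanduser('~/.ivy2/jars/graphframes*.jar')))
# ... but the bundled Python package is not importable yet:
# import graphframes  # raises ImportError at this point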
The suggested workaround is to add the following before your import:

import os
# addPyFile ships the jar (which bundles graphframes' Python sources) to the
# executors and also puts it on the driver's Python path.
sc.addPyFile(os.path.expanduser('~/.ivy2/jars/graphframes_graphframes-0.2.0-spark2.0-s_2.11.jar'))
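Once the jar is on the Python path, the import should succeed. A short sanity check, assuming the notebook provides sc as usual (the sample vertices and edges are made up for illustration):

from pyspark.sql import SQLContext
from graphframes import GraphFrame

# Build a tiny graph to confirm the package is usable end to end.
sqlContext = SQLContext(sc)
v = sqlContext.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
e = sqlContext.createDataFrame([("a", "b", "friend")], ["src", "dst", "relationship"])
g = GraphFrame(v, e)
g.inDegrees.show()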