Problem Description
I am using a Dockerized image and Jupyter Notebook along with the SparkR kernel. When I create a SparkR notebook, it uses an install of Microsoft R (3.3.2) instead of the vanilla CRAN R install (3.2.3).
The Docker image I'm using installs some custom R libraries and Python packages, but I don't explicitly install Microsoft R. Regardless of whether I can remove Microsoft R or have it side by side, how can I get my SparkR kernel to use a custom installation of R?
Thanks in advance.
Recommended Answer
Docker-related issues aside, the settings for Jupyter kernels are configured in files named kernel.json, which reside in specific directories (one per kernel) and can be located with the command jupyter kernelspec list; for example, here is the situation on my (Linux) machine:
$ jupyter kernelspec list
Available kernels:
python2 /usr/lib/python2.7/site-packages/ipykernel/resources
caffe /usr/local/share/jupyter/kernels/caffe
ir /usr/local/share/jupyter/kernels/ir
pyspark /usr/local/share/jupyter/kernels/pyspark
pyspark2 /usr/local/share/jupyter/kernels/pyspark2
tensorflow /usr/local/share/jupyter/kernels/tensorflow
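Each of these directories holds the corresponding kernel.json, which you can inspect (or edit) directly; for instance, using the ir path from the listing above:

$ cat /usr/local/share/jupyter/kernels/ir/kernel.json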
Again, as an example, here are the contents of the kernel.json for my R kernel (ir):
{
  "argv": ["/usr/lib64/R/bin/R", "--slave", "-e", "IRkernel::main()", "--args", "{connection_file}"],
  "display_name": "R 3.3.2",
  "language": "R"
}
And here is the respective file for my pyspark2 kernel:
{
  "display_name": "PySpark (Spark 2.0)",
  "language": "python",
  "argv": [
    "/opt/intel/intelpython27/bin/python2",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "/home/ctsats/spark-2.0.0-bin-hadoop2.6",
    "PYTHONPATH": "/home/ctsats/spark-2.0.0-bin-hadoop2.6/python:/home/ctsats/spark-2.0.0-bin-hadoop2.6/python/lib/py4j-0.10.1-src.zip",
    "PYTHONSTARTUP": "/home/ctsats/spark-2.0.0-bin-hadoop2.6/python/pyspark/shell.py",
    "PYSPARK_PYTHON": "/opt/intel/intelpython27/bin/python2"
  }
}
As you can see, in both cases the first element of argv is the executable for the respective language - in my case, GNU R for my ir kernel and Intel Python 2.7 for my pyspark2 kernel. Changing this first element so that it points to your GNU R executable should resolve your issue.
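For the SparkR case specifically, here is a minimal sketch of such a kernel spec. The paths below are placeholders, not taken from the question: it assumes your custom R executable is /usr/local/bin/R, that the IRkernel package is installed in that R, and that Spark is unpacked at /opt/spark (so its bundled SparkR package lives in /opt/spark/R/lib):

$ mkdir -p /usr/local/share/jupyter/kernels/sparkr    # hypothetical kernel directory
$ cat > /usr/local/share/jupyter/kernels/sparkr/kernel.json <<'EOF'
{
  "display_name": "SparkR (custom R)",
  "language": "R",
  "argv": ["/usr/local/bin/R", "--slave", "-e", "IRkernel::main()", "--args", "{connection_file}"],
  "env": {
    "SPARK_HOME": "/opt/spark",
    "R_LIBS": "/opt/spark/R/lib"
  }
}
EOF

After saving this file, jupyter kernelspec list should show the new sparkr entry, and a notebook started with that kernel can then load SparkR with library(SparkR) from the path given in R_LIBS.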