Problem Description
I am trying to fire up a Jupyter notebook when I run the command pyspark in the console. When I type it now, it only starts an interactive shell in the console. However, that is not convenient for typing long lines of code. Is there a way to connect a Jupyter notebook to the pyspark shell? Thanks.
I'm assuming you already have Spark and Jupyter notebooks installed and that they work flawlessly independently of each other.
If that is the case, follow the steps below and you should be able to fire up a Jupyter notebook with a (py)spark backend.
Go to your Spark installation folder; there should be a bin directory there: /path/to/spark/bin
Create a file there; let's call it start_pyspark.sh
Open start_pyspark.sh and write something like:

#!/bin/bash
export PYSPARK_PYTHON=/path/to/anaconda3/bin/python
export PYSPARK_DRIVER_PYTHON=/path/to/anaconda3/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"

pyspark "$@"
Replace the /path/to ... with the paths where your python and jupyter binaries are installed, respectively.
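One detail the steps leave implicit: the shell will not run the new file until it is marked executable. Assuming the script lives in the Spark bin directory from the first step:

chmod +x /path/to/spark/bin/start_pyspark.sh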
Most probably this step is already done, but just in case: modify your ~/.bashrc file by adding the following lines:

# Spark
export PATH="/path/to/spark/bin:/path/to/spark/sbin:$PATH"
export SPARK_HOME="/path/to/spark"
export SPARK_CONF_DIR="/path/to/spark/conf"
Run source ~/.bashrc and you are set.
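If you want to double-check that the environment actually took effect, a quick sanity check in a fresh shell (purely optional):

echo "$SPARK_HOME"        # should print /path/to/spark
which start_pyspark.sh    # should resolve to /path/to/spark/bin/start_pyspark.sh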
Go ahead and try start_pyspark.sh.
You can also pass arguments to the script, something like start_pyspark.sh --packages dibbhatt:kafka-spark-consumer:1.0.14.
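Those arguments are forwarded untouched by the trailing "$@" in the script, so any flag that pyspark itself accepts should work here as well. For example, pinning the master (local[4] is only an illustration, not something the setup requires):

start_pyspark.sh --master "local[4]"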
Hope it works out for you.