Problem description
I am trying to work with 12GB of data in Python, for which I desperately need Spark, but I can't figure out how to use the command line, either on my own or with help from the internet, so I am turning to SO.
So far I have downloaded Spark and unzipped the tar file, but now I don't know where to go next. The instructions in the Spark documentation say:
Spark also provides a Python API. To run Spark interactively in a Python interpreter, use bin/pyspark
But where do I run this? Please help. Edit: I am using Windows 10.
Note: I have always had problems installing things, mainly because I can't seem to understand the Command Prompt.
Recommended answer
If you are more familiar with Jupyter Notebook, you can install Apache Toree, which integrates PySpark, Scala, SQL, and SparkR kernels with Spark.
To install Toree:
pip install toree
jupyter toree install --spark_home=path/to/your/spark_directory --interpreters=PySpark
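The --spark_home value above is the folder you got when you unzipped the Spark tar file. As a rough sanity check before running the install command, the sketch below tests whether a folder contains the PySpark launcher (bin/pyspark, or bin/pyspark.cmd on Windows). The function name looks_like_spark_home is a hypothetical helper made up for illustration, not part of Spark or Toree, and the demo uses a throwaway folder rather than a real Spark download.

```python
import tempfile
from pathlib import Path


def looks_like_spark_home(path):
    """Return True if `path` has a PySpark launcher script under bin/.

    Hypothetical helper for illustration; not part of Spark or Toree.
    """
    bin_dir = Path(path) / "bin"
    # Spark ships bin/pyspark for Unix-like shells and bin/pyspark.cmd for Windows.
    return (bin_dir / "pyspark").is_file() or (bin_dir / "pyspark.cmd").is_file()


# Demo with a throwaway folder that mimics an extracted Spark directory.
with tempfile.TemporaryDirectory() as tmp:
    fake_home = Path(tmp) / "spark"
    (fake_home / "bin").mkdir(parents=True)
    (fake_home / "bin" / "pyspark.cmd").touch()
    has_launcher = looks_like_spark_home(fake_home)   # launcher present
    missing_launcher = looks_like_spark_home(tmp)     # no bin/ folder here

print(has_launcher, missing_launcher)  # prints: True False
```

To check your own download, call looks_like_spark_home with the path to your extracted Spark folder. If it returns False, you have probably pointed at a parent folder or a subfolder instead.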
If you want to install other kernels, you can use:
jupyter toree install --interpreters=SparkR,SQL,Scala
Now run:
jupyter notebook
In the UI, when creating a new notebook, you should see the following kernels available:
Apache Toree-Pyspark
Apache Toree-SparkR
Apache Toree-SQL
Apache Toree-Scala