python - PySpark's addPyFile method makes SparkContext None

I have been trying to do this. In the PySpark shell, I get the SparkContext as sc. But when I assign the result of the addPyFile method, the resulting SparkContext is None:

>>> sc2 = sc.addPyFile("/home/ec2-user/redis.zip")
>>> sc2 is None
True

What is going wrong?

Best answer

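Below is the source code to pyspark's (v1.1.1) addPyFile. (As of this writing, the source link in the official pyspark documentation for 1.4.1 is broken.)
It returns None because it has no return statement. See also: In Python, if a function doesn't have a return statement, what does it return?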
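To illustrate the point about a missing return statement, here is a minimal sketch in plain Python, independent of Spark. The function name `without_return` and the `items` list are made up for this example.

```python
def without_return(items, x):
    # Only a side effect: the list is modified in place.
    # There is no return statement, so the call evaluates to None.
    items.append(x)


items = []
result = without_return(items, 1)

print(result is None)  # True
print(items)           # [1]
```

The side effect still happens; only the value of the call expression is None.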
So if you write sc2 = sc.addPyFile("mymodule.py"), of course sc2 will be None, because .addPyFile() does not return anything!
Instead, simply call sc.addPyFile("mymodule.py") and keep using sc as your SparkContext.
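The same pattern can be tried without a cluster. The sketch below uses a hypothetical stand-in class, `StandInContext`, which is not part of PySpark. It copies only the suffix check from the v1.1.1 source quoted in this answer, so it shows the shape of the behaviour rather than what Spark actually does with the file.

```python
import os


class StandInContext:
    """Hypothetical stand-in for SparkContext (assumption, not PySpark code)."""

    def __init__(self):
        self.python_includes = []

    def addPyFile(self, path):
        # Mirrors the suffix check in the quoted source: archives are
        # recorded on the context itself, and nothing is returned.
        filename = os.path.basename(path)
        if filename.endswith((".zip", ".ZIP", ".egg")):
            self.python_includes.append(filename)


sc = StandInContext()

# Call the method for its side effect; do not rebind the result.
returned = sc.addPyFile("/home/ec2-user/redis.zip")

print(returned is None)    # True
print(sc.python_includes)  # ['redis.zip']
```

With a real SparkContext the usage is the same: call sc.addPyFile(...) on its own line and carry on with sc.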

def addPyFile(self, path):
    """
    Add a .py or .zip dependency for all tasks to be executed on this
    SparkContext in the future.  The C{path} passed can be either a local
    file, a file in HDFS (or other Hadoop-supported filesystems), or an
    HTTP, HTTPS or FTP URI.
    """
    self.addFile(path)
    (dirname, filename) = os.path.split(path)  # dirname may be directory or HDFS/S3 prefix

    if filename.endswith('.zip') or filename.endswith('.ZIP') or filename.endswith('.egg'):
        self._python_includes.append(filename)
        # for tests in local mode
        sys.path.append(os.path.join(SparkFiles.getRootDirectory(), filename))