Problem Description
The Spark Scala API has a Dataset#transform method that makes it easy to chain custom DataFrame transformations like so:
val weirdDf = df
  .transform(myFirstCustomTransformation)
  .transform(anotherCustomTransformation)
I don't see an equivalent transform method for PySpark in the documentation.
Is there a PySpark way to chain custom transformations?
If not, how can the pyspark.sql.DataFrame class be monkey-patched to add a transform method?
Update
The transform method was added to PySpark as of PySpark 3.0 (see https://spark.apache.org/docs/3.0.0-preview/api/python/pyspark.sql.html#pyspark.sql.DataFrame.transform). This blog post outlines best practices for chaining function calls with the transform method.
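For illustration, a minimal sketch of chaining with the built-in method, assuming a local SparkSession; the with_doubled_id and with_description helpers are hypothetical names:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical custom transformations, each taking and returning a DataFrame
def with_doubled_id(df):
    return df.withColumn("doubled_id", F.col("id") * 2)

def with_description(df):
    return df.withColumn("description", F.lit("sample row"))

# DataFrame.transform (PySpark 3.0+) chains them like the Scala example
result = (
    spark.range(3)
    .transform(with_doubled_id)
    .transform(with_description)
)
result.show()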
Recommended Answer
Implementation:
from pyspark.sql.dataframe import DataFrame

def transform(self, f):
    # Apply a function that takes a DataFrame and returns a DataFrame
    return f(self)

# Monkey-patch the method onto the DataFrame class
DataFrame.transform = transform
Usage:
spark.range(1).transform(lambda df: df.selectExpr("id * 2"))
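With the patch in place, calls chain just like the Scala example. A sketch with hypothetical transformation functions, showing how extra parameters can be bound with a lambda, since this transform accepts only a single function:

# Hypothetical parameterized transformation; the factor argument is
# bound with a lambda because transform passes only the DataFrame
def multiply_id(df, factor):
    return df.selectExpr("id", f"id * {factor} as scaled_id")

weird_df = (
    spark.range(3)
    .transform(lambda df: multiply_id(df, 3))
    .transform(lambda df: df.withColumnRenamed("scaled_id", "tripled_id"))
)
weird_df.show()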