This article describes a PySpark equivalent of the Scala Dataset#transform method for chaining custom DataFrame transformations.

Problem Description

The Spark Scala API has a Dataset#transform method that makes it easy to chain custom DataFrame transformations like so:

val weirdDf = df
  .transform(myFirstCustomTransformation)
  .transform(anotherCustomTransformation)

I don't see an equivalent transform method for pyspark in the documentation.

Is there a PySpark way to chain custom transformations?

If not, how can the pyspark.sql.DataFrame class be monkey patched to add a transform method?

Update

The transform method was added to PySpark as of PySpark 3.0 (https://spark.apache.org/docs/3.0.0-preview/api/python/pyspark.sql.html#pyspark.sql.DataFrame.transform). This blog post outlines best practices for chaining function calls with the transform method.
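
For illustration, here is a minimal sketch of the built-in method on PySpark 3.0+, assuming an active SparkSession named spark (the helper names with_doubled_id and with_greeting are made up for this example):

from pyspark.sql import DataFrame

def with_doubled_id(df: DataFrame) -> DataFrame:
    # Add a column computed from the existing id column.
    return df.selectExpr("id", "id * 2 AS doubled_id")

def with_greeting(df: DataFrame) -> DataFrame:
    # Append a constant greeting column.
    return df.selectExpr("*", "'hello' AS greeting")

result = (spark.range(3)
          .transform(with_doubled_id)
          .transform(with_greeting))
result.show()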

Recommended Answer

Implementation:

from pyspark.sql.dataframe import DataFrame

def transform(self, f):
    # Apply f to this DataFrame and return the result,
    # so calls can be chained fluently.
    return f(self)

# Monkey patch DataFrame so every instance gains a transform method.
DataFrame.transform = transform

Usage:

spark.range(1).transform(lambda df: df.selectExpr("id * 2"))
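
With the patch in place, transformations that take extra parameters can also be chained; a minimal sketch, assuming an active SparkSession named spark (with_multiplied and with_label are hypothetical helpers):

from functools import partial

def with_multiplied(factor, df):
    # Multiply the id column by the given factor.
    return df.selectExpr("id", f"id * {factor} AS multiplied")

def with_label(label, df):
    # Append a constant label column.
    return df.selectExpr("*", f"'{label}' AS label")

result = (spark.range(5)
          .transform(partial(with_multiplied, 3))
          .transform(partial(with_label, "demo")))
result.show()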

That concludes this look at a PySpark equivalent of the Scala Dataset#transform method; hopefully the recommended answer above proves helpful.
