Question
What's the difference between the explode function and the explode operator?
Recommended answer
spark.sql.functions.explode
The explode function creates a new row for each element in the given array or map column (in a DataFrame).
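```scala
import org.apache.spark.sql.functions.explode
import spark.implicits._ // assumes a SparkSession named spark; enables the $"..." syntax
val signals = spark.read.json(signalsJson) // a DataFrame; signalsJson is defined elsewhere
signals.withColumn("element", explode($"data.datapayload"))
```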
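The row-multiplying behaviour can be modelled without Spark. The sketch below is plain Scala, not the Spark API: Rec, explodeArray and explodeMap are hypothetical names, and a Seq of case-class values stands in for a DataFrame. For an array column every element becomes a row, for a map column every entry becomes a (key, value) row, and the remaining columns (here id) are copied onto each generated row. A row whose collection is empty yields no output, which matches explode (Spark's explode_outer would keep such a row with nulls instead).

```scala
// Plain-Scala model of functions.explode; hypothetical names, NOT the Spark API.
object ExplodeModel {
  // One "row": an id column, an array column and a map column.
  final case class Rec(id: Int, tags: Seq[String], attrs: Map[String, Int])

  // Like explode on an array column: one output row per element,
  // with the other columns (here id) copied onto each generated row.
  def explodeArray(rows: Seq[Rec]): Seq[(Int, String)] =
    rows.flatMap(r => r.tags.map(tag => (r.id, tag)))

  // Like explode on a map column: one output row per (key, value) entry.
  def explodeMap(rows: Seq[Rec]): Seq[(Int, String, Int)] =
    rows.flatMap(r => r.attrs.toSeq.map { case (k, v) => (r.id, k, v) })

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Rec(1, Seq("a", "b"), Map("x" -> 10)),
      Rec(2, Seq.empty, Map.empty) // empty collections produce no output rows
    )
    println(explodeArray(rows)) // List((1,a), (1,b))
    println(explodeMap(rows))   // List((1,x,10))
  }
}
```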
explode creates a Column.
See the functions object, and the examples in How to unwind array in DataFrame (from JSON)?
spark.sql.Dataset.explode
The explode operator is almost the same as the explode function.
From the scaladoc:
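explode returns a new Dataset where a single column has been expanded to zero or more rows by the provided function. This is similar to a LATERAL VIEW in HiveQL. All columns of the input row are implicitly joined with each value that is output by the function.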
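```scala
// hypothetical context: ds: Dataset[Book] with case class Book(title: String, words: String), plus import spark.implicits._
ds.flatMap(_.words.split(" "))
```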
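The "implicitly joined" wording can be read as follows. This is a minimal plain-Scala sketch of the semantics, not Spark code; explodeOp, Line and words are hypothetical names. The provided function may return zero or more values per input row, each input row is paired with every value it produces, and a row that produces nothing disappears from the result. The body is just a flatMap, which is why flatMap can stand in for the deprecated operator.

```scala
// Plain-Scala model of the deprecated Dataset.explode operator's semantics.
// Hypothetical names throughout; this is NOT Spark code.
object ExplodeOperatorModel {
  final case class Line(id: Int, text: String)

  // f returns zero or more values per input row; each input row is
  // "implicitly joined" with every value f returns for it.
  def explodeOp[A, B](rows: Seq[A])(f: A => Seq[B]): Seq[(A, B)] =
    rows.flatMap(a => f(a).map(b => (a, b)))

  // Splits on whitespace and drops empty tokens, so a blank line yields no words.
  def words(line: Line): Seq[String] =
    line.text.split("\\s+").toSeq.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val rows = Seq(Line(1, "to be"), Line(2, ""), Line(3, "or"))
    // Line(2, "") produces no values, so it vanishes from the output.
    explodeOp(rows)(words).foreach(println)
  }
}
```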
Please note that (again quoting the scaladoc):
Deprecated (Since version 2.0.0) use flatMap() or select() with functions.explode() instead
See the Dataset API and How to split multi-value column into separate rows using typed Dataset?
Despite explode being deprecated (so the main question could be restated as the difference between the explode function and the flatMap operator), the difference is that the former is a function while the latter is an operator. They have different signatures but can give the same results. That often leads to discussions about which is better, and the answer usually boils down to personal preference or coding style.
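To check the "same results" claim, here is a sketch that computes the same (title, word) rows both ways. It assumes Spark 2.x or 3.x on the classpath and a local SparkSession; the Book case class, the sample data and the bothWays helper are hypothetical, chosen to mirror the ds.flatMap(_.words.split(" ")) snippet above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

// Assumes Spark 2.x or 3.x on the classpath; Book and the data are hypothetical.
object ExplodeVsFlatMap {
  final case class Book(title: String, words: String)

  // Computes the (title, word) rows twice and returns both, sorted.
  def bothWays(spark: SparkSession): (Seq[(String, String)], Seq[(String, String)]) = {
    import spark.implicits._
    val ds = Seq(Book("b1", "to be or not"), Book("b2", "be")).toDS()

    // explode function: a Column expression used inside select (untyped API).
    val viaFunction = ds
      .select($"title", explode(split($"words", " ")).as("word"))
      .as[(String, String)] // tuple fields are matched to columns by position
      .collect().toSeq.sorted

    // flatMap operator: a typed transformation taking a plain Scala lambda.
    val viaFlatMap = ds
      .flatMap(b => b.words.split(" ").toSeq.map(w => (b.title, w)))
      .collect().toSeq.sorted

    (viaFunction, viaFlatMap)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explode-vs-flatMap").getOrCreate()
    try {
      val (viaFunction, viaFlatMap) = bothWays(spark)
      println(viaFunction == viaFlatMap) // prints true
    } finally spark.stop()
  }
}
```

The explode version stays in the untyped Column API and can be combined with other column expressions in a single select; the flatMap version works on typed Book values with an ordinary Scala lambda.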
One could also say that flatMap (i.e. the operator that replaces the explode operator) is more Scala-ish, given how ubiquitous flatMap is in Scala programming (mainly hidden behind for-comprehensions).
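As a plain-Scala illustration of the for-comprehension point (no Spark involved; Book and the sample data are hypothetical), the two definitions below are equivalent: the compiler desugars the for-comprehension into the flatMap and map calls written out in viaFlatMap.

```scala
// Plain Scala (no Spark): a for-comprehension is sugar for flatMap/map.
object ForComprehensionDemo {
  final case class Book(title: String, words: String) // hypothetical sample type

  val books: List[Book] = List(Book("b1", "to be"), Book("b2", "or not"))

  // Explicit flatMap: one (title, word) pair per word, much like explode.
  val viaFlatMap: List[(String, String)] =
    books.flatMap(b => b.words.split(" ").toList.map(w => (b.title, w)))

  // The same computation written as a for-comprehension; the compiler
  // desugars it into the flatMap/map calls above.
  val viaFor: List[(String, String)] =
    for {
      b <- books
      w <- b.words.split(" ").toList
    } yield (b.title, w)

  def main(args: Array[String]): Unit =
    println(viaFor) // List((b1,to), (b1,be), (b2,or), (b2,not))
}
```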