Spark: What is the difference between Aggregator and UDAF?

Problem description

In Spark's documentation, Aggregator is:

abstract class Aggregator[-IN, BUF, OUT] extends Serializable

A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.

UserDefinedAggregateFunction is:

abstract class UserDefinedAggregateFunction extends Serializable

The base class for implementing user-defined aggregate functions (UDAF).

According to Dataset Aggregator - Databricks, "an Aggregator is similar to a UDAF, but the interface is expressed in terms of JVM objects instead of as a Row."
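To make the quoted distinction concrete, here is a minimal sketch of the same sum written against both base classes. It assumes the Spark 2.x-era Scala API (UserDefinedAggregateFunction is deprecated as of Spark 3.0); the `Sale` case class and the object names `SumAmount` and `SumAmountUdaf` are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.{Encoder, Encoders, Row}
import org.apache.spark.sql.expressions.{Aggregator, MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, DoubleType, StructField, StructType}

// Hypothetical record type, used only for this illustration.
case class Sale(shop: String, amount: Double)

// Typed variant: input, buffer and output are ordinary JVM values,
// and the compiler checks the field access `s.amount`.
object SumAmount extends Aggregator[Sale, Double, Double] {
  def zero: Double = 0.0                                    // empty buffer
  def reduce(buf: Double, s: Sale): Double = buf + s.amount // fold one input in
  def merge(b1: Double, b2: Double): Double = b1 + b2       // combine partial buffers
  def finish(buf: Double): Double = buf                     // buffer -> result
  def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Untyped variant: input and buffer are Rows described by explicit schemas,
// and fields are read by position at runtime.
object SumAmountUdaf extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("amount", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(StructField("sum", DoubleType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit =
    buffer(0) = 0.0

  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) buffer(0) = buffer.getDouble(0) + input.getDouble(0)

  def merge(b1: MutableAggregationBuffer, b2: Row): Unit =
    b1(0) = b1.getDouble(0) + b2.getDouble(0)

  def evaluate(buffer: Row): Any = buffer.getDouble(0)
}
```

Assuming a `Dataset[Sale]` named `ds` and a DataFrame named `df` with `shop` and `amount` columns (both hypothetical), the typed version would be applied as `ds.groupByKey(_.shop).agg(SumAmount.toColumn)` and the untyped one as `df.groupBy("shop").agg(SumAmountUdaf(df("amount")))`.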

These two classes seem very similar. What other differences are there apart from the types in the interface?

A similar question is: Performance of UDAF versus Aggregator in Spark

Recommended answer