问题描述
我是Flink的新手.有时在某些情况下,我需要在DataStream上进行聚合而无需先执行keyBy.为什么Flink不支持DataStream上的聚合(总和,最小值,最大值等)?
I am a newbie to Flink. Sometimes there are cases where I want to do aggregation on a DataStream without needed to do a keyBy first. Why doesn't Flink support aggregation (sum, min, max, etc.) on a DataStream?
谢谢你,艾哈迈德.
推荐答案
使用 FLIP-134 Flink社区已决定从DataStream API中弃用所有这些关系方法:
With FLIP-134 the Flink community has decided to deprecate all of these relational methods from the DataStream API:
-
DataStream#project
-
Windowed/KeyedStream#sum,min,max,minBy,maxBy
-
DataStream#keyBy
,其中用字段名称或索引指定的键(包括ConnectedStreams#keyBy
)
DataStream#project
Windowed/KeyedStream#sum,min,max,minBy,maxBy
DataStream#keyBy
where the key specified with field name or index (includingConnectedStreams#keyBy
)
此决定的基本原理是Table/SQL是更完整,性能更高的关系API,它已经支持批处理和流式处理.使用这些API,您可以轻松执行全局聚合,而无需先执行keyBy或GROUP BY.
The rationale behind this decision is that Table/SQL is a more complete and more performant relational API, and it already supports both batch and streaming. With these APIs you can easily perform global aggregations, without having to first do a keyBy or GROUP BY.
一个例子:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
SingleOutputStreamOperator<Integer> numbers = env.fromElements(0, 1, 1, 0, 3, 2);
Table data = tableEnv.fromDataStream(numbers, $("n"));
Table results = data.select($("n").max());
tableEnv
.toRetractStream(results, Row.class)
.print();
env.execute();
这篇关于在Flink中,为什么DataStream不支持聚合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!