一、over(partition by ......)主要和聚合函数sum()、count()、avg()等结合使用,实现分组聚合的功能

示列:根据day_id日期和mac_id机器码进行聚合分组求每一天的该机器的销量和即sum_num,hive sql语句:select day_id,mac_id,mac_color,day_num,sum(day_num)over(partition by day_id,mac_id order by day_id) sum_num from test_temp_mac_id;

注:day_id,mac_id,mac_color,day_num为查询原有数据,sum_num为计算结果

day_idmac_idmac_colorday_numsum_num
201710111292金色189
201710111292金色1489
201710111292金色289
201710111292金色1189
201710111292黑色289
201710111292粉金5889
201710111292金色189
201710112013金色1022
201710112013金色922
201710112013金色222
201710112013金色122
201710121292金色518
201710121292金色718
201710121292金色518
201710121292粉金118
201710122013粉金17
201710122013金色67
201710131292黑色11
201710132013粉金22
2017101112460茶花金11

二、over(partition by ......)与group by 区别

如果用group by实现一中根据day_id日期和mac_id机器码进行聚合分组求每一天的该机器的销量和即sum_num,

则hive sql语句为:select day_id,mac_id,sum(day_num) sum_num from test_temp_mac_id group by day_id,mac_id order by day_id;结果如下表

注:我们可以观察到group by可以实现同样的分组聚合功能,但sql语句不能写与分组聚合无关的字段,否则会报错,即group by 与over(partition by ......)主要区别为,带上group by的hive sql语句只能显示与分组聚合相关的字段,而带上over(partition by ......)的hive sql语句能显示所有字段.。

day_idmac_idsum_num
201710111246091
201710112013022
201710111292289
201710121292218
20171012201307
20171013129221
20171013201302

https://blog.csdn.net/qq_37325859/article/details/78222712

05-11 22:45