A Detailed Guide to Hive's collect_set
https://blog.csdn.net/liyantianmin/article/details/48262109
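- For columns that are not part of the GROUP BY clause, Hive's collect_set function can gather their values into an array, with duplicate values removed;
- Elements of that array can be accessed directly by numeric index, starting at 0.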
select a,collect_set(b) as bb from t where b<='xxxxxx' group by a
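This groups the rows by a; for each value of a, collect_set builds the corresponding b values into an array, displayed with comma-separated elements. The SQL above returns: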
a1,["b1","b2"]
a2,["b1","b2","b3","b4"]
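As a minimal sketch of this behaviour, the query below builds a few hypothetical rows inline (the values and the CTE named t are assumptions for illustration, not data from the original post) and shows deduplication, size() and index access together. Hive does not guarantee the order of elements inside the array, so the value of first_b should not be relied on when a group has more than one distinct b.

```sql
-- Hypothetical sample rows; 'a1' deliberately contains a duplicate 'b1'.
WITH t AS (
  SELECT 'a1' AS a, 'b1' AS b
  UNION ALL SELECT 'a1', 'b1'
  UNION ALL SELECT 'a1', 'b2'
  UNION ALL SELECT 'a2', 'b3'
)
SELECT a,
       collect_set(b)       AS bb,       -- distinct b values per a, as an array
       size(collect_set(b)) AS n,        -- number of distinct b values (2 for a1, 1 for a2)
       collect_set(b)[0]    AS first_b   -- zero-based index access; order is not guaranteed
FROM t
GROUP BY a;
```

If duplicates need to be kept, collect_list is the corresponding function that does not deduplicate.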
The returned array can then be used for further filtering, for example:
select * from (select a,collect_set(b) as bb from t where b<='xxxxxx' group by a
) tmp where size(tmp.bb)=1 and tmp.bb[0]='xxxxxxxx';
This returns the a values whose bb array has length 1 and whose first (and only) element is xxxxxxxx.
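To make the filter concrete, here is a small sketch on hypothetical inline data (the rows, the CTE named t, and the literal 'b1' are illustrative assumptions; the inner WHERE clause of the original query is left out for brevity). Note that the subquery needs an alias, here tmp, before its columns can be referenced as tmp.bb.

```sql
-- Hypothetical sample rows: a1 has two distinct b values, a2 and a3 have one each.
WITH t AS (
  SELECT 'a1' AS a, 'b1' AS b
  UNION ALL SELECT 'a1', 'b2'
  UNION ALL SELECT 'a2', 'b1'
  UNION ALL SELECT 'a3', 'b3'
)
SELECT *
FROM (
  SELECT a, collect_set(b) AS bb
  FROM t
  GROUP BY a
) tmp                          -- alias required so that tmp.bb can be referenced
WHERE size(tmp.bb) = 1         -- exactly one distinct b for this a
  AND tmp.bb[0] = 'b1';        -- and that single value is 'b1'
```

With these rows only a2 passes both conditions: a1 is rejected because its array has two elements, and a3 because its single element is 'b3'.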