Problem description
How can I get the top-n rows (let's say the top 10 or top 3) per group in Spark SQL?
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ provides a tutorial for general SQL. However, Spark does not implement subqueries in the WHERE clause.
Solution
You can use the window function feature that was added in Spark 1.4. Suppose that we have a productRevenue table with product, category, and revenue columns.
The answer to "What are the best-selling and the second best-selling products in every category?" is as follows:
SELECT product, category, revenue FROM
  (SELECT product, category, revenue,
          dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) AS rank
   FROM productRevenue) tmp
WHERE rank <= 2
This will give you the desired result.
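To see what the query returns without a Spark cluster, here is a minimal sketch that runs the same SQL against an in-memory SQLite database from Python. This is only a stand-in for Spark SQL: it assumes the bundled SQLite is version 3.25 or later (the first release with window functions), the sample rows and the helper name top_n_per_category are made up for illustration, and the alias is spelled rnk rather than rank simply to avoid clashing with the function name.

```python
import sqlite3

# Hypothetical sample data: (product, category, revenue).
# A2 and A3 tie on revenue to show how dense_rank() treats ties.
ROWS = [
    ("A1", "phone", 6000),
    ("A2", "phone", 5000),
    ("A3", "phone", 5000),
    ("A4", "phone", 1500),
    ("B1", "tablet", 4500),
    ("B2", "tablet", 2500),
    ("B3", "tablet", 1500),
]


def top_n_per_category(rows, n):
    """Return rows whose dense rank by revenue within their category is <= n."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE productRevenue "
            "(product TEXT, category TEXT, revenue INTEGER)"
        )
        conn.executemany("INSERT INTO productRevenue VALUES (?, ?, ?)", rows)
        # Same shape as the Spark SQL query: rank inside a subquery,
        # then filter on the rank in the outer query.
        query = """
            SELECT product, category, revenue FROM
              (SELECT product, category, revenue,
                      dense_rank() OVER (PARTITION BY category
                                         ORDER BY revenue DESC) AS rnk
               FROM productRevenue) tmp
            WHERE rnk <= ?
            ORDER BY category, rnk, product
        """
        return conn.execute(query, (n,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_n_per_category(ROWS, 2):
        print(row)
```

With this sample data the "phone" category yields three rows for n = 2, because dense_rank() gives the two tied products the same rank. If exactly n rows per group are needed, row_number() can be used in place of dense_rank(); ties are then broken arbitrarily unless the ORDER BY adds a tiebreaker column.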