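I'm fairly new to SQL and am working through some practice problems. I have a sample Twitter database, and I'm trying to find the top 3 users in each location based on their number of followers.
Here are the tables I'm using: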
id_follower_location
      id      | followers |  location
--------------+-----------+------------
 id28929238   |         1 | Toronto
 id289292338  |         1 | California
 id2892923838 |         2 | Rome
.
.
locations
location
----------------------
Bay Area, California
London
Nashville, TN
.
.
I can find the "top" user per location like this:
-- id of the user with the most followers in each location
create view top1id as
select location,
       (select id_follower_location.id from id_follower_location
        where id_follower_location.location = locations.location
        order by followers desc limit 1
       ) as id
from locations;
-- look up the follower count for that id
create view top1 as
select location, id,
       (select followers from id_follower_location
        where id_follower_location.id = top1id.id
       ) as followers
from top1id;
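The only way I can think of to solve this is to find the "top 1st", "top 2nd" and "top 3rd" users separately and then combine them with union. Is this the right (or only) way to do it, or is there a better way?
Best answer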
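Top n
With rank() you get at least 3 rows per location (fewer if fewer exist). If there are ties among the top three ranks, more rows may be returned. See: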
PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?
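To illustrate the rank() behavior described above, here is a minimal sketch. The CTE named sample and its rows are hypothetical stand-ins for id_follower_location, chosen so that two users tie for third place:

```sql
-- Hypothetical sample rows standing in for id_follower_location.
WITH sample (id, followers, location) AS (
   VALUES ('id1', 50, 'Rome')
        , ('id2', 40, 'Rome')
        , ('id3', 30, 'Rome')
        , ('id4', 30, 'Rome')   -- ties with id3 for rank 3
        , ('id5', 10, 'Rome')
   )
SELECT id, location, followers, rnk
FROM  (
   SELECT id, location, followers
        , rank() OVER (PARTITION BY location ORDER BY followers DESC) AS rnk
   FROM   sample
   ) r
WHERE  rnk <= 3
ORDER  BY location, rnk;
```

With this sample data the query returns four rows for Rome, because id3 and id4 both get rank 3.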
If you need exactly 3 rows per location (fewer if fewer exist), you have to break ties. One way is to use row_number() instead of rank():
SELECT *
FROM (
SELECT id, location
, row_number() OVER (PARTITION BY location ORDER BY followers DESC) AS rn
FROM id_follower_location
) r
WHERE rn <= 3
ORDER BY location, rn;
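If the pick among tied users should be deterministic, one option is to extend the ORDER BY inside the OVER clause. The following is a sketch only; it assumes that id is unique in id_follower_location, which the question does not state explicitly:

```sql
SELECT *
FROM  (
   SELECT id, location
        , row_number() OVER (PARTITION BY location
                             ORDER BY followers DESC, id) AS rn  -- id as tie-breaker (assumed unique)
   FROM   id_follower_location
   ) r
WHERE  rn <= 3
ORDER  BY location, rn;
```

Any other column or combination of columns works as a tie-breaker, as long as it makes the sort order unambiguous within each location.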
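Keep the ORDER BY on the outer query if you need guaranteed sorted output. If a location has more than three valid candidates, you get an arbitrary pick among the ties, unless you add more ORDER BY items in the OVER clause to break them.
Top 1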
As for your query to get the top 1 row: there is a simpler and faster way in PostgreSQL:
SELECT DISTINCT ON (location)
id, location -- add additional columns freely
FROM id_follower_location
ORDER BY location, followers DESC;
Details on this query technique are in this closely related answer:
Select first row in each GROUP BY group?
Regarding "sql - finding the top 3 users per location", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/15995271/