Question
I have some data formatted like this:
Lane Series
1 680
1 685
1 688
2 666
2 425
2 775
...
And I'd like to grab the highest n series per lane (let's say 2 for the sake of this example, but it could be many more than that).
So the output should be:
Lane Series
1 688
1 685
2 775
2 666
Getting the highest series per lane is easy, but I can't seem to find a way to get the highest 2 results.
I use the MAX aggregate function with GROUP BY to get the maximum, but there's no "TOP N" function like in SQL Server, and ORDER BY ... LIMIT only returns the highest N results overall, not per lane.
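To see why neither approach answers the question, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL (an assumption; these two queries are plain enough to behave the same way in both). GROUP BY with MAX returns exactly one row per lane, while ORDER BY ... LIMIT 2 ranks the whole table and ignores lane boundaries.

```python
import sqlite3

# In-memory SQLite stand-in for the question's MySQL table (assumption, for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lane_series (lane INTEGER, series INTEGER)")
conn.executemany(
    "INSERT INTO lane_series VALUES (?, ?)",
    [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)],
)

# GROUP BY + MAX: exactly one row per lane, i.e. only the top 1.
max_per_lane = conn.execute(
    "SELECT lane, MAX(series) FROM lane_series GROUP BY lane ORDER BY lane"
).fetchall()

# ORDER BY ... LIMIT: ranks the whole table, so lanes are ignored.
top_two_overall = conn.execute(
    "SELECT lane, series FROM lane_series ORDER BY series DESC LIMIT 2"
).fetchall()

print(max_per_lane)     # [(1, 688), (2, 775)]
print(top_two_overall)  # [(2, 775), (1, 688)]
```

Lane 1's second-best value (685) never appears in either result, which is exactly the gap the question is about.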
Since I query the database from a Java application I wrote myself, which also chooses what N is, I could loop through every lane and run a separate query with LIMIT for each one, but I want to learn how to do it in MySQL.
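The loop-per-lane fallback described in the question looks roughly like this. It is sketched in Python with `sqlite3` rather than Java, and the helper name `top_n_per_lane_by_loop` is hypothetical. It issues one query to list the lanes plus one ORDER BY ... LIMIT query per lane, so the number of round trips grows with the number of lanes.

```python
import sqlite3

def top_n_per_lane_by_loop(conn, n):
    """Hypothetical helper: one query for the lanes, then one LIMIT query per lane."""
    result = []
    lanes = [row[0] for row in conn.execute("SELECT DISTINCT lane FROM lane_series ORDER BY lane")]
    for lane in lanes:
        # A separate round trip per lane: this is the cost the question wants to avoid.
        rows = conn.execute(
            "SELECT lane, series FROM lane_series WHERE lane = ? ORDER BY series DESC LIMIT ?",
            (lane, n),
        ).fetchall()
        result.extend(rows)
    return result

# In-memory SQLite stand-in for the MySQL table (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lane_series (lane INTEGER, series INTEGER)")
conn.executemany(
    "INSERT INTO lane_series VALUES (?, ?)",
    [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)],
)

print(top_n_per_lane_by_loop(conn, 2))  # [(1, 688), (1, 685), (2, 775), (2, 666)]
```

The result is correct, but with many lanes the extra queries add up, which is why a single-query solution is preferable.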
Answer
This solution is the fastest for MySQL and will work with very large tables, but it uses "funky" MySQL-specific features (user variables assigned inside the query), so it won't be of use for other database flavours.
(Edited to sort before applying logic)
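set @count:=-1, @lane:=0; -- @count: position within the current lane; @lane: lane of the previous row
select lane, series
from (select lane, series from lane_series order by lane, series desc) x -- each lane's rows, highest series first
where if(lane != @lane, @count:=-1, 0) is not null -- true for any non-NULL lane; on a new lane, resets @count
and if(lane != @lane, @lane:=lane, lane) is not null -- true for any non-NULL lane; on a new lane, stores it in @lane
and (@count:=@count+1) < 2; -- Specify the number of rows to keep per group here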
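The where clause does all the work through side effects. The first two conditions are always true for non-NULL lanes and exist only to reset @count and update @lane when a new lane starts. The third increments @count and keeps the row while it is below the limit. A plain Python loop over the same sorted rows makes that control flow explicit. It illustrates the logic only; MySQL does not run it, and the function name `top_n_single_pass` is made up.

```python
def top_n_single_pass(rows, n):
    """Mirror of the user-variable query; rows must already be sorted by lane, series DESC."""
    # Like @count:=-1; None (instead of @lane:=0) guarantees the first row starts a new lane.
    count, current_lane = -1, None
    kept = []
    for lane, series in rows:
        if lane != current_lane:   # if(lane != @lane, @count:=-1, 0)
            count = -1
            current_lane = lane    # if(lane != @lane, @lane:=lane, lane)
        count += 1                 # (@count:=@count+1)
        if count < n:              # ... < 2
            kept.append((lane, series))
    return kept

data = [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)]
ordered = sorted(data, key=lambda r: (r[0], -r[1]))  # order by lane, series desc
print(top_n_single_pass(ordered, 2))  # [(1, 688), (1, 685), (2, 775), (2, 666)]
```

The sort is what makes a single pass enough: each lane's rows are contiguous and arrive best-first, so the counter only needs to count.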
To put this query on steroids, define an index on lane and series: CREATE INDEX lane_series_idx on lane_series(lane, series);
and it will do a (super fast) index-only scan, so your other text columns don't affect it.
The key points of this query are:
- It requires only one table pass (albeit sorted)
- It handles ties at any level: for example, if there's a tie for 2nd place, only one of the tied rows will be displayed, i.e. the row count is absolute and never exceeded
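The tie behaviour described above matches the standard ROW_NUMBER() window function, which also gives every row in a group a distinct position. That is a different technique from the answer's user-variable trick, but it is the usual portable alternative, and MySQL itself supports window functions from version 8.0. The sketch below runs it through Python's `sqlite3` (assuming the bundled SQLite is 3.25 or newer, which window functions require). It adds a hypothetical extra (1, 685) row to create a tie for 2nd place and contrasts ROW_NUMBER() with RANK(), which keeps every tied row and can therefore return more than N rows per lane.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL 8.0+ (assumption: window functions need SQLite 3.25+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lane_series (lane INTEGER, series INTEGER)")
# The answer's test data plus one hypothetical extra row, (1, 685), that ties for 2nd in lane 1.
conn.executemany(
    "INSERT INTO lane_series VALUES (?, ?)",
    [(1, 680), (1, 685), (1, 688), (1, 685), (2, 666), (2, 425), (2, 775)],
)

def top_n(conn, n, ranking="ROW_NUMBER"):
    """Keep rows whose per-lane position (by series, descending) is at most n."""
    # Only interpolate known function names into the SQL text.
    assert ranking in ("ROW_NUMBER", "RANK")
    sql = f"""
        SELECT lane, series
        FROM (
            SELECT lane, series,
                   {ranking}() OVER (PARTITION BY lane ORDER BY series DESC) AS pos
            FROM lane_series
        ) AS ranked
        WHERE pos <= ?
        ORDER BY lane, series DESC
    """
    return conn.execute(sql, (n,)).fetchall()

print(top_n(conn, 2))          # [(1, 688), (1, 685), (2, 775), (2, 666)]
print(top_n(conn, 2, "RANK"))  # [(1, 688), (1, 685), (1, 685), (2, 775), (2, 666)]
```

Use ROW_NUMBER() when the per-lane row count must never be exceeded, as in the answer. Use RANK() only if returning all tied rows is what you want.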
Here's the test output:
create table lane_series (lane int, series int);
insert into lane_series values (1, 680),(1, 685),(1, 688),(2, 666),(2, 425),(2, 775);
-- Execute above query:
+------+--------+
| lane | series |
+------+--------+
| 1 | 688 |
| 1 | 685 |
| 2 | 775 |
| 2 | 666 |
+------+--------+