MySQL: select N rows, but with only unique values in one column

Problem description

Given this data set:

ID  Name            City            Birthyear
1   Egon Spengler   New York        1957
2   Mac Taylor      New York        1955
3   Sarah Connor    Los Angeles     1959
4   Jean-Luc Picard La Barre        2305
5   Ellen Ripley    Nostromo        2092
6   James T. Kirk   Riverside       2233
7   Henry Jones     Chicago         1899

I need to find the 3 oldest people, but only one from each city.

If it were just the three oldest, they would be:

  • Henry Jones / Chicago
  • Mac Taylor / New York
  • Egon Spengler / New York

However, since both Egon Spengler and Mac Taylor are located in New York, Egon Spengler would drop out and the next one (Sarah Connor / Los Angeles) would come in instead.

Any elegant solutions?
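The expected result can be pinned down with a minimal plain-Python sketch, independent of any database: walk the people from oldest to youngest and skip anyone whose city is already represented. The `Person` type and the `oldest_one_per_city` function are hypothetical names used only for illustration, and breaking birth-year ties by ID is an assumption the question does not specify.

```python
from typing import NamedTuple


class Person(NamedTuple):
    id: int
    name: str
    city: str
    birthyear: int


# The sample data set from the question.
PEOPLE = [
    Person(1, "Egon Spengler", "New York", 1957),
    Person(2, "Mac Taylor", "New York", 1955),
    Person(3, "Sarah Connor", "Los Angeles", 1959),
    Person(4, "Jean-Luc Picard", "La Barre", 2305),
    Person(5, "Ellen Ripley", "Nostromo", 2092),
    Person(6, "James T. Kirk", "Riverside", 2233),
    Person(7, "Henry Jones", "Chicago", 1899),
]


def oldest_one_per_city(people: list[Person], n: int) -> list[Person]:
    """Return up to n of the oldest people, keeping at most one per city."""
    chosen: list[Person] = []
    seen_cities: set[str] = set()
    # Oldest first; equal birth years are ordered by id (an arbitrary tie-break).
    for person in sorted(people, key=lambda p: (p.birthyear, p.id)):
        if len(chosen) >= n:
            break
        if person.city in seen_cities:
            continue  # this city already has an older (or equally old) person
        seen_cities.add(person.city)
        chosen.append(person)
    return chosen


if __name__ == "__main__":
    for person in oldest_one_per_city(PEOPLE, 3):
        print(f"{person.name} / {person.city}")
```

With n = 3 this yields Henry Jones / Chicago, Mac Taylor / New York and Sarah Connor / Los Angeles, which is the result the question describes.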

Update:

Currently, a variation of PConroy's answer is the best/fastest solution:

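SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT City, MIN(Birthyear) AS Birthyear
           FROM people
          GROUP BY City) P2
     ON P2.City = P.City            -- compare against the row's own city minimum
    AND P2.Birthyear = P.Birthyear
   GROUP BY P.City                  -- P.* here requires ONLY_FULL_GROUP_BY to be disabled
   ORDER BY P.Birthyear ASC
   LIMIT 10;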

His original query with "IN" is extremely slow on big data sets (I aborted it after 5 minutes), but moving the subquery into a JOIN speeds it up a lot: it took about 0.15 seconds for approximately 1 million rows in my test environment. I have an index on (City, Birthyear) and a second one on Birthyear alone.
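A runnable sketch of the JOIN approach follows, using Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption made only so the example is self-contained). The derived table is joined on both City and Birthyear, so a person whose birth year merely equals some other city's minimum is not matched, and every selected column is either grouped or aggregated (`MIN(P.Name)` is an arbitrary tie-breaker), so the statement does not depend on MySQL's permissive GROUP BY and also runs with ONLY_FULL_GROUP_BY enabled. The helper names `build_db` and `oldest_per_city` and the index names are hypothetical; the two indexes mirror the ones described above.

```python
import sqlite3

# Sample rows from the question, stored in a table named "people".
ROWS = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
]

# Derived table of per-city minimums, joined on City AND Birthyear.
OLDEST_PER_CITY = """
SELECT MIN(P.Name) AS Name, P.City, P2.Birthyear, COUNT(*) AS ct
  FROM people P
  JOIN (SELECT City, MIN(Birthyear) AS Birthyear
          FROM people
         GROUP BY City) P2
    ON P2.City = P.City
   AND P2.Birthyear = P.Birthyear
 GROUP BY P.City, P2.Birthyear
 ORDER BY P2.Birthyear ASC
 LIMIT ?
"""


def build_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
    )
    # Same shape as the indexes described in the question.
    conn.execute("CREATE INDEX idx_city_birthyear ON people (City, Birthyear)")
    conn.execute("CREATE INDEX idx_birthyear ON people (Birthyear)")
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", rows)
    return conn


def oldest_per_city(conn, n):
    """Return (Name, City, Birthyear, ct) for the n oldest cities' oldest people."""
    return conn.execute(OLDEST_PER_CITY, (n,)).fetchall()


if __name__ == "__main__":
    conn = build_db(ROWS)
    for row in oldest_per_city(conn, 3):
        print(row)
```

On the sample data this returns Henry Jones, Mac Taylor and Sarah Connor, each with ct = 1.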

Note: This is related to...

  • Selecting unique rows in a set of two possibilities
  • SQL Query to get latest price

Recommended answer

Probably not the most elegant of solutions, and the performance of IN may suffer on larger tables.

The nested query gets the minimum Birthyear for each city. Only records with that minimum Birthyear are matched in the outer query. Ordering by age (oldest first) and then limiting to 3 results gives you the 3 oldest people who are also the oldest in their city (Egon Spengler drops out).

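-- Name and Birthyear are not aggregated: this relies on ONLY_FULL_GROUP_BY being disabled.
SELECT Name, City, Birthyear, COUNT(*) AS ct
FROM people
WHERE (City, Birthyear) IN (SELECT City, MIN(Birthyear)
                            FROM people
                            GROUP BY City)
GROUP BY City
ORDER BY Birthyear ASC
LIMIT 3;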

+--------------+-------------+-----------+----+
| Name         | City        | Birthyear | ct |
+--------------+-------------+-----------+----+
| Henry Jones  | Chicago     |      1899 |  1 |
| Mac Taylor   | New York    |      1955 |  1 |
| Sarah Connor | Los Angeles |      1959 |  1 |
+--------------+-------------+-----------+----+
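The following sketch illustrates why the filter should be correlated on City. A filter of the form `Birthyear IN (SELECT MIN(Birthyear) ... GROUP BY City)` compares each row against the minimum of every city, so someone who is not the oldest in their own city can still match if their birth year equals another city's minimum. The rows are hypothetical, chosen to trigger exactly this case; SQLite (through Python's `sqlite3`) stands in for MySQL, and both accept the row-constructor form `(City, Birthyear) IN (SELECT ...)`.

```python
import sqlite3

# Hypothetical rows: Springfield's minimum is 1950 (Alice), but Bob
# (Springfield, 1955) shares his birth year with Shelbyville's minimum (Carol).
ROWS = [
    ("Alice", "Springfield", 1950),
    ("Bob", "Springfield", 1955),
    ("Carol", "Shelbyville", 1955),
]

# Filter on Birthyear alone: keeps any row whose year is SOME city's minimum.
SCALAR_IN = """
SELECT Name FROM people
 WHERE Birthyear IN (SELECT MIN(Birthyear) FROM people GROUP BY City)
 ORDER BY Birthyear, Name
"""

# Row-constructor filter: keeps only rows holding their OWN city's minimum.
ROW_IN = """
SELECT Name FROM people
 WHERE (City, Birthyear) IN (SELECT City, MIN(Birthyear) FROM people GROUP BY City)
 ORDER BY Birthyear, Name
"""


def names(conn: sqlite3.Connection, query: str) -> list[str]:
    """Run a single-column query and return the names it produced."""
    return [name for (name,) in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, City TEXT, Birthyear INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", ROWS)

print(names(conn, SCALAR_IN))  # ['Alice', 'Bob', 'Carol']: Bob slips through
print(names(conn, ROW_IN))     # ['Alice', 'Carol']
```

With an outer `GROUP BY City` on top of the first filter, Alice and Bob would be folded into one Springfield row with ct = 2, and MySQL may report either name.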

Edit: added GROUP BY City to the outer query, because people in the same city with the same birth year would otherwise produce multiple rows. Grouping in the outer query ensures that only one row is returned per city even if more than one person has that minimum Birthyear (which of them is shown is not defined). The ct column shows whether more than one person in the city has that Birthyear.
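To see the tie handling described in the edit, the sketch below gives New York two people with the same minimum birth year and checks that the city still yields one row whose ct is 2. The data is hypothetical, SQLite via Python's `sqlite3` stands in for MySQL, and using `MIN(Name)` to pick a deterministic representative is an assumption; the answer's query leaves that choice to MySQL.

```python
import sqlite3

# Hypothetical rows: Mac Taylor and Jane Doe tie for New York's minimum (1955).
ROWS = [
    ("Mac Taylor", "New York", 1955),
    ("Jane Doe", "New York", 1955),
    ("Egon Spengler", "New York", 1957),
    ("Henry Jones", "Chicago", 1899),
]

# One row per city; ct counts how many people share the city's minimum year,
# and MIN(Name) deterministically picks one of them to display.
QUERY = """
SELECT MIN(Name) AS Name, City, Birthyear, COUNT(*) AS ct
  FROM people
 WHERE (City, Birthyear) IN (SELECT City, MIN(Birthyear)
                               FROM people
                              GROUP BY City)
 GROUP BY City, Birthyear
 ORDER BY Birthyear ASC
 LIMIT 3
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, City TEXT, Birthyear INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", ROWS)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Here New York comes back once, as Jane Doe with ct = 2, signalling that a second person shares that birth year.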
