sql - Find rows with duplicate values in a column

I have a table author_data:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 14        | k moribe
 15        | ernest jordan
 25        | william h nailon
 79        | howard jason
 36        | k moribe

The result I need is:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 15        | ernest jordan
 14        | k moribe
 36        | k moribe

That is, I need the author_id of every row whose author_name appears more than once. I have tried the following statement:

select author_id,count(author_name)
from author_data
group by author_name
having count(author_name)>1

But this does not work. How can I get this result?

Best answer

I suggest a window function in a subquery:

SELECT author_id, author_name  -- omit the name here if you just need ids
FROM (
   SELECT author_id, author_name
        , count(*) OVER (PARTITION BY author_name) AS ct
   FROM   author_data
   ) sub
WHERE  ct > 1;

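You will recognize the basic aggregate function count(). Adding an OVER clause turns it into a window function, just like any other aggregate function.
This way, it counts the rows in each partition. Voilà.
This has to be done in a subquery because the result cannot be referenced in the WHERE clause of the same SELECT: window functions are evaluated after WHERE. See: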

Best way to get result count before LIMIT was applied
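To make the per-partition count concrete, here is a minimal sketch that runs the same query against the sample data from the question. It assumes Python's built-in sqlite3 module as a stand-in database (SQLite 3.25 or later is needed for window functions); the helper names load_sample, counts_per_row and duplicated_rows are made up for illustration and are not part of the original answer.

```python
import sqlite3

# Sample rows copied from the question's author_data table.
ROWS = [
    (9, "ernest jordan"),
    (14, "k moribe"),
    (15, "ernest jordan"),
    (25, "william h nailon"),
    (79, "howard jason"),
    (36, "k moribe"),
]


def load_sample():
    """Return an in-memory SQLite connection holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE author_data (author_id INTEGER PRIMARY KEY, author_name TEXT)"
    )
    conn.executemany("INSERT INTO author_data VALUES (?, ?)", ROWS)
    return conn


def counts_per_row(conn):
    """Inner query only: each row is kept and gains the count for its name."""
    return conn.execute(
        """
        SELECT author_id, author_name,
               count(*) OVER (PARTITION BY author_name) AS ct
        FROM   author_data
        ORDER  BY author_name, author_id
        """
    ).fetchall()


def duplicated_rows(conn):
    """Full query: the outer SELECT filters on ct, which WHERE can now see."""
    return conn.execute(
        """
        SELECT author_id, author_name
        FROM (
           SELECT author_id, author_name,
                  count(*) OVER (PARTITION BY author_name) AS ct
           FROM   author_data
           ) sub
        WHERE  ct > 1
        ORDER  BY author_name, author_id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = load_sample()
    for row in counts_per_row(conn):
        print(row)
    print(duplicated_rows(conn))
```

Unlike GROUP BY, the window function does not collapse rows, so every author_id stays available while ct carries the size of its name group. The ORDER BY clauses are only there to make the output deterministic.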

In older versions without window functions (v.8.3 or older), or in general, this alternative performs very fast:

SELECT author_id, author_name  -- omit name, if you just need ids
FROM   author_data a
WHERE  EXISTS (
   SELECT 1 FROM author_data a2  -- an empty SELECT list only works in PostgreSQL 9.4+
   WHERE  a2.author_name = a.author_name
   AND    a2.author_id <> a.author_id
   );

If you are concerned about performance, add an index on author_name.
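As a rough sketch of the EXISTS variant together with the suggested index, the snippet below again uses Python's built-in sqlite3 module as a stand-in database. The index name author_data_name_idx and the function find_duplicates_with_exists are hypothetical, and SELECT 1 is used inside EXISTS because SQLite does not accept an empty SELECT list.

```python
import sqlite3

# Sample rows copied from the question's author_data table.
SAMPLE = [
    (9, "ernest jordan"),
    (14, "k moribe"),
    (15, "ernest jordan"),
    (25, "william h nailon"),
    (79, "howard jason"),
    (36, "k moribe"),
]


def find_duplicates_with_exists(rows):
    """Return the rows whose author_name also occurs under another author_id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE author_data (author_id INTEGER PRIMARY KEY, author_name TEXT)"
        )
        conn.executemany("INSERT INTO author_data VALUES (?, ?)", rows)
        # Hypothetical index name; lets the EXISTS probe look up names directly.
        conn.execute("CREATE INDEX author_data_name_idx ON author_data (author_name)")
        return conn.execute(
            """
            SELECT author_id, author_name
            FROM   author_data a
            WHERE  EXISTS (
               SELECT 1 FROM author_data a2
               WHERE  a2.author_name = a.author_name
               AND    a2.author_id <> a.author_id
               )
            ORDER  BY author_name, author_id
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(find_duplicates_with_exists(SAMPLE))
```

The subquery only has to find one other row with the same name, so the database can stop probing at the first match. Whether the index actually pays off depends on table size and the query planner.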

Regarding sql - Find rows with duplicate values in a column, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/22722870/