mysql - Need to delete duplicates by ID

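I have a query that successfully finds the duplicates in my table, meaning rows that match 100% on the fields in question, and returns a count of the duplicates. Now I need to write a query that returns the ID values of those records so that I can delete them. Here is my initial query: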

SELECT CompanyID, COUNT(*) AS dupecount
FROM artist_reports
GROUP BY contemporary, traditional, country, folk, functional, decorative, abstract, representational, figurative, price_range_low,
  price_range_high, best_selling_range_low, best_selling_range_high, average_sales_other_shows, total_sales_this_event, average_exhibitor_quality, financial_fairness, patrons_art_savvy, demographics, buying_energy,
  advertising, venue_environment, show_layout, organization, director_support, staff_support, logistical_ease, load_in_out, parking_ease, artist_amenities,
  awards_judging, security_efficiency, weather, event_year, critique, artist_reports.status, public_email, artist_reports.email, would_you_return, fairs_per_year, CompanyID
HAVING COUNT(*) > 1

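In theory, I have a link to another page that passes the CompanyID, and that page then deletes the duplicates. However, I tried the following and it returns only one record: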

SELECT arid FROM artist_reports WHERE arid IN (
  SELECT * FROM (
    SELECT arid
      FROM artist_reports
      WHERE CompanyID = 12345
      GROUP BY contemporary, traditional, country, folk, functional, decorative, abstract, representational, figurative, price_range_low,
        price_range_high, best_selling_range_low, best_selling_range_high, average_sales_other_shows, total_sales_this_event, average_exhibitor_quality, financial_fairness, patrons_art_savvy, demographics, buying_energy,
        advertising, venue_environment, show_layout, organization, director_support, staff_support, logistical_ease, load_in_out, parking_ease, artist_amenities,
        awards_judging, security_efficiency, weather, event_year, critique, artist_reports.status, public_email, artist_reports.email, would_you_return, fairs_per_year, CompanyID
      HAVING COUNT(*) > 1) AS a )

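I'm not sure what I'm doing wrong. Ideally, I'd like to get a recordset with the unique ID values of the reports (arid), then loop through those values and delete the most recent IDs, leaving the original intact.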

Best answer

In Oracle you could use window functions, but here you have to cheat.
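For readers on MySQL 8.0 or later, window functions are available in MySQL as well, so the workaround is not strictly required there. The following is only a minimal sketch under stated assumptions: `artist_reports_demo` is a hypothetical cut-down table that keeps just two of the compared columns, and the lowest arid in each group is assumed to be the original that should survive.

```sql
-- Hypothetical demo table: only two of the compared columns are kept.
CREATE TABLE artist_reports_demo (
  arid       INT PRIMARY KEY,
  CompanyID  INT NOT NULL,
  event_year INT,
  critique   VARCHAR(255)
);

INSERT INTO artist_reports_demo (arid, CompanyID, event_year, critique) VALUES
  (2,  12345, 2014, 'good show'),
  (5,  12345, 2014, 'good show'),
  (8,  12345, 2014, 'good show'),
  (9,  12345, 2015, 'rainy'),
  (12, 12345, 2015, 'rainy'),
  (13, 12345, 2015, 'unique report');

-- Number the rows inside each group of identical reports (MySQL 8.0+ only).
-- rn = 1 is the lowest arid in its group, which is the row to keep.
DELETE ar
FROM artist_reports_demo AS ar
JOIN (
  SELECT arid,
         ROW_NUMBER() OVER (
           PARTITION BY CompanyID, event_year, critique
           ORDER BY arid
         ) AS rn
  FROM artist_reports_demo
  WHERE CompanyID = 12345
) AS ranked ON ranked.arid = ar.arid
WHERE ranked.rn > 1;

-- Rows 2, 9 and 13 should remain.
SELECT arid FROM artist_reports_demo ORDER BY arid;
```

On the real table, the PARTITION BY list would have to contain every column from the GROUP BY list in the question.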

I assume you want to delete the duplicates by arid. To get all of the IDs with your query, try adding GROUP_CONCAT to it:

SELECT CompanyID, COUNT(*) AS dupecount,
      GROUP_CONCAT(arid) AS all_duplicates_ids
FROM artist_reports
GROUP BY contemporary, traditional, country, folk, functional, decorative, abstract, representational, figurative, price_range_low,
  price_range_high, best_selling_range_low, best_selling_range_high, average_sales_other_shows, total_sales_this_event, average_exhibitor_quality, financial_fairness, patrons_art_savvy, demographics, buying_energy,
  advertising, venue_environment, show_layout, organization, director_support, staff_support, logistical_ease, load_in_out, parking_ease, artist_amenities,
  awards_judging, security_efficiency, weather, event_year, critique, artist_reports.status, public_email, artist_reports.email, would_you_return, fairs_per_year, CompanyID
HAVING COUNT(*) > 1

Now you will get a result like this (assuming there are 5 duplicates):

CompanyID | dupecount | all_duplicates_ids
----------+-----------+-------------------
12345     |         5 | '2,5,8,9,12'

You don't want to delete all 5 records, only 4 of them, so you can trim one ID off this string by replacing the GROUP_CONCAT part with the following:

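-- Sort the IDs so the lowest arid (assumed to be the original) comes first,
-- then keep only the last COUNT(*) - 1 items, i.e. everything except that one.
SUBSTRING_INDEX(
    GROUP_CONCAT(arid ORDER BY arid),
    ',', -(COUNT(*) - 1)) AS all_duplicates_without_one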
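To see how SUBSTRING_INDEX treats such a list, it can be run on a literal string. This is only an illustration that reuses the sample IDs from the result above: a positive count keeps items counted from the left, while a negative count keeps items counted from the right.

```sql
SELECT SUBSTRING_INDEX('2,5,8,9,12', ',', 1)  AS first_id_only,  -- '2'
       SUBSTRING_INDEX('2,5,8,9,12', ',', -4) AS all_but_first;  -- '5,8,9,12'
```

So a count of 1 yields only the single ID to keep, whereas the four IDs to delete out of five come from a negative count equal to -(COUNT(*) - 1).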

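Now you have a comma-separated string of IDs. You can use it in the WHERE clause of a DELETE query. I suggest using the FIND_IN_SET function to check whether arid appears in the comma-separated string: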

http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set
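Put together, the two steps might look like the sketch below. It is not the exact query for the real table: `artist_reports_demo` is a hypothetical cut-down table with only two of the compared columns, the user variable `@ids` is just one way of carrying the string from the SELECT to the DELETE, and the lowest arid per group is assumed to be the original.

```sql
-- Hypothetical demo table: only two of the compared columns are kept.
CREATE TABLE artist_reports_demo (
  arid       INT PRIMARY KEY,
  CompanyID  INT NOT NULL,
  event_year INT,
  critique   VARCHAR(255)
);

INSERT INTO artist_reports_demo (arid, CompanyID, event_year, critique) VALUES
  (2,  12345, 2014, 'good show'),
  (5,  12345, 2014, 'good show'),
  (8,  12345, 2014, 'good show'),
  (9,  12345, 2015, 'rainy'),
  (12, 12345, 2015, 'rainy'),
  (13, 12345, 2015, 'unique report');

-- Step 1: for every duplicate group, drop the lowest arid from the list
-- and merge the remaining IDs of all groups into one comma-separated string.
SET @ids = (
  SELECT GROUP_CONCAT(ids_to_delete)
  FROM (
    SELECT SUBSTRING_INDEX(GROUP_CONCAT(arid ORDER BY arid),
                           ',', -(COUNT(*) - 1)) AS ids_to_delete
    FROM artist_reports_demo
    WHERE CompanyID = 12345
    GROUP BY CompanyID, event_year, critique
    HAVING COUNT(*) > 1
  ) AS dupes
);

-- Step 2: delete every row whose arid appears in that string.
-- If no duplicates were found, @ids is NULL and nothing is deleted.
DELETE FROM artist_reports_demo
WHERE FIND_IN_SET(arid, @ids) > 0;

-- Rows 2, 9 and 13 should remain.
SELECT arid FROM artist_reports_demo ORDER BY arid;
```

Two caveats apply to this approach. GROUP_CONCAT truncates its result at group_concat_max_len, which defaults to 1024 bytes, so a company with many duplicates may need a larger session value. FIND_IN_SET also cannot use an index on arid, which is acceptable for an occasional cleanup but slow on large tables.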

Source: Stack Overflow, https://stackoverflow.com/questions/31036215/