Removing duplicates from a large table



Problem description

I have a fairly large table with 19,000,000 records, and I have a problem with duplicate rows. There are a lot of similar questions, even here on SO, but none of them seems to give me a satisfactory answer. Some points to consider:

• Row uniqueness is determined by two columns, location_id and datetime.
• I'd like to keep the execution time as short as possible (< 1 hour).
• Copying the table is not very feasible, as it is several gigabytes in size.
• No need to worry about relations.

As said, each combination of location_id and datetime should occur only once, and I would like to remove all the duplicate instances. It does not matter which one of them survives, as the data is identical.

Any ideas?

Solution

I think you can use this query to delete the duplicate records from the table:

    ALTER IGNORE TABLE table_name ADD UNIQUE (location_id, datetime);

Before doing this, test it on some sample data first, and only then run it against the real table.

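Note: On MySQL 5.5, this works on MyISAM but not on InnoDB. The IGNORE clause of ALTER TABLE was deprecated in MySQL 5.6.17 and removed in 5.7.4, so the statement fails with a syntax error on later versions.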
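
To try the idea on sample data before touching the real table, here is a minimal sketch in Python using the built-in sqlite3 module. It is only a stand-in: SQLite has no ALTER IGNORE TABLE, so the sketch swaps in a different technique, deleting every row except the one with the lowest rowid per (location_id, datetime) and then adding a unique index. The table name `readings`, the `value` column, the index name, and both helper functions are hypothetical and not part of the original question.

```python
import sqlite3


def build_sample() -> sqlite3.Connection:
    """Create an in-memory table (hypothetical schema) containing duplicate rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (location_id INTEGER, datetime TEXT, value REAL)"
    )
    rows = [
        (1, "2012-01-01 00:00:00", 1.5),
        (1, "2012-01-01 00:00:00", 1.5),  # duplicate
        (1, "2012-01-01 01:00:00", 2.0),
        (2, "2012-01-01 00:00:00", 3.0),
        (2, "2012-01-01 00:00:00", 3.0),  # duplicate
        (2, "2012-01-01 00:00:00", 3.0),  # duplicate
    ]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def dedupe(conn: sqlite3.Connection) -> int:
    """Remove duplicate (location_id, datetime) rows; return the number deleted."""
    # Keep the row with the lowest rowid in each group and delete the rest.
    cur = conn.execute(
        """
        DELETE FROM readings
        WHERE rowid NOT IN (
            SELECT MIN(rowid)
            FROM readings
            GROUP BY location_id, datetime
        )
        """
    )
    # With the duplicates gone, a unique index stops new ones from appearing.
    conn.execute(
        "CREATE UNIQUE INDEX uq_location_datetime "
        "ON readings (location_id, datetime)"
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = build_sample()
    removed = dedupe(conn)
    remaining = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
    print(f"removed {removed} duplicates, {remaining} rows remain")
```

This only checks the logic on a handful of rows. It says nothing about how long the equivalent operation would take on 19,000,000 rows in MySQL, and SQLite's implicit rowid is an assumption of the sketch: on the real table you would need an existing key column to decide which row survives.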

