Removing duplicates from a large table

Problem description

I have quite a large table with 19,000,000 records, and I have a problem with duplicate rows. There are a lot of similar questions, even here on SO, but none of them seems to give me a satisfactory answer. Some points to consider:

Row uniqueness is determined by two columns, location_id and datetime.
I'd like to keep the execution time as fast as possible (< 1 hour).
Copying tables is not very feasible as the table is several gigabytes in size.
No need to worry about relations.

As said, each combination of location_id and datetime should appear only once, and I would like to remove all the duplicate instances. It does not matter which one of them survives, as the data is identical.

Any ideas?

Solution

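I think you can use this query to delete the duplicate records from the table:

ALTER IGNORE TABLE table_name ADD UNIQUE (location_id, datetime);

With IGNORE, when several rows share the same value for the new unique key, only one of them is kept and the others are deleted. Before running this on the real table, test it on some sample data first.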

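Note: On MySQL 5.5, this works on MyISAM tables but not on InnoDB. The IGNORE clause of ALTER TABLE was deprecated in MySQL 5.6.17 and removed in MySQL 5.7.4, so the statement is not available on later versions.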
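
For trying the idea on sample data first, here is a minimal sketch in Python using an in-memory SQLite database. It is not the MySQL statement above: SQLite has no ALTER IGNORE, so the sketch swaps in a different technique, deleting all but the lowest-rowid row per (location_id, datetime) and then adding a unique index. The table name `readings`, the `value` column, and the helper `dedupe` are hypothetical names chosen for illustration, and nothing here says how the approach would perform on a multi-gigabyte MySQL table.

```python
import sqlite3


def dedupe(conn, table="readings"):
    """Keep one row per (location_id, datetime), then enforce uniqueness.

    Hypothetical helper for illustration; `table` must be a trusted name
    because it is interpolated into the SQL text.
    """
    cur = conn.cursor()
    # Delete every row whose rowid is not the smallest one in its key group.
    cur.execute(
        f"""
        DELETE FROM {table}
        WHERE rowid NOT IN (
            SELECT MIN(rowid) FROM {table} GROUP BY location_id, datetime
        )
        """
    )
    removed = cur.rowcount
    # With duplicates gone, the unique index can be created and will
    # reject any future duplicate insert.
    cur.execute(
        f"CREATE UNIQUE INDEX IF NOT EXISTS uq_{table}_loc_dt "
        f"ON {table} (location_id, datetime)"
    )
    conn.commit()
    return removed


# Sample data (made up): six rows, three distinct (location_id, datetime) keys.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (location_id INTEGER, datetime TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2012-01-01 00:00:00", 1.5),
        (1, "2012-01-01 00:00:00", 1.5),
        (1, "2012-01-02 00:00:00", 2.0),
        (2, "2012-01-01 00:00:00", 3.0),
        (2, "2012-01-01 00:00:00", 3.0),
        (2, "2012-01-01 00:00:00", 3.0),
    ],
)

removed = dedupe(conn)
remaining = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(removed, remaining)  # prints: 3 3
```

Which row of each group survives is arbitrary here (the one with the lowest rowid), which matches the question's statement that the duplicate rows are identical.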