Problem description
I have quite a large table with 19,000,000 records, and I have a problem with duplicate rows. There are a lot of similar questions, even here on SO, but none of them seems to give me a satisfactory answer. Some points to consider:
- Row uniqueness is determined by two columns, location_id and datetime.
- I'd like to keep the execution time as short as possible (< 1 hour).
- Copying tables is not very feasible as the table is several gigabytes in size.
- No need to worry about relations.
As said, each combination of location_id and datetime should occur only once, and I would like to remove all the duplicate instances. It does not matter which one of them survives, as the data is identical.
Any ideas?
I think you can use this query to delete the duplicate records from the table:
ALTER IGNORE TABLE table_name ADD UNIQUE (location_id, datetime)
Before running it on the real table, test it on some sample data first.
Note: On version 5.5, it works on MyISAM but not InnoDB. The IGNORE clause was deprecated in MySQL 5.6.17 and removed in 5.7.4, so this statement is not available on newer servers.
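For trying the idea out on sample data, here is a minimal sketch using Python's built-in sqlite3 module. It is not the ALTER IGNORE statement above, which SQLite does not support. Instead it uses a different technique: delete every row except the one with the lowest rowid in each (location_id, datetime) group, then add the unique index so that new duplicates are rejected. The table name `readings`, the `value` column and the sample rows are made up for illustration, and performance on a 19,000,000-row MySQL table is not something this sketch says anything about.

```python
import sqlite3


def dedupe(conn, table="readings"):
    """Delete duplicate (location_id, datetime) rows, keeping the lowest rowid.

    `table` is a hypothetical name; it is interpolated into the SQL,
    so only pass trusted identifiers.
    """
    cur = conn.execute(
        f"DELETE FROM {table} WHERE rowid NOT IN ("
        f"SELECT MIN(rowid) FROM {table} GROUP BY location_id, datetime)"
    )
    # With the duplicates gone, the unique index can be created and
    # will reject any future duplicate pair.
    conn.execute(
        f"CREATE UNIQUE INDEX idx_{table}_loc_dt "
        f"ON {table} (location_id, datetime)"
    )
    conn.commit()
    return cur.rowcount  # number of rows deleted


# Small in-memory sample: three distinct pairs, three surplus copies.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (location_id INTEGER, datetime TEXT, value REAL)"
)
rows = [
    (1, "2012-01-01 00:00:00", 1.5),
    (1, "2012-01-01 00:00:00", 1.5),
    (1, "2012-01-01 01:00:00", 2.0),
    (2, "2012-01-01 00:00:00", 3.0),
    (2, "2012-01-01 00:00:00", 3.0),
    (2, "2012-01-01 00:00:00", 3.0),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

removed = dedupe(conn)
remaining = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(removed, remaining)  # prints: 3 3
```

On MySQL the same delete-then-index pattern would need a primary key or other unique column in place of SQLite's implicit rowid, so treat this only as a way to check the logic on a small copy of the data.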