Problem description
I have approximately 1.7M documents in MongoDB (10M+ in the future). Some of them are duplicate entries that I do not want. The document structure looks something like this:
{
_id: 14124412,
nodes: [
12345,
54321
],
name: "Some beauty"
}
A document is a duplicate if it shares at least one node with another document that has the same name. What is the fastest way to remove the duplicates?
Assuming you want to permanently delete docs that contain a duplicate name + nodes entry from the collection, you can add a unique index with the dropDups: true option:
db.test.ensureIndex({name: 1, nodes: 1}, {unique: true, dropDups: true})
As the docs say, use extreme caution with this as it will delete data from your database. Back up your database first in case it doesn't do exactly as you're expecting.
UPDATE
This solution is only valid through MongoDB 2.x, as the dropDups option is no longer available in 3.0 (docs).
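For MongoDB 3.0 and later, where dropDups is gone, one possible workaround is to find the duplicates yourself, delete them, and then build the unique index. The following is only a rough sketch, not the original answer's method: findDuplicateIds is a hypothetical helper (not a MongoDB API) that walks the documents in order and flags any document sharing a (name, node) pair with a document it has already kept. It keeps every seen pair in memory, which may or may not be acceptable at 10M+ documents.

```javascript
// Sketch only: findDuplicateIds is a hypothetical helper, not a MongoDB API.
// It accepts anything with a forEach method (a plain array or a shell cursor).
function findDuplicateIds(docs) {
  var seen = Object.create(null); // (name, node) pairs of documents kept so far
  var duplicateIds = [];

  docs.forEach(function (doc) {
    // One key per (name, node) pair; JSON encoding avoids separator collisions.
    var keys = (doc.nodes || []).map(function (node) {
      return JSON.stringify([doc.name, node]);
    });
    var isDuplicate = keys.some(function (key) {
      return key in seen;
    });
    if (isDuplicate) {
      // Shares a node with an earlier kept document of the same name.
      duplicateIds.push(doc._id);
    } else {
      // First occurrence: remember its pairs so later documents are compared to it.
      keys.forEach(function (key) {
        seen[key] = true;
      });
    }
  });

  return duplicateIds;
}

// Assumed usage in the mongo shell (collection name taken from the answer above):
//   var ids = findDuplicateIds(
//     db.test.find({}, {name: 1, nodes: 1}).sort({_id: 1}));
//   db.test.deleteMany({_id: {$in: ids}});   // deleteMany needs 3.2+
//   db.test.createIndex({name: 1, nodes: 1}, {unique: true});
```

If the list of ids is large, the delete probably needs to be issued in batches rather than as one $in query. As with dropDups, this deletes data, so back up the collection before trying it.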