Problem description
I have a large mongoDB collection. I want to export this collection to CSV so I can then import it into a statistics package to do data analysis.
The collection has about 15 GB of documents in it. I would like to split it into ~100 equally sized CSV files. Is there any way to achieve this using mongoexport? I could also query the whole collection in pymongo, split it, and write the CSV files manually, but I suspect this would be slower and would require more coding.
Thanks for your input.
Recommended answer
You can do it using the --skip & --limit options.
For example, if your collection holds 1,000 documents, you can export it in batches with a shell loop:

loops=100
count=$(mongo --quiet --eval 'db.collection.count()')
batch_size=$((count / loops))
for ((i = 0; i < loops; i++)); do
    mongoexport --skip $((batch_size * i)) --limit "$batch_size" --out "export${i}.json" ...
done
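The batching arithmetic from the loop above can be sketched in Python (an illustrative helper, not part of the original answer; `batch_ranges` is a made-up name). One detail the pseudocode glosses over: when the count does not divide evenly, the last batch should pick up the remainder.

```python
def batch_ranges(count, loops):
    """Yield (skip, limit) pairs that split `count` documents into `loops` batches."""
    batch_size = count // loops
    for i in range(loops):
        skip = batch_size * i
        # the final batch absorbs the remainder when count % loops != 0
        limit = count - skip if i == loops - 1 else batch_size
        yield skip, limit

# Each pair maps onto one mongoexport invocation: --skip <skip> --limit <limit>
```

With 1,000 documents and 100 loops this yields (0, 10), (10, 10), ..., (990, 10); with 1,001 documents the last pair becomes (990, 11).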
This assumes your documents are roughly equal in size.

Note, however, that large skips are slow: iterations with a low skip offset will run faster than those near the end of the collection.
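A back-of-envelope cost model shows why (a sketch, assuming skip walks past the skipped documents linearly; `total_scanned` is an illustrative name):

```python
def total_scanned(count, loops):
    """Total documents walked across all batches if each skip scans linearly."""
    batch_size = count // loops
    # batch i walks past batch_size * i skipped docs, then reads batch_size docs
    return sum(batch_size * i + batch_size for i in range(loops))

# For 1,000 docs in 100 batches the server touches 50,500 documents in total,
# roughly 50x the collection size, versus 1,000 for one sequential pass.
```

This quadratic blow-up is why, for very large collections, batching on a range query over an indexed field (e.g. _id) is often preferred over skip: each batch then starts from an index seek instead of a scan.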
This concludes this article on exporting with mongoexport to multiple CSV files. Hopefully the recommended answer above is helpful.