Problem description
mongodb.countDocuments is slow when the result set is large
Test data on the users collection:
- 10 million documents with status 'active'
- 100k documents with status 'inactive'
The field status is indexed: {status: 1}
db.users.countDocuments({status: 'active'}) takes 2.91 sec
db.users.countDocuments({status: 'inactive'}) takes 0.018 sec
I understand that countDocuments uses an aggregation to find and count the results.
estimatedDocumentCount() does not work in this case because a query filter is needed.
Any suggestions for improvement?
Recommended answer
Counting seems like one of those things that should be cheap, but often isn't. Because MongoDB doesn't maintain a count of the number of documents that match certain criteria in its B-tree index, it needs to scan through the index, counting documents as it goes. That means that counting 100x the documents will take roughly 100x the time, which is about what we see here: 0.018 * 100 = 1.8s, the same order of magnitude as the measured 2.91 sec.
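One way to check that the count really walks the index is to compare how many index keys each count examines. The sketch below is written for mongosh and assumes the users collection and {status: 1} index from the question. keysExamined is a hypothetical helper name, and the exact shape of the explain output can differ between server versions and on sharded clusters.

```javascript
// Hypothetical helper: returns how many index keys a count had to examine.
// `coll` is a mongosh collection handle such as db.users.
function keysExamined(coll, filter) {
  // explain("executionStats") executes the count and reports execution statistics.
  const plan = coll.explain("executionStats").count(filter);
  return plan.executionStats.totalKeysExamined;
}

// Usage in mongosh (assumes the data set from the question):
// keysExamined(db.users, { status: "active" });
// keysExamined(db.users, { status: "inactive" });
```

If the two numbers differ by about the same factor as the two timings, the cost is dominated by the index scan rather than by anything specific to countDocuments.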
To speed this up, you have a few options:
- The active count is roughly db.users.estimatedDocumentCount() - db.users.countDocuments({status: 'inactive'}). Would this be accurate enough for your use case?
- Alternatively, you can maintain a counts document in a separate collection that you keep in sync with the number of active/inactive documents that you have.
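Both options can be sketched in mongosh roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper names, the separate counts collection, and the { _id: "users", active, inactive } document layout are all hypothetical, and the collection handles are passed in as arguments (for example db.users and db.counts). The first option also assumes that 'active' and 'inactive' are the only status values.

```javascript
// Option 1: approximate the active count as total minus inactive.
function approxActiveCount(users) {
  const total = users.estimatedDocumentCount(); // read from collection metadata
  const inactive = users.countDocuments({ status: "inactive" }); // small index scan
  return total - inactive;
}

// Option 2: keep a counter document in sync on every write.
// Assumed layout: { _id: "users", active: <n>, inactive: <n> }
function insertUser(users, counts, user) {
  users.insertOne(user);
  counts.updateOne(
    { _id: "users" },
    { $inc: { [user.status]: 1 } },
    { upsert: true } // create the counter document on first use
  );
}

function changeStatus(users, counts, userId, newStatus) {
  // In mongosh, findOneAndUpdate returns the pre-update document by default,
  // or null when nothing matched (here: the user already has newStatus).
  const previous = users.findOneAndUpdate(
    { _id: userId, status: { $ne: newStatus } },
    { $set: { status: newStatus } }
  );
  if (previous) {
    counts.updateOne(
      { _id: "users" },
      { $inc: { [previous.status]: -1, [newStatus]: 1 } }
    );
  }
}

// Reading the counts is then a single-document lookup.
function readCounts(counts) {
  return counts.findOne({ _id: "users" });
}
```

In this sketch the data write and the counter update are two separate operations, so the counters can drift if one of them fails. Wrapping both in a transaction or periodically recomputing the counts would be ways to limit that drift.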