Problem description
I've used aggregation in mongo a lot, and I know its performance benefits for grouped counts and the like. But does mongo show any difference in performance between these two ways of counting all documents in a collection?
collection.aggregate([
  {
    $match: {}
  },{
    $group: {
      _id: null,
      count: {$sum: 1}
    }
  }]);
and
collection.find({}).count()
Update: second case. Let's say we have this sample data:
{_id: 1, type: 'one', value: true}
{_id: 2, type: 'two', value: false}
{_id: 4, type: 'five', value: false}
Using aggregate():
var _ids = ['id1', 'id2', 'id3'];
var counted = Collections.mail.aggregate([
  {
    '$match': {
      _id: {
        '$in': _ids
      },
      value: false
    }
  }, {
    '$group': {
      _id: "$type",
      count: {
        '$sum': 1
      }
    }
  }
]);
Using count():
var counted = {};
var type = 'two';
// Declare i and len with var so they do not leak as implicit globals
for (var i = 0, len = _ids.length; i < len; i++) {
  counted[_ids[i]] = Collections.mail.find({
    _id: _ids[i], value: false, type: type
  }).count();
}
Recommended answer
.count() is by far faster. You can see the implementation by calling
// Note the missing parentheses at the end
db.collection.count
which returns the length of the cursor of the default query (if count() is called with no query document), which in turn is implemented as returning the length of the _id_ index, iirc.
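The tip about leaving off the parentheses relies on ordinary JavaScript behaviour: a function that is referenced but not called is just a value, and its source text can be printed. A minimal sketch outside of mongo, where countAll is a hypothetical stand-in and not the shell's actual count implementation:

```javascript
// Hypothetical stand-in function; NOT the real db.collection.count source.
function countAll(docs) {
  return docs.length;
}

// Referencing the function without parentheses yields the function itself.
// toString() returns its source text, which is what a shell would display.
const source = countAll.toString();
console.log(source);

// Adding the parentheses calls it instead.
const result = countAll([{ _id: 1 }, { _id: 2 }]);
console.log(result);
```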
An aggregation, however, reads each and every document and processes it. It can only come within roughly the same order of magnitude as .count() when run over no more than some 100k documents (give or take, depending on your RAM).
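To make the difference concrete, here is a toy model in plain JavaScript. It is only an illustration of the two strategies described above, not MongoDB's implementation; the ToyCollection class and its size counter are assumptions made up for this sketch.

```javascript
// Toy model only: this is NOT how MongoDB is implemented.
// It contrasts reading a maintained counter with visiting every document.
class ToyCollection {
  constructor() {
    this.docs = [];
    this.size = 0; // hypothetical metadata counter, updated on every insert
  }

  insert(doc) {
    this.docs.push(doc);
    this.size += 1;
  }

  // Analogue of count() with no query: read the counter, constant time.
  countFromMetadata() {
    return this.size;
  }

  // Analogue of $group with {$sum: 1}: touch every document, linear time.
  countByScanning() {
    let count = 0;
    for (const doc of this.docs) {
      count += 1;
    }
    return count;
  }
}

const toy = new ToyCollection();
for (let i = 0; i < 100000; i++) {
  toy.insert({ _id: i, value: i % 2 === 0 });
}

const fromMetadata = toy.countFromMetadata();
const fromScan = toy.countByScanning();
console.log(fromMetadata, fromScan);
```

Both calls give the same number; the scan simply does work proportional to the number of documents, which is why the gap grows with collection size.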
The function below was applied to a collection with some 12M entries:
function checkSpeed(col,iterations){
  // Get the collection
  var collectionUnderTest = db[col];
  // The collection we are writing our stats to
  var stats = db[col+'STATS']
  // remove old stats
  stats.remove({})
  // Prevent allocation in loop
  var start = new Date().getTime()
  var duration = new Date().getTime()
  print("Counting with count()")
  for (var i = 1; i <= iterations; i++){
    start = new Date().getTime();
    var result = collectionUnderTest.count()
    duration = new Date().getTime() - start
    stats.insert({"type":"count","pass":i,"duration":duration,"count":result})
  }
  print("Counting with aggregation")
  for(var j = 1; j <= iterations; j++){
    start = new Date().getTime()
    // aggregate() returns a cursor in current shells, so read the single
    // result document before accessing its count field
    var doc = collectionUnderTest.aggregate([{ $group:{_id: null, count:{ $sum: 1 } } }]).toArray()[0]
    duration = new Date().getTime() - start
    stats.insert({"type":"aggregation", "pass":j, "duration": duration,"count":doc.count})
  }
  // Average duration per counting method
  var averages = stats.aggregate([
    {$group:{_id:"$type","average":{"$avg":"$duration"}}}
  ])
  return averages
}
and returned:
{ "_id" : "aggregation", "average" : 43828.8 }
{ "_id" : "count", "average" : 0.6 }
The unit is milliseconds.
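To put the two averages side by side, a short sketch that computes their ratio from the figures reported above. The variable names are chosen for illustration only.

```javascript
// Averages copied from the benchmark output above (milliseconds).
const averages = [
  { _id: "aggregation", average: 43828.8 },
  { _id: "count", average: 0.6 }
];

// Index the rows by counting method for easy lookup.
const byType = {};
for (const row of averages) {
  byType[row._id] = row.average;
}

// How many times slower the aggregation was, rounded to a whole number.
const ratio = Math.round(byType.aggregation / byType.count);
console.log(ratio);
```

On this 12M-entry collection the aggregation took about 44 seconds per pass, roughly 73,000 times longer than count().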
hth