I have a collection of 3 billion documents. Each document looks like this:
{
    "_id" : ObjectId("54c1a013715faf2cc0047c77"),
    "service_type" : "JE",
    "receiver_id" : NumberLong("865438083645"),
    "time" : ISODate("2012-12-05T23:07:36Z"),
    "duration" : 24,
    "service_description" : "NQ",
    "receiver_cell_id" : null,
    "location_id" : "658_55525",
    "caller_id" : NumberLong("475035504705")
}
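I want to get the list of distinct users (each must appear at least once as a caller, i.e. in "caller_id"), their counts (how many times each user appears in the collection as either caller or receiver), and, for the records where they are the caller, the location counts (i.e. the count of each location_id per user).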
In the end I want something like this:
{
    "number_of_records" : 20,
    "locations" : [{location_id: "658_55525", count: 5}, {location_id: "840_5425", count: 15}],
    "user" : NumberLong("475035504705")
}
I tried the solutions described here and here, but they are not efficient enough (extremely slow). What is an efficient way to achieve this?
Best answer
Use aggregation to get the result:
db.<collection>.aggregate([
    // Count records per (caller, location) pair.
    { $group : { _id : { user : "$caller_id", location : "$location_id" }, count : { $sum : 1 } } },
    // Flatten the compound key; _id now holds the caller.
    { $project : { _id : "$_id.user", location : "$_id.location", count : "$count" } },
    // Regroup per caller: collect the location counts and total them (caller records only).
    { $group : { _id : "$_id", locations : { $push : { location_id : "$location", count : "$count" } }, number_of_records : { $sum : "$count" } } },
    // Rename _id to user.
    { $project : { _id : 0, user : "$_id", locations : "$locations", number_of_records : "$number_of_records" } },
    // Write the result to a collection.
    { $out : "outputCollection" }
])
The output is:
{
"0" : {
"locations" : [
{
"location_id" : "840_5425",
"count" : 8
},
{
"location_id" : "658_55525",
"count" : 5
}
],
"number_of_records" : 13,
"user" : NumberLong(475035504705)
}
}
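To make the two $group stages easier to follow, here is a minimal in-memory sketch in plain JavaScript of what the pipeline computes. It is not how MongoDB executes the aggregation; the function name `summarizeCallers` and the sample records are made up for illustration. Like the pipeline, it counts only the records in which the user is the caller.

```javascript
// Hypothetical helper: emulates the pipeline's grouping logic in memory.
function summarizeCallers(records) {
  // First $group: count records per (caller_id, location_id) pair.
  const perUser = new Map(); // caller_id -> Map(location_id -> count)
  for (const rec of records) {
    if (!perUser.has(rec.caller_id)) {
      perUser.set(rec.caller_id, new Map());
    }
    const byLocation = perUser.get(rec.caller_id);
    byLocation.set(rec.location_id, (byLocation.get(rec.location_id) || 0) + 1);
  }
  // Second $group: collect each user's locations and sum their counts.
  const result = [];
  for (const [user, byLocation] of perUser) {
    const locations = [];
    let numberOfRecords = 0;
    for (const [location_id, count] of byLocation) {
      locations.push({ location_id, count });
      numberOfRecords += count;
    }
    result.push({ user, locations, number_of_records: numberOfRecords });
  }
  return result;
}

// Made-up sample data, not taken from the real collection.
const sample = [
  { caller_id: 1, location_id: "658_55525" },
  { caller_id: 1, location_id: "840_5425" },
  { caller_id: 1, location_id: "840_5425" },
  { caller_id: 2, location_id: "658_55525" },
];
const summary = summarizeCallers(sample);
console.log(JSON.stringify(summary, null, 2));
```

Grouping on the compound key first keeps the per-location counts, and the second pass only has to sum them, which is why number_of_records equals the total of the counts in locations.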
Update: use allowDiskUse:
var pipe = [
    { $group : { _id : { user : "$caller_id", location : "$location_id" }, count : { $sum : 1 } } },
    { $project : { _id : "$_id.user", location : "$_id.location", count : "$count" } },
    { $group : { _id : "$_id", locations : { $push : { location_id : "$location", count : "$count" } }, number_of_records : { $sum : "$count" } } },
    { $project : { _id : 0, user : "$_id", locations : "$locations", number_of_records : "$number_of_records" } },
    { $out : "outputCollection" }
];
db.runCommand(
    { aggregate: "collection",
      pipeline: pipe,
      allowDiskUse: true,
      cursor: {}  // the aggregate command requires this option since MongoDB 3.6
    }
)
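The same option can also be passed through the shell helper as a second argument, db.collection.aggregate(pipeline, { allowDiskUse: true }). The sketch below wraps that call in a small function. The names `runWithDiskUse` and `fakeDb` are hypothetical and exist only so the snippet can run outside the mongo shell; in the shell you would pass the real db object instead of the stub.

```javascript
// Hypothetical wrapper around the shell helper db.<collection>.aggregate().
// `db` is expected to expose getCollection(name), as the mongo shell's db does.
function runWithDiskUse(db, collectionName, pipeline) {
  // allowDiskUse lets stages such as $group write temporary files to disk
  // instead of failing when they exceed the per-stage memory limit.
  return db.getCollection(collectionName).aggregate(pipeline, { allowDiskUse: true });
}

// Stub standing in for the shell's `db`, so the call can be inspected here.
const calls = [];
const fakeDb = {
  getCollection(name) {
    return {
      aggregate(pipeline, options) {
        calls.push({ name, pipeline, options });
        return [];
      },
    };
  },
};

const demoPipeline = [{ $group: { _id: "$caller_id", count: { $sum: 1 } } }];
runWithDiskUse(fakeDb, "collection", demoPipeline);
console.log(calls[0].options);
```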
On mongodb - MongoDB distinct aggregation over 3 billion documents, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28117318/