Problem description
I have a document that's structured as follows:
{
'_id' => 'Star Wars',
'count' => 1234,
'spelling' => [ ( 'Star wars' => 10, 'Star Wars' => 15, 'sTaR WaRs' => 5) ]
}
I would like to get the top N documents (by descending count), but with only one spelling per document (the one with the highest value). Is there a way to do this with the aggregation framework?
I can easily get the top 10 results (using $sort and $limit). But how do I get only one spelling for each?
So, for example, if I have the following three records:
{
'_id' => 'star_wars',
'count' => 1234,
'spelling' => [ ( 'Star wars' => 10, 'Star Wars' => 15, 'sTaR WaRs' => 5) ]
}
{
'_id' => 'willow',
'count' => 2211,
'spelling' => [ ( 'willow' => 300, 'Willow' => 550) ]
}
{
'_id' => 'indiana_jones',
'count' => 12,
'spelling' => [ ( 'indiana Jones' => 10, 'Indiana Jones' => 25, 'indiana jones' => 5) ]
}
and I ask for the top 2 results, I would get:
{
'_id' => 'willow',
'count' => 2211,
'spelling' => 'Willow'
}
{
'_id' => 'star_wars',
'count' => 1234,
'spelling' => 'Star Wars'
}
(or something to this effect)
Thanks!
Recommended answer
Your schema as designed would make using anything but a MapReduce difficult, because you've used the keys of the object as values. So I adjusted your schema to better match MongoDB's capabilities (also in JSON format for this example):
{
'_id' : 'star_wars',
'count' : 1234,
'spellings' : [
{ spelling: 'Star wars', total: 10},
{ spelling: 'Star Wars', total : 15},
{ spelling: 'sTaR WaRs', total : 5} ]
}
Note that it's now an array of objects with a specific key name, spelling, and a value for the total (I didn't know what that number actually represented, so I've called it total in my examples).
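If existing documents need to be migrated to this shape, the conversion could be sketched in plain JavaScript roughly as follows. This assumes the original spelling field is a plain object mapping each spelling to its number; toSpellingsArray is a hypothetical helper name, not part of MongoDB.

```javascript
// Hypothetical helper: turns a { spelling: number } map into an array of
// { spelling, total } objects, matching the adjusted schema above.
function toSpellingsArray(spellingCounts) {
  return Object.entries(spellingCounts).map(([spelling, total]) => ({ spelling, total }));
}

// Assumed shape of an original document ('spelling' as a plain object).
const original = {
  _id: 'star_wars',
  count: 1234,
  spelling: { 'Star wars': 10, 'Star Wars': 15, 'sTaR WaRs': 5 },
};

// The same document in the adjusted schema.
const adjusted = {
  _id: original._id,
  count: original.count,
  spellings: toSpellingsArray(original.spelling),
};

console.log(JSON.stringify(adjusted, null, 2));
```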
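On to the aggregation:
db.so.aggregate([
    // one document per array element
    { $unwind: '$spellings' },
    // flatten to just the fields the later stages need
    { $project: {
        'spelling' : '$spellings.spelling',
        'total': '$spellings.total',
        'count': '$count'
      }
    },
    // highest total first, so $first below picks the top spelling
    { $sort : { total : -1 } },
    { $group : { _id : '$_id',
        count: { $first: '$count' },
        largest : { $first : '$total' },
        spelling : { $first: '$spelling' }
      }
    }
])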
- Unwind all of the data so the aggregation pipeline can access the individual values of the array.
- Flatten the data to include the key aspects needed by the pipeline. In this case, the specific spelling, the total, and the count.
- Sort on the total, so that the final grouping can use $first.
- Then group so that only the $first value for each _id is returned, along with the count. Because of the way the data was flattened for the pipeline, each temporary document contains the count field.
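To see what each step does to the data, the same unwind, flatten, sort, and group sequence can be mimicked in plain JavaScript on in-memory objects. This is only an illustrative sketch, not how MongoDB executes the pipeline; the names docs, rows, best, and result are made up for this example, and the sample values come from the question.

```javascript
// Sample documents in the adjusted schema (values taken from the question).
const docs = [
  { _id: 'star_wars', count: 1234, spellings: [
    { spelling: 'Star wars', total: 10 },
    { spelling: 'Star Wars', total: 15 },
    { spelling: 'sTaR WaRs', total: 5 } ] },
  { _id: 'willow', count: 2211, spellings: [
    { spelling: 'willow', total: 300 },
    { spelling: 'Willow', total: 550 } ] },
  { _id: 'indiana_jones', count: 12, spellings: [
    { spelling: 'indiana Jones', total: 10 },
    { spelling: 'Indiana Jones', total: 25 },
    { spelling: 'indiana jones', total: 5 } ] },
];

// $unwind + $project: one flat row per (document, spelling) pair.
const rows = docs.flatMap(d =>
  d.spellings.map(s => ({ _id: d._id, count: d.count, spelling: s.spelling, total: s.total })));

// $sort: highest total first.
rows.sort((a, b) => b.total - a.total);

// $group with $first: keep only the first row seen for each _id.
const best = new Map();
for (const row of rows) {
  if (!best.has(row._id)) {
    best.set(row._id, { _id: row._id, count: row.count, largest: row.total, spelling: row.spelling });
  }
}

const result = [...best.values()];
console.log(result);
```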
Results:
[
{
"_id" : "star_wars",
"count" : 1234,
"largest" : 15,
"spelling" : "Star Wars"
},
{
"_id" : "indiana_jones",
"count" : 12,
"largest" : 25,
"spelling" : "Indiana Jones"
},
{
"_id" : "willow",
"count" : 2211,
"largest" : 550,
"spelling" : "Willow"
}
]
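The results above contain every document and are not ordered by count. To get the "top N by descending count" part of the question, a $sort on count followed by a $limit could be appended after the $group stage. The following is a sketch rather than a tested production pipeline; topN is a hypothetical variable, and the collection name so is the one used in the pipeline above.

```javascript
const topN = 2; // hypothetical N; the question's example asks for the top 2

const pipeline = [
  { $unwind: '$spellings' },
  { $project: {
      spelling: '$spellings.spelling',
      total: '$spellings.total',
      count: '$count'
  } },
  { $sort: { total: -1 } },
  { $group: {
      _id: '$_id',
      count: { $first: '$count' },
      largest: { $first: '$total' },
      spelling: { $first: '$spelling' }
  } },
  { $sort: { count: -1 } },  // order the grouped documents by count, descending
  { $limit: topN }           // keep only the first N of them
];

// In the mongo shell: db.so.aggregate(pipeline)
```

With the three sample records from the question, this should return willow first and star_wars second, each with its most frequent spelling.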