Problem description
I have a data set that looks like this:
{"BrandId":"a","SessionId":100,"UserName":"tom"}
{"BrandId":"a","SessionId":200,"UserName":"tom"}
{"BrandId":"b","SessionId":300,"UserName":"mike"}
I would like to count the distinct sessions and usernames, grouped by BrandId. The equivalent SQL would be something like:
select brandid,count_distinct(sessionid),count_distinct(username)
from data
group by brandid
I tried to write this as a MongoDB aggregation. My current code is below, but it does not work. Is there any way to make it work?
db.logs.aggregate([
    {$group: {
        _id: {brand: "$BrandId", user: "$UserName", session: "$SessionId"},
        count: {$sum: 1}
    }},
    {$group: {
        _id: "$_id.brand",
        users: {$sum: "$_id.user"},
        sessions: {$sum: "$_id.session"}
    }}
])
For this example, the expected counts are:
{"BrandId":"a","countSession":2,"countUser":1}
{"BrandId":"b","countSession":1,"countUser":1}
If you know SQL, the expected result is the same as what the SQL query above would return.
Recommended answer
You can do this by using $addToSet to accumulate the distinct sets of SessionId and UserName values during the $group, and then adding a $project stage to your pipeline that uses the $size operator to get the size of each set:
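db.logs.aggregate([
    // Collect the distinct SessionId and UserName values for each brand.
    {$group: {
        _id: '$BrandId',
        sessionIds: {$addToSet: '$SessionId'},
        userNames: {$addToSet: '$UserName'}
    }},
    // Replace each set with its size to get the distinct counts.
    {$project: {
        _id: 0,
        BrandId: '$_id',
        countSession: {$size: '$sessionIds'},
        countUser: {$size: '$userNames'}
    }}
])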
Result:
{
"BrandId" : "b",
"countSession" : 1,
"countUser" : 1
},
{
"BrandId" : "a",
"countSession" : 2,
"countUser" : 1
}
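To sanity-check these counts without a database, the same idea can be sketched in plain JavaScript: build one set per field for each brand, then report the set sizes. This is only an illustration of what $addToSet followed by $size computes. The helper name countDistinctByBrand is hypothetical and not part of MongoDB, and a JavaScript Set is assumed to be a close enough stand-in for the simple string and number values in the sample data.

```javascript
// Sample documents copied from the question.
const logs = [
  { BrandId: "a", SessionId: 100, UserName: "tom" },
  { BrandId: "a", SessionId: 200, UserName: "tom" },
  { BrandId: "b", SessionId: 300, UserName: "mike" },
];

// Hypothetical helper (not a MongoDB API): mimics $group with $addToSet,
// followed by $project with $size, using in-memory Map and Set objects.
function countDistinctByBrand(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Create the per-brand sets the first time a brand is seen.
    if (!groups.has(doc.BrandId)) {
      groups.set(doc.BrandId, { sessionIds: new Set(), userNames: new Set() });
    }
    const group = groups.get(doc.BrandId);
    // Adding to a Set ignores values that are already present.
    group.sessionIds.add(doc.SessionId);
    group.userNames.add(doc.UserName);
  }
  // Turn each pair of sets into its sizes, one output object per brand.
  return Array.from(groups, ([brandId, group]) => ({
    BrandId: brandId,
    countSession: group.sessionIds.size,
    countUser: group.userNames.size,
  }));
}

const result = countDistinctByBrand(logs);
console.log(result);
```

The sketch returns brands in first-seen order, whereas the aggregation result above lists "b" before "a". If the order of the aggregation output matters, adding a $sort stage is the usual way to pin it down.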