Problem description
I have a deeply nested MongoDB collection, and I want to count, within each document, the number of subdocuments that match a given condition. For example:
{"_id":{"chr":"20","pos":"14371","ref":"A","alt":"G"},
"studies":[
{
"study_id":"Study1",
"samples":[
{
"sample_id":"NA00001",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"NA00002",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
{"_id":{"chr":"20","pos":"14372","ref":"T","alt":"AA"},
"studies":[
{
"study_id":"Study3",
"samples":[
{
"sample_id":"SAMPLE1",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"SAMPLE2",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
{"_id":{"chr":"20","pos":"14373","ref":"C","alt":"A"},
"studies":[
{
"study_id":"Study3",
"samples":[
{
"sample_id":"SAMPLE3",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"SAMPLE7",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
I want to know how many subdocuments contain GT:"1|0". In this case that would be 1 in the first document, 2 in the second, and 0 in the third. I've tried $unwind and the aggregate function, but I'm obviously doing something wrong. When I try to count the subdocuments by the "GT" field, mongo complains:
db.collection.aggregate([{$group: {"$studies.samples.formatdata.GT":1,_id:0}}])
because the field names in my $group cannot contain ".", yet if I leave the path out:
db.collection.aggregate([{$group: {"$GT":1,_id:0}}])
it complains that "$GT cannot be an operator name".
Any ideas?
Recommended answer
You need to use $unwind when working with arrays, and here you need to do it three times, once for each level of nesting:
db.collection.aggregate([
    // Unwind each array level so the nested entries can be filtered and grouped
    { "$unwind": "$studies" },
    { "$unwind": "$studies.samples" },
    { "$unwind": "$studies.samples.formatdata" },
    // Group to obtain the count per GT value (across the whole collection)
    { "$group": {
        "_id": "$studies.samples.formatdata.GT",
        "count": { "$sum": 1 }
    }}
])
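To make the effect of the three $unwind stages concrete, here is a minimal sketch in plain JavaScript that models them in memory, without any MongoDB connection. The `docs` array is a trimmed copy of the sample documents from the question, and `flatten` and `countByGT` are hypothetical helper names used only for illustration; they are not part of the mongo shell or any driver.

```javascript
// Trimmed copies of the three sample documents (only the fields used here).
const docs = [
  { _id: { pos: "14371" }, studies: [{ study_id: "Study1", samples: [
    { sample_id: "NA00001", formatdata: [{ GT: "1|0" }] },
    { sample_id: "NA00002", formatdata: [{ GT: "0|0" }] } ] }] },
  { _id: { pos: "14372" }, studies: [{ study_id: "Study3", samples: [
    { sample_id: "SAMPLE1", formatdata: [{ GT: "1|0" }] },
    { sample_id: "SAMPLE2", formatdata: [{ GT: "1|0" }] } ] }] },
  { _id: { pos: "14373" }, studies: [{ study_id: "Study3", samples: [
    { sample_id: "SAMPLE3", formatdata: [{ GT: "0|0" }] },
    { sample_id: "SAMPLE7", formatdata: [{ GT: "0|0" }] } ] }] },
];

// Hypothetical helper: one output row per (document, study, sample,
// formatdata entry), mirroring what the three $unwind stages produce.
function flatten(documents) {
  return documents.flatMap(doc =>
    doc.studies.flatMap(study =>
      study.samples.flatMap(sample =>
        sample.formatdata.map(fd => ({
          _id: doc._id,
          study_id: study.study_id,
          sample_id: sample.sample_id,
          GT: fd.GT,
        })))));
}

// Hypothetical helper: models a $group on the GT value with { $sum: 1 }.
function countByGT(rows) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row.GT, (counts.get(row.GT) || 0) + 1);
  }
  return counts;
}

const rows = flatten(docs);
const counts = countByGT(rows);
console.log(rows.length, counts);
```

With the sample data this yields six rows, three with GT "1|0" and three with "0|0". Note that these totals are for the whole collection, not per document.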
Ideally you want to filter your input. One way is to use $match both before and after the $unwind stages, with a regular expression ($regex) that matches entries whose GT value begins with "1". Including the document _id in the grouping key gives a count per document:
db.collection.aggregate([
    // Match first to exclude documents where no array member has a matching GT
    { "$match": { "studies.samples.formatdata.GT": /^1/ } },
    // Unwind each array level so individual entries can be filtered
    { "$unwind": "$studies" },
    { "$unwind": "$studies.samples" },
    { "$unwind": "$studies.samples.formatdata" },
    // Match again to keep only the matching unwound entries
    { "$match": { "studies.samples.formatdata.GT": /^1/ } },
    // Group to obtain the matched count per document and GT value
    { "$group": {
        "_id": {
            "_id": "$_id",
            "key": "$studies.samples.formatdata.GT"
        },
        "count": { "$sum": 1 }
    }}
])
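One thing to keep in mind: because the first $match drops documents with no matching entry, the pipeline above returns no row at all for the third sample document, rather than a row with a count of 0. The sketch below, in plain JavaScript with no MongoDB connection, computes the per-document counts the question asks for, zeros included. The `docs` array is a trimmed copy of the sample data, and `countMatches` is a hypothetical helper name, not a MongoDB API.

```javascript
// Trimmed copies of the three sample documents (only the fields used here).
const docs = [
  { _id: { pos: "14371" }, studies: [{ samples: [
    { formatdata: [{ GT: "1|0" }] },
    { formatdata: [{ GT: "0|0" }] } ] }] },
  { _id: { pos: "14372" }, studies: [{ samples: [
    { formatdata: [{ GT: "1|0" }] },
    { formatdata: [{ GT: "1|0" }] } ] }] },
  { _id: { pos: "14373" }, studies: [{ samples: [
    { formatdata: [{ GT: "0|0" }] },
    { formatdata: [{ GT: "0|0" }] } ] }] },
];

// Hypothetical helper: count the formatdata entries in one document
// whose GT value matches `pattern` (same test as the /^1/ $match).
function countMatches(doc, pattern) {
  let count = 0;
  for (const study of doc.studies) {
    for (const sample of study.samples) {
      for (const fd of sample.formatdata) {
        if (pattern.test(fd.GT)) count += 1;
      }
    }
  }
  return count;
}

// One result per document, including documents with no match.
const perDocument = docs.map(doc => ({
  pos: doc._id.pos,
  count: countMatches(doc, /^1/),
}));
console.log(perDocument);
```

For the sample data the counts are 1, 2 and 0, in document order. Like the pipeline, the pattern /^1/ would also count values such as "1|1"; an exact comparison against "1|0" would be stricter.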
Note that in all cases the "$"-prefixed strings are "variables" that refer to properties of the document. They are values, used as input on the right-hand side. The keys on the left-hand side must be plain strings; a variable cannot be used to name a key.
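As an illustration of the key/value distinction, the sketch below contrasts the $group stage from the question with one shaped as the answer describes. It is plain JavaScript and only inspects the objects; `findBadKeys` is a hypothetical checker that flags just the two problems mentioned in the question (a key containing "." or starting with "$"), and it should not be read as MongoDB's actual validation logic.

```javascript
// The stage from the question: a "$"-prefixed path is used as a key.
const attempted = {
  $group: { "$studies.samples.formatdata.GT": 1, _id: 0 },
};

// The shape the answer describes: plain keys on the left,
// "$"-prefixed field paths only as values on the right.
const corrected = {
  $group: {
    _id: "$studies.samples.formatdata.GT",
    count: { $sum: 1 },
  },
};

// Hypothetical checker: returns the keys of a $group stage that contain
// "." or start with "$". This is an assumption-level simplification.
function findBadKeys(stage) {
  return Object.keys(stage.$group).filter(
    key => key.startsWith("$") || key.includes(".")
  );
}

const badInAttempt = findBadKeys(attempted);
const badInCorrected = findBadKeys(corrected);
console.log(badInAttempt, badInCorrected);
```

The attempted stage has one flagged key (the field path itself), while the corrected stage has none, since the path now appears only as a value.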