Problem description
I wrote a mapreduce function where the records are emitted in the following format:
{userid:<xyz>, {event:adduser, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<abc>, {event:adduser, count:1}}
where userid is the key and the rest is the value for that key. After the MapReduce function, I want to get the result in the following format:
{userid:<xyz>,{events: [{adduser:1},{login:2}], allEventCount:3}}
To achieve this I wrote the following reduce function. I know this can be achieved with a group by, both in the aggregation framework and in mapreduce, but we require similar functionality for a more complex scenario, so I am taking this approach.
var reducefn = function(key,values){
    var result = {allEventCount:0, events:[]};
    values.forEach(function(value){
        var notfound=true;
        for(var n = 0; n < result.events.length; n++){
            eventObj = result.events[n];
            for(ev in eventObj){
                if(ev==value.event){
                    result.events[n][ev] += value.allEventCount;
                    notfound=false;
                    break;
                }
            }
        }
        if(notfound==true){
            var newEvent={}
            newEvent[value.event]=1;
            result.events.push(newEvent);
        }
        result.allEventCount += value.allEventCount;
    });
    return result;
}
This runs perfectly when I run it on 1000 records, but when there are 3k or 10k records, the result I get is something like this:
{ "_id" : {...}, "value" :{"allEventCount" :30, "events" :[ { "undefined" : 1},
{"adduser" : 1 }, {"remove" : 3 }, {"training" : 1 }, {"adminlogin" : 1 },
{"downgrade" : 2 } ]} }
I am not able to understand where this undefined came from, and the sum of the individual events is also less than allEventCount. All the docs in the collection have a non-empty event field, so there is no chance of undefined.
MongoDB version: 2.2.1. Environment: local machine, no sharding.
In the reduce function, why should the operation result.events[n][ev] += value.allEventCount; fail when the similar operation result.allEventCount += value.allEventCount; passes?
The corrected answer, as suggested by johnyHK
Reduce function:
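var reducefn = function(key,values){
    // Returns the same shape it reads (value.event, value.totEvents),
    // so a partial result can safely be passed back into reduce.
    var result = {totEvents:0, event:[]};
    values.forEach(function(value){
        value.event.forEach(function(eventElem){
            var notfound=true;
            for(var n = 0; n < result.event.length; n++){
                var eventObj = result.event[n];
                for(var ev in eventObj){
                    for(var evv in eventElem){
                        if(ev==evv){
                            // Event already seen for this key: add the incoming count.
                            result.event[n][ev] += eventElem[evv];
                            notfound=false;
                            break;
                        }
                    }
                }
            }
            if(notfound==true){
                // First occurrence of this event for the key.
                result.event.push(eventElem);
            }
        });
        result.totEvents += value.totEvents;
    });
    return result;
}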
Recommended answer
The shape of the object you emit from your map function must be the same as the object returned from your reduce function, as the results of a reduce can get fed back into reduce when processing large numbers of docs (like in this case).
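To see where the undefined key comes from, the re-reduce step can be simulated in plain JavaScript outside MongoDB. This is only a sketch: it copies the question's reduce logic into a function called originalReduce (a name chosen here), and it assumes the emitted values look like {event: ..., allEventCount: 1}, which is what the reduce code reads.

```javascript
// The question's reduce logic: it reads {event, allEventCount}
// but returns {events, allEventCount}, a different shape.
function originalReduce(key, values) {
  var result = { allEventCount: 0, events: [] };
  values.forEach(function (value) {
    var notfound = true;
    for (var n = 0; n < result.events.length; n++) {
      var eventObj = result.events[n];
      for (var ev in eventObj) {
        if (ev == value.event) {
          result.events[n][ev] += value.allEventCount;
          notfound = false;
          break;
        }
      }
    }
    if (notfound) {
      var newEvent = {};
      newEvent[value.event] = 1;
      result.events.push(newEvent);
    }
    result.allEventCount += value.allEventCount;
  });
  return result;
}

// First pass over a batch of emitted values (assumed shape).
var partial = originalReduce("xyz", [
  { event: "login", allEventCount: 1 },
  { event: "login", allEventCount: 1 }
]);

// Re-reduce: the partial result is fed back in next to a fresh emitted value.
// partial has no "event" field, so value.event is undefined for it.
var rereduced = originalReduce("xyz", [
  partial,
  { event: "adduser", allEventCount: 1 }
]);

console.log(JSON.stringify(rereduced));
```

In the re-reduce call, the partial result has no event field, so it is recorded under the key "undefined" with a count of 1, while its allEventCount of 2 is still added to the total. That reproduces both symptoms from the question: an undefined entry, and per-event counts that sum to less than allEventCount.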
So you need to change your emit to emit docs like this:
{userid:<xyz>, {events:[{adduser: 1}], allEventCount:1}}
{userid:<xyz>, {events:[{login: 1}], allEventCount:1}}
and then update your reduce function accordingly.
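A minimal sketch of what a matching map and reduce pair might look like, runnable outside MongoDB. The emit stub, the sample documents, and the counts/order helper variables are assumptions made for illustration; the document fields userid and event are taken from the question. The reduce here accumulates counts in a lookup object rather than scanning the array, which is one possible way to write it, not the only one.

```javascript
// Stand-in for MongoDB's emit(), so the sketch runs outside the server.
var emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

// Hypothetical map function: assumes each document has userid and event fields.
var mapfn = function () {
  var ev = {};
  ev[this.event] = 1;
  emit(this.userid, { events: [ev], allEventCount: 1 });
};

// Reduce that accepts and returns the same shape as the emitted value.
var reducefn = function (key, values) {
  var counts = {};  // event name -> running count
  var order = [];   // event names in first-seen order
  var total = 0;
  values.forEach(function (value) {
    value.events.forEach(function (eventElem) {
      for (var name in eventElem) {
        if (!Object.prototype.hasOwnProperty.call(counts, name)) {
          counts[name] = 0;
          order.push(name);
        }
        counts[name] += eventElem[name];
      }
    });
    total += value.allEventCount;
  });
  var events = order.map(function (name) {
    var ev = {};
    ev[name] = counts[name];
    return ev;
  });
  return { events: events, allEventCount: total };
};

// Sample documents, made up for illustration.
[
  { userid: "xyz", event: "adduser" },
  { userid: "xyz", event: "login" },
  { userid: "xyz", event: "login" }
].forEach(function (doc) { mapfn.call(doc); });

var values = emitted.map(function (e) { return e.value; });

// Reducing everything at once and reducing in two stages should agree.
var oneShot = reducefn("xyz", values);
var staged = reducefn("xyz", [reducefn("xyz", values.slice(0, 2)), values[2]]);

console.log(JSON.stringify(oneShot));
console.log(JSON.stringify(staged));
```

Because the reduce output has the same shape as each emitted value, reducing in one call and reducing a partial result a second time give the same document, which is the property the original reduce was missing.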