Reduce is called several times with the same key in MongoDB map-reduce

Problem description

I'm trying to run map-reduce on MongoDB in the mongo shell. For some reason, in the reduce phase, I get several calls for the same key (instead of a single one), so I get wrong results. I'm not an expert in this domain, so maybe I'm making some stupid mistake. Any help appreciated.

Thanks.

Here is my small example:

I'm creating 10000 documents:

var i = 0;
db.docs.drop();
while (i < 10000) {
    db.docs.insert({text:"line " + i,index:i});
    i++;
}

Then I'm doing map-reduce based on modulo 10 (so I expect to get 1000 in each "bucket"):

db.docs.mapReduce(
    function() {
       emit(this.index%10,1);
    },
    function(key,values) {
       return values.length;
    },
    {
    out : {inline : 1}
    }
);

However, the results look like this:

{
    "results" : [
        {
            "_id" : 0,
            "value" : 21
        },
        {
            "_id" : 1,
            "value" : 21
        },
        {
            "_id" : 2,
            "value" : 21
        },
        {
            "_id" : 3,
            "value" : 21
        },
        {
            "_id" : 4,
            "value" : 21
        },
        {
            "_id" : 5,
            "value" : 21
        },
        {
            "_id" : 6,
            "value" : 21
        },
        {
            "_id" : 7,
            "value" : 21
        },
        {
            "_id" : 8,
            "value" : 21
        },
        {
            "_id" : 9,
            "value" : 21
        }
    ],
    "timeMillis" : 76,
    "counts" : {
        "input" : 10000,
        "emit" : 10000,
        "reduce" : 500,
        "output" : 10
    },
    "ok" : 1,
}

Recommended answer

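Map/Reduce is essentially a recursive operation. In particular, the documented requirements for the reduce function state that MongoDB can invoke reduce more than once for the same key. When it does, the previous output of reduce for that key becomes one of the input values to the next reduce invocation for that key.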

Therefore, you have to expect that an input value may itself be a count produced by a previous invocation, which is why values.length undercounts. The following code handles that by actually adding the values:

db.docs.mapReduce(
    function() { emit(this.index % 10, 1); },
    function(key,values) { return Array.sum(values); },
    { out : {inline : 1} } );

Now the emit(key, 1) makes more sense in a way, because 1 is no longer just an arbitrary number used to fill the array: its value is actually used.
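To see why values.length breaks while a sum does not, the re-reduce behaviour can be imitated in plain JavaScript without MongoDB. This is only a sketch: simulateReduce and its batchSize parameter are hypothetical names invented for illustration, and the batch size of 100 is an arbitrary assumption, not MongoDB's real internal batching (so it will not reproduce the exact value 21 from the question).

```javascript
// Hypothetical helper, not a MongoDB API: reduces `values` in batches and
// feeds the partial outputs back into reduceFn until one value remains.
// batchSize must be at least 2.
function simulateReduce(values, reduceFn, batchSize) {
    var current = values.slice();
    while (current.length > 1) {
        var next = [];
        for (var i = 0; i < current.length; i += batchSize) {
            next.push(reduceFn("key", current.slice(i, i + batchSize)));
        }
        current = next;
    }
    return current[0];
}

// The reducer from the question: counts array entries, ignoring their values.
function countByLength(key, values) {
    return values.length;
}

// The reducer from the answer, written without the shell-only Array.sum.
function countBySum(key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
}

// 1000 emitted values of 1, as one bucket in the question would receive.
var ones = [];
for (var n = 0; n < 1000; n++) {
    ones.push(1);
}

var wrong = simulateReduce(ones, countByLength, 100);
var right = simulateReduce(ones, countBySum, 100);
console.log(wrong, right); // prints: 10 1000
```

With the length-based reducer, the second pass only sees the 10 partial results and reports 10. The sum-based reducer adds those partial counts and still arrives at 1000, whatever batch size is used.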

As a side note, notice how dangerous this is: for a smaller dataset, the correct result might have been given by accident, because the engine could decide that splitting the work into several reduce calls isn't necessary.
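Because a small dataset can hide the bug, it may help to test a reduce function directly before running it on real data. The following is a minimal sketch in plain JavaScript; isReReduceSafe is a hypothetical helper name, and it only spot-checks one sample array rather than proving the property in general.

```javascript
// Reducer under test: sums the values, so partial counts can be re-reduced.
function reduceCount(key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical check, not a MongoDB API: returns true if reduceFn gives the
// same answer when one of its inputs is the output of an earlier reduce call.
// Works for reducers that return primitive values (compared with ===).
function isReReduceSafe(reduceFn, key, values) {
    var mid = Math.floor(values.length / 2);
    var whole = reduceFn(key, values);

    // Reducing a single prior output must not change it.
    var idempotent = reduceFn(key, [whole]) === whole;

    // A partial result mixed with the remaining raw values must match.
    var partial = reduceFn(key, values.slice(0, mid));
    var mixed = reduceFn(key, [partial].concat(values.slice(mid)));

    return idempotent && mixed === whole;
}

var sample = [1, 1, 1, 1, 1];
console.log(isReReduceSafe(reduceCount, 0, sample));
console.log(isReReduceSafe(function (key, values) { return values.length; }, 0, sample));
```

The first call reports true for the summing reducer, and the second reports false for the values.length reducer from the question, even though the sample has only five values.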
