Problem description
What are the key differences between running map/reduce jobs over MongoDB data with Hadoop map/reduce and with MongoDB's built-in map/reduce?
When should I pick which map/reduce engine? What are the pros and cons of each engine for working on data stored in MongoDB?
Solution
My answer is based on knowledge and experience of Hadoop MR and on what I have learned about MongoDB MR.
Let's see what the major differences are and then try to define criteria for selection.
Differences are:
From the above I can suggest the following criteria for selection:
Select MongoDB MR if you need simple group-by and filtering and do not expect heavy shuffling between map and reduce. In other words, something simple.
Select Hadoop MR if you are going to run complicated, computationally intensive MR jobs (for example, regression calculations). A large or unpredictable volume of data between map and reduce also points to Hadoop MR.
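Java, the language Hadoop MR jobs are usually written in, is a stronger language than the JavaScript used by MongoDB MR and has more libraries, especially statistical ones. That should be taken into account.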
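To make the "simple group-by and filtering" case concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce phases in Python. It uses neither the Hadoop nor the MongoDB API; the `ORDERS` documents, their field names, and the helper `run_map_reduce` are all hypothetical and exist only for illustration. The sketch also counts how many key/value pairs cross from map to reduce, which is the "shuffle" volume the criteria above refer to.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names are illustrative only.
ORDERS = [
    {"customer": "ann", "status": "paid", "amount": 30},
    {"customer": "bob", "status": "paid", "amount": 20},
    {"customer": "ann", "status": "cancelled", "amount": 99},
    {"customer": "ann", "status": "paid", "amount": 12},
]


def map_order(doc):
    """Map step: drop unpaid orders (the filter), emit (customer, amount)."""
    if doc["status"] == "paid":
        yield doc["customer"], doc["amount"]


def reduce_amounts(key, values):
    """Reduce step: sum every amount emitted for one key (the group-by)."""
    return sum(values)


def run_map_reduce(docs, mapper, reducer):
    """Run map, shuffle (group values by key), and reduce in memory.

    Returns the reduced result and the number of pairs that were shuffled.
    """
    shuffled = defaultdict(list)
    emitted = 0
    for doc in docs:
        for key, value in mapper(doc):
            shuffled[key].append(value)
            emitted += 1
    result = {key: reducer(key, values) for key, values in shuffled.items()}
    return result, emitted


totals, shuffled_pairs = run_map_reduce(ORDERS, map_order, reduce_amounts)
print(totals)
print(shuffled_pairs)
```

Here the mapper emits at most one small pair per document, so the shuffled data never exceeds the input. A job whose mapper emits many pairs per document, or pairs of unpredictable size, is the kind the second criterion steers toward Hadoop MR.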