Question
I have tested Hadoop and MapReduce with Cloudera and found them pretty cool; I thought they were the most recent and relevant Big Data solution. But a few days ago, I found this: https://spark.incubator.apache.org/
A "Lightning fast cluster computing system", able to run on top of a Hadoop cluster and apparently able to crush MapReduce. I saw that it works in RAM more than MapReduce does. I think MapReduce is still relevant when you need cluster computing to overcome the I/O problems you can hit on a single machine. But since Spark can do the jobs that MapReduce does, and may be far more efficient for several operations, isn't this the end of MapReduce? Or is there something more that MapReduce can do, or can MapReduce be more efficient than Spark in certain contexts?
Recommended answer
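MapReduce is batch-oriented by nature, so any framework implemented on top of MR, such as Hive and Pig, is batch-oriented as well. For iterative processing, as in machine learning, and for interactive analysis, Hadoop/MR doesn't meet the requirements: each MR job reads its input from HDFS and writes its output back to HDFS, so an algorithm that makes many passes over the same data pays that disk I/O on every pass, whereas Spark can keep the working set cached in memory between iterations. Here is a nice article from Cloudera on Why Spark, which summarizes it very nicely.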
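A toy, single-process Python sketch (not Hadoop or Spark code) may make the iterative-processing point concrete. It assumes a local temporary directory stands in for HDFS and that each MapReduce job reads its input from files there and writes its output back; the in-memory variant only loosely mimics Spark caching a dataset across iterations. All names (`mr_job`, `kmeans_1d_mapreduce`, `kmeans_1d_in_memory`, `DiskStats`) are hypothetical, and 1-D k-means is just a convenient example of an iterative algorithm.

```python
"""Toy model: iterative 1-D k-means as chained MapReduce jobs vs. in memory.

Assumptions (illustrative only, not Hadoop or Spark APIs): a local temp
directory stands in for HDFS, every MR job reads its input from files there
and writes its output back, and the in-memory variant loosely mimics Spark
keeping a cached dataset in RAM across iterations.
"""
import json
import os
import tempfile
from collections import defaultdict


class DiskStats:
    """Counts records read from and written to the simulated disk."""

    def __init__(self):
        self.records_read = 0
        self.records_written = 0


def write_records(path, records, stats):
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            stats.records_written += 1


def read_records(path, stats):
    with open(path) as f:
        for line in f:
            stats.records_read += 1
            yield json.loads(line)


def mr_job(in_path, out_path, mapper, reducer, stats):
    """One batch job: read all input, map, group by key (shuffle), reduce, write."""
    groups = defaultdict(list)
    for record in read_records(in_path, stats):
        for key, value in mapper(record):
            groups[key].append(value)
    output = [reducer(key, values) for key, values in sorted(groups.items())]
    write_records(out_path, output, stats)


def nearest(x, centroids):
    """Index of the closest centroid (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))


def kmeans_1d_mapreduce(points, centroids, iterations, workdir, stats):
    data_path = os.path.join(workdir, "points.jsonl")
    write_records(data_path, points, stats)  # load the input once
    for i in range(iterations):
        current = list(centroids)

        def mapper(x, current=current):
            yield nearest(x, current), x

        out_path = os.path.join(workdir, f"centroids-{i}.jsonl")
        # Every iteration is a separate job that re-reads all points from disk.
        mr_job(data_path, out_path, mapper,
               lambda k, xs: [k, sum(xs) / len(xs)], stats)
        means = dict(read_records(out_path, stats))  # [index, mean] pairs
        # A cluster with no points keeps its previous centroid.
        centroids = [means.get(c, current[c]) for c in range(len(current))]
    return centroids


def kmeans_1d_in_memory(points, centroids, iterations):
    cached = list(points)  # held in RAM for all iterations
    for _ in range(iterations):
        groups = defaultdict(list)
        for x in cached:
            groups[nearest(x, centroids)].append(x)
        centroids = [sum(groups[c]) / len(groups[c]) if groups[c] else centroids[c]
                     for c in range(len(centroids))]
    return centroids


if __name__ == "__main__":
    points = [1, 2, 3, 10, 11, 12]
    stats = DiskStats()
    with tempfile.TemporaryDirectory() as workdir:
        print(kmeans_1d_mapreduce(points, [0.0, 5.0], 3, workdir, stats))
    print(kmeans_1d_in_memory(points, [0.0, 5.0], 3))
    print(stats.records_read, stats.records_written)
```

Both functions compute the same centroids on the sample data, but the chained-job version re-reads all six input points from disk on each of its three iterations (plus the small centroid files), while the in-memory version keeps the input in memory after loading it once. Real Hadoop and Spark add distributed scheduling, network shuffles and fault tolerance on top of this; the sketch only isolates the per-iteration I/O cost the answer refers to.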
It's not the end of MR. As of this writing, Hadoop is much more mature than Spark, and a lot of vendors support it. That will change over time. Cloudera has started including Spark in CDH, and over time more and more vendors will include it in their Big Data distributions and provide commercial support for it. We will see MR and Spark side by side for the foreseeable future.
Also, with Hadoop 2 and its YARN resource manager, MR and other processing models (including Spark) can run on the same cluster. So Hadoop is not going anywhere.