This post addresses the question "Hadoop WordCount example — do I need to do some performance tuning?" and may be useful as a reference for readers facing the same problem.

Problem description

I am new to Hadoop.

Recently I implemented the WordCount example.

But when I run this program on a single node with 2 input files containing just 9 words in total, it takes nearly 33 seconds to finish. That seems crazy, and it confuses me.

Can anyone tell me whether this is normal?

How can I fix this problem? Remember, I only created 2 input files with 9 words in them.

Recommended answer

Hadoop is not efficient for very small jobs, because JVM startup, process initialization, and other fixed overheads take far more time than the job itself. This can be mitigated to some extent by enabling JVM reuse:

http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse
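As a sketch of what the tutorial above describes: in the old (0.20.x) MapReduce framework, JVM reuse is controlled by the `mapred.job.reuse.jvm.num.tasks` property. A value of `-1` means a JVM may be reused for any number of tasks of the same job, which avoids paying JVM startup cost once per task. The snippet below is a minimal example of setting it in the job configuration (e.g. `mapred-site.xml`); adjust to your cluster and Hadoop version, since newer releases (YARN-based MapReduce 2) handle this differently.

```xml
<!-- mapred-site.xml fragment: enable task JVM reuse (Hadoop 0.20.x-era MapReduce) -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <!-- -1 = reuse a JVM for an unlimited number of tasks of the same job;
       a positive N limits reuse to N tasks per JVM; 1 (the default) disables reuse -->
  <value>-1</value>
</property>
```

The same property can also be set per job from the driver code via `Configuration.setInt("mapred.job.reuse.jvm.num.tasks", -1)` before submitting the job. Note that JVM reuse only reduces per-task startup overhead; for a 9-word input, most of the remaining latency comes from job scheduling and initialization, which this setting does not remove.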

There is also some ongoing work on this in Apache Hadoop:

https://issues.apache.org/jira/browse/MAPREDUCE-1220

Not sure which release this will be included in, or what the current state of the JIRA is.
