Problem description
My team built a Java application using the Hadoop libraries to transform a bunch of input files into useful output. Given the current load, a single multicore server will do fine for the coming year or so. We do not (yet) have the need to go for a multi-server Hadoop cluster, yet we chose to start this project "being prepared".
When I run this app on the command line (or in Eclipse or NetBeans) I have not yet been able to convince it to use more than one map and/or reduce thread at a time. Given that the tool is very CPU intensive, this "single-threadedness" is my current bottleneck.
When running it in the NetBeans profiler I do see that the app starts several threads for various purposes, but only a single map/reduce is running at the same moment.
The input data consists of several input files, so Hadoop should at least be able to run one thread per input file at the same time for the map phase.
What do I do to have at least 2 or even 4 active threads running (which should be possible for most of the processing time of this application)?
I'm expecting this to be something very silly that I've overlooked.
I just found this: https://issues.apache.org/jira/browse/MAPREDUCE-1367. This implements the feature I was looking for in Hadoop 0.21. It introduces the flag mapreduce.local.map.tasks.maximum to control it.
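If that flag is available in your Hadoop version (0.21+ according to the JIRA issue above), a minimal way to try it is to set it on the job's Configuration in the driver before submitting. The property name comes from the issue; the class name and driver layout below are only an illustrative sketch, not the actual application:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MultiThreadedLocalDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Flag introduced by MAPREDUCE-1367 (Hadoop 0.21+): how many map
        // tasks the LocalJobRunner may run concurrently in local mode.
        conf.setInt("mapreduce.local.map.tasks.maximum", 4);

        Job job = new Job(conf, "cpu-intensive-transform");
        // ... set mapper/reducer classes and input/output paths as in the existing app ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If the driver goes through ToolRunner/GenericOptionsParser, the same property can also be passed on the command line as -D mapreduce.local.map.tasks.maximum=4 without touching the code.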
For now I've also found the solution described here in this question.
Recommended answer
I'm not sure if I'm correct, but when you are running tasks in local mode, you can't have multiple mappers/reducers.
Anyway, to set the maximum number of running mappers and reducers, use the configuration options mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum; by default those options are set to 2, so I might be right.
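A quick way to see which values your client-side configuration currently carries for those two keys is a tiny check like the sketch below (the class name is made up; note that on a real or pseudo-distributed cluster the values that actually matter are the ones in each tasktracker's mapred-site.xml, read when the daemon starts, so they cannot be changed per job):

```java
import org.apache.hadoop.conf.Configuration;

public class SlotSettingsCheck {
    public static void main(String[] args) {
        // Loads core-site.xml / mapred-site.xml from the classpath, if present.
        Configuration conf = new Configuration();

        // Per-tasktracker task slots; the stated defaults are 2 and 2.
        System.out.println("map slots:    "
                + conf.getInt("mapred.tasktracker.map.tasks.maximum", 2));
        System.out.println("reduce slots: "
                + conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2));
    }
}
```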
Finally, if you want to be prepared for a multi-node cluster, go straight to running this in fully-distributed mode, but have all the servers (namenode, datanode, tasktracker, jobtracker, ...) run on a single machine.
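Assuming you follow that advice and run a pseudo-distributed cluster on the one machine, the job driver then just has to point at that cluster instead of the local runner. The addresses below are the conventional single-node values from the Hadoop 0.20.x setup guide and may differ from your installation; normally they would live in core-site.xml and mapred-site.xml on the client rather than in code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PseudoDistributedDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical addresses of the single-machine HDFS namenode and
        // jobtracker; adjust to match your own configuration files.
        conf.set("fs.default.name", "hdfs://localhost:9000");
        conf.set("mapred.job.tracker", "localhost:9001");

        Job job = new Job(conf, "cpu-intensive-transform");
        // ... same mapper/reducer and input/output wiring as before ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the jobtracker in the picture, the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum settings mentioned above then decide how many map and reduce tasks run in parallel on the machine.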