Problem description
My team built a Java application using the Hadoop libraries to transform a bunch of input files into useful output. Given the current load, a single multicore server will do fine for the coming year or so. We do not (yet) have the need to go for a multi-server Hadoop cluster, yet we chose to start this project "being prepared".
When I run this app on the command line (or in Eclipse or NetBeans) I have not yet been able to convince it to use more than one map and/or reduce thread at a time. Given that the tool is very CPU intensive, this "single-threadedness" is my current bottleneck.
When running it in the NetBeans profiler I do see that the app starts several threads for various purposes, but only a single map/reduce is running at the same moment.
The input data consists of several input files, so Hadoop should at least be able to run one thread per input file at the same time for the map phase.
What do I do to have at least 2 or even 4 active threads running (which should be possible for most of the processing time of this application)?
I'm expecting this to be something very silly that I've overlooked.
I just found this: https://issues.apache.org/jira/browse/MAPREDUCE-1367. This implements the feature I was looking for in Hadoop 0.21. It introduces the flag mapreduce.local.map.tasks.maximum to control it.
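If that flag is available in your Hadoop version (0.21+ according to the JIRA issue above), a minimal way to try it is to set it on the job's Configuration in the driver before submitting. The property name comes from the issue; the class name and driver layout below are only an illustrative sketch, not the actual application:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MultiThreadedLocalDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Flag introduced by MAPREDUCE-1367 (Hadoop 0.21+): how many map
        // tasks the LocalJobRunner may run concurrently in local mode.
        conf.setInt("mapreduce.local.map.tasks.maximum", 4);

        Job job = new Job(conf, "cpu-intensive-transform");
        // ... set mapper/reducer classes and input/output paths as in the existing app ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If the driver goes through ToolRunner/GenericOptionsParser, the same property can also be passed on the command line as -D mapreduce.local.map.tasks.maximum=4 without touching the code.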
For now I've also found the solution described here in this question.
Recommended answer
I'm not sure if I'm correct, but when you are running tasks in local mode, you can't have multiple mappers/reducers.
Anyway, to set the maximum number of running mappers and reducers, use the configuration options mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum; by default those options are set to 2, so I might be right.
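A quick way to see which values your client-side configuration currently carries for those two keys is a tiny check like the sketch below (the class name is made up; note that on a real or pseudo-distributed cluster the values that actually matter are the ones in each tasktracker's mapred-site.xml, read when the daemon starts, so they cannot be changed per job):

```java
import org.apache.hadoop.conf.Configuration;

public class SlotSettingsCheck {
    public static void main(String[] args) {
        // Loads core-site.xml / mapred-site.xml from the classpath, if present.
        Configuration conf = new Configuration();

        // Per-tasktracker task slots; the stated defaults are 2 and 2.
        System.out.println("map slots:    "
                + conf.getInt("mapred.tasktracker.map.tasks.maximum", 2));
        System.out.println("reduce slots: "
                + conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2));
    }
}
```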
Finally, if you want to be prepared for a multi-node cluster, go straight to running this in fully-distributed mode, but have all the servers (namenode, datanode, tasktracker, jobtracker, ...) run on a single machine.
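Assuming you follow that advice and run a pseudo-distributed cluster on the one machine, the job driver then just has to point at that cluster instead of the local runner. The addresses below are the conventional single-node values from the Hadoop 0.20.x setup guide and may differ from your installation; normally they would live in core-site.xml and mapred-site.xml on the client rather than in code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PseudoDistributedDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical addresses of the single-machine HDFS namenode and
        // jobtracker; adjust to match your own configuration files.
        conf.set("fs.default.name", "hdfs://localhost:9000");
        conf.set("mapred.job.tracker", "localhost:9001");

        Job job = new Job(conf, "cpu-intensive-transform");
        // ... same mapper/reducer and input/output wiring as before ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the jobtracker in the picture, the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum settings mentioned above then decide how many map and reduce tasks run in parallel on the machine.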