Problem Description
As explained in the title, when I execute my Hadoop program (and debug it in local mode) the following happens:
1. All 10 CSV lines in my test data are handled correctly by the Mapper, the Partitioner and the RawComparator (OutputKeyComparatorClass) that is called after the map step. But the OutputValueGroupingComparatorClass's and the ReduceClass's functions do NOT get executed afterwards.
2. My application looks like the following (due to space constraints I omit the implementations of the classes I used as configuration parameters, until somebody has an idea that involves them):
import java.util.Date;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

public class RetweetApplication {

    public static int DEBUG = 1;

    static String INPUT = "/home/ema/INPUT-H";
    static String OUTPUT = "/home/ema/OUTPUT-H "+ (new Date()).toString();

    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(RetweetApplication.class);

        if(DEBUG > 0){
            // force standalone/local mode: local job runner, local filesystem
            conf.set("mapred.job.tracker", "local");
            conf.set("fs.default.name", "file:///");
            conf.set("dfs.replication", "1");
        }

        FileInputFormat.setInputPaths(conf, new Path(INPUT));
        FileOutputFormat.setOutputPath(conf, new Path(OUTPUT));

        //conf.setOutputKeyClass(Text.class);
        //conf.setOutputValueClass(Text.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(Text.class);

        conf.setMapperClass(RetweetMapper.class);
        conf.setPartitionerClass(TweetPartitioner.class);
        conf.setOutputKeyComparatorClass(TwitterValueGroupingComparator.class);
        conf.setOutputValueGroupingComparator(TwitterKeyGroupingComparator.class);
        conf.setReducerClass(RetweetReducer.class);
        conf.setOutputFormat(TextOutputFormat.class);

        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
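(Side note: the sketch below is hypothetical and not part of the program above. Dropped into main() right after the conf.set(...) calls, it would print the driver-side values; as it turned out later (see the answer at the bottom), the task side can still end up with different values when the conf XMLs mark these parameters as final.)

        // hypothetical sanity check, placed after the conf.set(...) calls in main():
        // prints what the driver-side JobConf holds before the job is submitted
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));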
3. I get the following console output (sorry for the format, but somehow this log doesn't get formatted correctly):
The bold-marked lines repeat endlessly from this point.
4. A lot of open processes are still active after the mapper has seen every tuple:
RetweetApplication (1) [Remote Java Application]
OpenJDK Client VM[localhost:5002]
Thread [main] (Running)
Thread [Thread-2] (Running)
Daemon Thread [communication thread] (Running)
Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)
Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)
Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)
Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)
Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)
Daemon Thread [Thread for merging on-disk files] (Running)
Daemon Thread [Thread for merging in memory files] (Running)
Daemon Thread [Thread for polling Map Completion Events] (Running)
Is there any reason why Hadoop expects more output from the mapper (see the bold-marked lines in the log) than I put into the input directory? As already mentioned, I verified while debugging that ALL inputs are properly processed in the mapper/partitioner/etc.
UPDATE
With the help of Chris (see comments) I found out that my program was NOT started in local mode as I expected: the isLocal variable in the ReduceTask class is set to false, though it should be true.
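For context, my understanding is that ReduceTask makes this decision with a check along the lines of the sketch below (paraphrased, not the exact Hadoop source), so whatever value ends up in the task-side job configuration wins:

        // paraphrased sketch of the check inside ReduceTask.run() (not the exact source);
        // "job" is the task-side JobConf, i.e. the deserialized job configuration.
        boolean isLocal = "local".equals(job.get("mapred.job.tracker", "local"));
        // if isLocal is false, the task starts the ReduceCopier / MapOutputCopier threads
        // seen in the thread list above instead of reading map outputs from local disk;
        // the filesystem choice is driven separately by fs.default.name.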
To me it is absolutely unclear why this happens, since the 3 options that have to be set to enable standalone mode were set the right way. Surprisingly, although the local setting was ignored, the "read from normal disk" setting wasn't, which is very strange in my opinion, because I thought local mode and the file:/// protocol were coupled.
While debugging ReduceTask I set the isLocal variable to true by evaluating isLocal=true in my debug view and then tried to execute the rest of the program. It did not work out, and this is the stacktrace:
12/05/22 14:28:28 INFO mapred.LocalJobRunner:
12/05/22 14:28:28 INFO mapred.Merger: Merging 1 sorted segments
12/05/22 14:28:28 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1956 bytes
12/05/22 14:28:28 INFO mapred.LocalJobRunner:
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name; Ignoring.
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
12/05/22 14:28:30 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 0 time(s).
12/05/22 14:28:31 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 1 time(s).
12/05/22 14:28:32 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 2 time(s).
12/05/22 14:28:33 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 3 time(s).
12/05/22 14:28:34 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 4 time(s).
12/05/22 14:28:35 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 5 time(s).
12/05/22 14:28:36 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 6 time(s).
12/05/22 14:28:37 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 7 time(s).
12/05/22 14:28:38 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 8 time(s).
12/05/22 14:28:39 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 9 time(s).
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name; Ignoring.
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
12/05/22 14:28:39 WARN mapred.LocalJobRunner: job_local_0001
java.net.ConnectException: Call to master/127.0.0.1:9001 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
at org.apache.hadoop.ipc.Client.call(Client.java:1071)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:446)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:490)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
at org.apache.hadoop.ipc.Client.call(Client.java:1046)
... 17 more
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name; Ignoring.
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
12/05/22 14:28:39 INFO mapred.JobClient: Job complete: job_local_0001
12/05/22 14:28:39 INFO mapred.JobClient: Counters: 20
12/05/22 14:28:39 INFO mapred.JobClient: File Input Format Counters
12/05/22 14:28:39 INFO mapred.JobClient: Bytes Read=967
12/05/22 14:28:39 INFO mapred.JobClient: FileSystemCounters
12/05/22 14:28:39 INFO mapred.JobClient: FILE_BYTES_READ=14093
12/05/22 14:28:39 INFO mapred.JobClient: FILE_BYTES_WRITTEN=47859
12/05/22 14:28:39 INFO mapred.JobClient: Map-Reduce Framework
12/05/22 14:28:39 INFO mapred.JobClient: Map output materialized bytes=1960
12/05/22 14:28:39 INFO mapred.JobClient: Map input records=10
12/05/22 14:28:39 INFO mapred.JobClient: Reduce shuffle bytes=0
12/05/22 14:28:39 INFO mapred.JobClient: Spilled Records=10
12/05/22 14:28:39 INFO mapred.JobClient: Map output bytes=1934
12/05/22 14:28:39 INFO mapred.JobClient: Total committed heap usage (bytes)=115937280
12/05/22 14:28:39 INFO mapred.JobClient: CPU time spent (ms)=0
12/05/22 14:28:39 INFO mapred.JobClient: Map input bytes=967
12/05/22 14:28:39 INFO mapred.JobClient: SPLIT_RAW_BYTES=82
12/05/22 14:28:39 INFO mapred.JobClient: Combine input records=0
12/05/22 14:28:39 INFO mapred.JobClient: Reduce input records=0
12/05/22 14:28:39 INFO mapred.JobClient: Reduce input groups=0
12/05/22 14:28:39 INFO mapred.JobClient: Combine output records=0
12/05/22 14:28:39 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
12/05/22 14:28:39 INFO mapred.JobClient: Reduce output records=0
12/05/22 14:28:39 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
12/05/22 14:28:39 INFO mapred.JobClient: Map output records=10
12/05/22 14:28:39 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at uni.kassel.macek.rtprep.RetweetApplication.main(RetweetApplication.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Since this stacktrace now shows me that port 9001 is used during execution, I guess that somehow the XML configuration files override the settings I make locally in Java (which I use for testing), which is strange since I have read over and over on the internet that Java overrides the XML configuration. If nobody knows how to correct this, I'll try to simply erase all configuration XMLs. Perhaps this solves the problem...
NEW UPDATE
Renaming Hadoop's conf folder solved the problem of the waiting copiers, and the program now executes until the end. Sadly the execution doesn't wait for my debugger anymore, although HADOOP_OPTS is set correctly.
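(For reference, the kind of HADOOP_OPTS value used to make the JVM suspend and wait for a debugger on port 5002 - the port visible in the thread list above - would look something like the line below; my exact setting is not reproduced here, so treat it as an assumption.)

# hypothetical example; suspend=y makes the JVM wait until a debugger attaches on port 5002
export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5002"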
SUMMARY: It's only a configuration issue: XML may (for some configuration parameters) override Java. If somebody knew how I can get debugging to run again, that would be perfect, but for now I'm just glad I don't see this stacktrace anymore! ;)
Thank you Chris for your time and efforts!
Solution
Sorry I didn't see this before, but you appear to have two important configuration properties set to final in your conf XML files, as denoted by the "attempt to override final parameter ... Ignoring" warnings in your log (for fs.default.name and mapred.job.tracker).
This means that your job is unable to actually run in local mode: it starts in local mode, but the reducer reads the serialized job configuration, determines that it is not in local mode, and tries to fetch the map outputs via the task tracker ports.
You said your fix was to rename the conf folder - this drops Hadoop back to the default configuration, where these two properties are not marked as 'final'.
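For illustration, a final-marked property in one of those conf XML files would look roughly like the snippet below (the fs.default.name value is an assumption, not taken from your post; the mapred.job.tracker value matches the master:9001 address your reducer tried to contact). It is the <final>true</final> element that stops your JobConf.set(...) values from taking effect when the task re-reads the configuration:

<!-- hypothetical excerpt from mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
  <final>true</final>
</property>
<!-- hypothetical excerpt from core-site.xml; value is an assumption -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
  <final>true</final>
</property>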