Problem description
I ran a MapReduce program using the command hadoop jar <jar> [mainClass] path/to/input path/to/output. However, my job was hanging at: INFO mapreduce.Job: map 100% reduce 29%.
Much later, I terminated the job and checked the datanode log (I am running in pseudo-distributed mode). It contained the following exception:
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
    at java.lang.Thread.run(Thread.java:745)
Five seconds later, the log showed: ERROR DataXceiver error processing WRITE_BLOCK operation.
What problem might be causing this exception and error?
My NodeHealthReport said:
1/1 local-dirs are bad: /home/$USER/hadoop/nm-local-dir;
1/1 log-dirs are bad: /home/$USER/hadoop-2.7.1/logs/userlogs
I found this, which indicates that dfs.datanode.max.xcievers may need to be increased. However, that property is deprecated and the new property is called dfs.datanode.max.transfer.threads, with a default value of 4096. If changing this would fix my problem, what new value should I set it to?
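For reference, my understanding is that such a change would go in hdfs-site.xml on the datanode, followed by a restart of the datanode; the value 8192 below is only a placeholder I made up, not a recommended setting:

<!-- hdfs-site.xml; 8192 is a placeholder value, not a recommendation -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>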
This indicates that the ulimit for the datanode may need to be increased. My ulimit -n (open files) is 1024. If increasing this would fix my problem, what should I set it to?
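For context, my understanding is that a persistent increase is usually made in /etc/security/limits.conf for the user that runs the datanode; the user name hduser and the value 65536 below are placeholders, not values anyone has recommended to me:

# /etc/security/limits.conf (log the user out and back in afterwards)
hduser  soft  nofile  65536
hduser  hard  nofile  65536

A temporary change in the current shell, before starting the daemon, would be ulimit -n 65536.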
Answer

Premature EOF can occur for multiple reasons, one of which is spawning a huge number of threads to write to disk on one reducer node using FileOutputCommitter. The MultipleOutputs class allows you to write to files with custom names, and to accomplish that it spawns one thread per file and binds a port to it for writing to disk. This puts a limit on the number of files that can be written to on one reducer node.

I encountered this error when the number of files on one reducer node crossed roughly 12,000: the threads got killed and the _temporary folder got deleted, leading to a plethora of these exception messages. My guess is that this is not a memory-overshoot issue, nor can it be solved by allowing the Hadoop engine to spawn more threads. Reducing the number of files being written at one time on one node solved my problem, either by reducing the actual number of files being written or by increasing the number of reducer nodes.
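To illustrate the pattern being described (a hypothetical sketch, not the job from the question), a reducer that fans records out to one file per key through MultipleOutputs looks roughly like this; when the number of distinct base output paths grows large, a single reducer node ends up holding a correspondingly large number of open writers:

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer: every distinct key becomes its own output file, so one
// reducer node can accumulate thousands of open writers at once.
public class FanOutReducer extends Reducer<Text, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // The third argument is the base output path: one file per key.
            mos.write(NullWritable.get(), value, key.toString());
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();  // closes all the per-file record writers
    }
}

Cutting down the number of distinct base paths written per reducer (for example, by bucketing keys into fewer files) or running more reducers spreads those writers across nodes, which matches the fix described above.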