Problem description
I've been stuck on this problem for a very long time. I'm trying to run something in distributed mode. I have 2 datanodes and a master with the namenode and jobtracker. I keep getting the following error in the tasktracker.log of each of the nodes:
2012-01-03 08:48:30,910 WARN mortbay.log - /mapOutput: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201201031846_0001/attempt_201201031846_0001_m_000000_1/output/file.out.index in any of the configured local directories
2012-01-03 08:48:40,927 WARN mapred.TaskTracker - getMapOutput(attempt_201201031846_0001_m_000000_2,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201201031846_0001/attempt_201201031846_0001_m_000000_2/output/file.out.index in any of the configured local directories
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2887)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
and this error in hadoop.log of the slave:
2012-01-03 10:20:36,732 WARN mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 adding host localhost to penalty box, next contact in 4 seconds
2012-01-03 10:20:41,738 WARN mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 copy failed: attempt_201201031954_0006_m_000001_2 from localhost
2012-01-03 10:20:41,738 WARN mapred.ReduceTask - java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1
at sun.reflect.GeneratedConstructorAccessor6.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
... 4 more
2012-01-03 10:20:41,739 WARN mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 adding host localhost to penalty box, next contact in 4 seconds
2012-01-03 10:20:46,761 WARN mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 copy failed: attempt_201201031954_0006_m_000000_3 from localhost
2012-01-03 10:20:46,762 WARN mapred.ReduceTask - java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000000_3&reduce=1
at sun.reflect.GeneratedConstructorAccessor6.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000000_3&reduce=1
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
... 4 more
This is my configuration:
mapred-site:
<property>
<name>mapred.job.tracker</name>
<value>10.20.1.112:9001</value>
<description>The host and port that the MapReduce job tracker runs
at.</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>2</value>
<description>
define mapred.map tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
<description>
define mapred.reduce tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.system.dir</name>
<value>filesystem/mapreduce/system</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>filesystem/mapreduce/local</value>
</property>
<property>
<name>mapred.submit.replication</name>
<value>2</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>tmp</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m</value>
</property>
core-site:
<property>
<name>fs.default.name</name>
<value>hdfs://10.20.1.112:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation.
</description>
</property>
I've tried playing with the tmp dir - didn't help. I've tried playing with mapred.local.dir - didn't help.
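(For reference, "playing with" here means swapping the relative values above for absolute ones, along these lines - the paths below are placeholders, not my actual install locations:

<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value> <!-- placeholder absolute path -->
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/filesystem/mapreduce/local</value> <!-- placeholder absolute path -->
</property>
)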
I also tried to see what is in the filesystem dir during runtime. I found that the path taskTracker/jobcache/job_201201031846_0001/attempt_201201031846_0001_m_000000_1/ exists, but it doesn't have an output folder in it.
Any ideas?
Thanks.
Here I think the problem is: your tasktracker wants to fetch the map output from the master, so it should request:
http://10.20.1.112:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1
but on your task node, it tried to get it from
http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1
That is where the problem occurs. The main problem is not hadoop.tmp.dir, mapred.system.dir, or mapred.local.dir. I was facing this problem too, and I resolved it by deleting the "127.0.0.1 localhost" line in /etc/hosts on the master; maybe you can try it!
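To see why, here is a minimal, self-contained Java sketch (not Hadoop's actual code, just an illustration) of the kind of local-hostname lookup a daemon performs when it registers its address. If /etc/hosts maps the machine's hostname to 127.0.0.1, the lookup yields the loopback address, and other nodes are then told to fetch map output from localhost:

import java.net.InetAddress;

// Illustrative only: shows how a JVM resolves the local host.
// If /etc/hosts points the machine's hostname at 127.0.0.1, the
// address printed here is the loopback address, which is what a
// daemon would then effectively advertise to the rest of the cluster.
public class ResolveSelf {
    public static void main(String[] args) throws Exception {
        InetAddress self = InetAddress.getLocalHost();
        System.out.println("hostname: " + self.getHostName());
        System.out.println("address:  " + self.getHostAddress());
    }
}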
EDIT
In summary, go to the /etc/hosts file on the node that's causing the error and remove the line 127.0.0.1 localhost.
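As an illustration, a hypothetical before/after of that /etc/hosts (the hostname "master" is a placeholder; 10.20.1.112 is the master address from the configuration above):

# before - the loopback line below is the one to remove
127.0.0.1     localhost
10.20.1.112   master

# after
10.20.1.112   master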