So far, I have tried the solutions from here [1] and here [2] for this problem. While they do cause the MapReduce job to execute, it appears to run only on the name node, since I get output similar to [3].

Basically, I am running a 2-node cluster with a MapReduce algorithm of my own design. The MapReduce jar executes perfectly on a single-node cluster, which leads me to believe something is wrong with my Hadoop multi-node configuration. To set up the multi-node cluster, I followed the tutorial here.

To report what is going wrong: when I execute my program (after checking that the NameNode, TaskTrackers, JobTrackers, and DataNodes are running on their respective nodes), it stalls in the terminal at the line:

INFO mapred.JobClient: map 100% reduce 0%

If I look at the task's logs, I see copy failed: attempt... from slave-node followed by a SocketTimeoutException.

Looking at the logs on my slave node (DataNode) shows that execution stops at the following line:

TaskTracker: attempt... 0.0% reduce > copy >

As the solutions in links 1 and 2 suggest, removing the various IP addresses from the etc/hosts file leads to successful execution, but I then end up with the entries from link 4 in the slave node (DataNode) logs:

INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.

This looks suspicious to me, but as a new Hadoop user I recognize it may be perfectly normal. To me, it seems to point to an incorrect IP address in the hosts file, and that by removing that address I merely halted execution on the slave node, so processing continued on the namenode alone (with no speed advantage at all).
To summarize, here are the edited hosts and configuration files for each node:
Master: /etc/hosts

127.0.0.1 localhost
127.0.1.1 joseph-Dell-System-XPS-L702X
#The following lines are for hadoop master/slave setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Slave: /etc/hosts

127.0.0.1 localhost
127.0.1.1 joseph-Home # this line was incorrect, it was set as 7.0.1.1
#the following lines are for hadoop mutli-node cluster setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
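Since a single mistyped address in either hosts file is enough to cause these shuffle timeouts, a quick mechanical check can help. This is a minimal sketch, assuming the cluster hostnames `master` and `slave` from the files above; `lookup_host` is a hypothetical helper, not part of Hadoop:

```shell
#!/bin/sh
# Sketch: print which IP each cluster hostname maps to in a hosts file,
# so a typo like 7.0.1.1 stands out at a glance.
lookup_host() {
  # $1 = hosts file, $2 = hostname; prints the matching IP, if any
  awk -v h="$2" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) print $1 }' "$1"
}

# Run this on every node; both hostnames should resolve identically everywhere.
for host in master slave; do
  echo "$host -> $(lookup_host /etc/hosts "$host")"
done
```

Running it on each node and comparing the output catches the case where one node's file disagrees with the others.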
Master: core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri’s scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri’s authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
Slave: core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri’s scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri’s authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
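The master and slave copies of core-site.xml must agree on fs.default.name. A small sketch to extract that value from a given copy so the two nodes can be compared (the conf path in the example comment is an assumption; adjust to your install):

```shell
#!/bin/sh
# Sketch: pull the <value> that follows fs.default.name out of a
# core-site.xml file, e.g. to diff the master's and slave's settings.
default_fs() {
  # $1 = path to a core-site.xml copy
  grep -A 1 '<name>fs.default.name</name>' "$1" \
    | sed -n 's|.*<value>\(.*\)</value>.*|\1|p'
}

# e.g. on each node: default_fs /usr/local/hadoop/conf/core-site.xml
```

If the printed URIs differ between nodes, the slave will not talk to the intended NameNode.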
Master: hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
Slave: hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
Master: mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If “local”, then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
Slave: mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If “local”, then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
Best answer
The error was in /etc/hosts.

During the failing runs, the slave's /etc/hosts file looked like this:
127.0.0.1 localhost
7.0.1.1 joseph-Home # THIS LINE IS INCORRECT, IT SHOULD BE 127.0.1.1
#the following lines are for hadoop mutli-node cluster setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
As you may have spotted, the IP address for this machine, "joseph-Home", was misconfigured: it was set to 7.0.1.1 when it should have been 127.0.1.1. Changing line 2 of the slave's /etc/hosts file to
127.0.1.1 joseph-Home
resolved the problem, and my logs now appear normally on the slave node. The new /etc/hosts file:
127.0.0.1 localhost
127.0.1.1 joseph-Home # this line is now correct
#the following lines are for hadoop mutli-node cluster setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
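In hindsight, a bad entry like 7.0.1.1 is easy to miss by eye but easy to catch mechanically. A sketch of such a check, assuming this cluster only uses loopback (127.*) addresses and the 192.168.* LAN range seen above (the ranges are an assumption; widen them for other networks):

```shell
#!/bin/sh
# Sketch: flag hosts-file entries whose IPv4 address falls outside the
# ranges this cluster actually uses (127.* loopback, 192.168.* LAN).
flag_odd_entries() {
  # $1 = hosts file; prints any suspicious IPv4 line
  awk '$1 ~ /^[0-9]+\./ && $1 !~ /^127\./ && $1 !~ /^192\.168\./ { print "suspicious: " $0 }' "$1"
}
```

Against the old file, this would have flagged the 7.0.1.1 line immediately.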
Regarding "java - Hadoop cluster stuck at reduce > copy >", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18634825/