(This is a follow-up to a question I asked earlier about this issue.)

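I set up a small Hadoop cluster following these instructions, but using Hadoop 2.7.4. The cluster appears to work fine, but I cannot run MapReduce jobs. In particular, when I try the following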

$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar randomwriter outd

the job prints
17/11/27 16:35:21 INFO client.RMProxy: Connecting to ResourceManager at ec2-yyy.eu-central-1.compute.amazonaws.com/xxx:8032
Running 0 maps.
Job started: Mon Nov 27 16:35:22 UTC 2017
17/11/27 16:35:22 INFO client.RMProxy: Connecting to ResourceManager at ec2-yyy.eu-central-1.compute.amazonaws.com/xxx:8032
17/11/27 16:35:22 INFO mapreduce.JobSubmitter: number of splits:0
17/11/27 16:35:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1511799491035_0006
17/11/27 16:35:22 INFO impl.YarnClientImpl: Submitted application application_1511799491035_0006
17/11/27 16:35:22 INFO mapreduce.Job: The url to track the job: http://ec2-yyy.eu-central-1.compute.amazonaws.com:8088/proxy/application_1511799491035_0006/
17/11/27 16:35:22 INFO mapreduce.Job: Running job: job_1511799491035_0006

and never gets past this state.

In the job tracker, it says
ACCEPTED: waiting for AM container to be allocated, launched and register with RM.

I then looked at the log files and found
2017-11-27 13:50:29,202 INFO org.apache.hadoop.conf.Configuration: found resource capacity-scheduler.xml at file:/usr/local/hadoop/etc/hadoop/capacity-scheduler.xml
2017-11-27 13:50:29,252 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root is undefined
2017-11-27 13:50:29,252 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root is undefined
2017-11-27 13:50:29,256 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: root, capacity=1.0, asboluteCapacity=1.0, maxCapacity=1.0, asboluteMaxCapacity=1.0, state=RUNNING, acls=ADMINISTER_QUEUE:*SUBMIT_APP:*, labels=*, reservationsContinueLooking=true
2017-11-27 13:50:29,256 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Initialized parent-queue root name=root, fullname=root
2017-11-27 13:50:29,265 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root.default is undefined
2017-11-27 13:50:29,265 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root.default is undefined

This suggests a problem with the capacity scheduler. The file capacity-scheduler.xml looks like this:
<configuration>

  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>
      Maximum number of applications that can be pending and running.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description>
      Maximum percent of resources in the cluster which can be used to run
      application masters i.e. controls number of concurrent running
      applications.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description>
      The ResourceCalculator implementation to be used to compare
      Resources in the scheduler.
      The default i.e. DefaultResourceCalculator only uses Memory while
      DominantResourceCalculator uses dominant-resource to compare
      multi-dimensional resources such as Memory, CPU etc.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default</value>
    <description>
      The queues at the this level (root is the root queue).
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>100</value>
    <description>Default queue target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
    <description>
      The maximum capacity of the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler
      attempts to schedule rack-local containers.
      Typically this should be set to number of nodes in the cluster, By default is setting
      approximately number of nodes in one rack which is 40.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>
      A list of mappings that will be used to assign jobs to queues
      The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
      Typically this list will be used to map users to queues,
      for example, u:%user:%user maps all users to queues with the same name
      as the user.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a queue mapping is present, will it override the value specified
      by the user? This can be used by administrators to place jobs in queues
      that are different than the one specified by the user.
      The default is false.
    </description>
  </property>

</configuration>

I would be grateful for any hints on how to solve this.

Thanks
c14

Best answer

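The cluster configuration is fine. The problem is on the job-execution side: t2.micro instances do not provide enough RAM to run MapReduce jobs, so it is better to use larger instances for creating the cluster and running jobs.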
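To make the memory argument a bit more concrete, here is a rough Python sketch of the kind of check that decides whether the MapReduce ApplicationMaster (AM) container can be placed on a node at all. It is an illustration under assumptions, not the scheduler's actual code: the 1536 MB AM request and the 1024 MB minimum allocation are the stock Hadoop 2.7 defaults for `yarn.app.mapreduce.am.resource.mb` and `yarn.scheduler.minimum-allocation-mb`, while the 1024 MB node size is a guess at a NodeManager configured to advertise about what a t2.micro physically has (1 GiB), since the question does not show yarn-site.xml. The function names are made up for this example.

```python
import math


def normalized_request_mb(request_mb: int, min_alloc_mb: int) -> int:
    """Round a container request up to a multiple of the minimum allocation.

    This mirrors, in simplified form, how memory requests are normalized
    before scheduling (assumption: the step size equals the minimum allocation).
    """
    steps = math.ceil(request_mb / min_alloc_mb)
    return max(min_alloc_mb, steps * min_alloc_mb)


def am_fits_on_node(node_memory_mb: int,
                    am_request_mb: int = 1536,
                    min_alloc_mb: int = 1024) -> bool:
    """Return True if the normalized AM request fits into one node's memory.

    node_memory_mb stands for the memory a NodeManager advertises
    (yarn.nodemanager.resource.memory-mb); a container cannot span nodes.
    """
    return normalized_request_mb(am_request_mb, min_alloc_mb) <= node_memory_mb


if __name__ == "__main__":
    # Hypothetical node advertising roughly a t2.micro's RAM.
    print(am_fits_on_node(1024))
    # Hypothetical larger node advertising 8 GiB.
    print(am_fits_on_node(8192))
```

If the AM request never fits on any single node, the application would be expected to stay in ACCEPTED, which matches the symptom in the question. Moving to a larger instance type, as suggested above, raises the node side of that comparison.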

Source: the Stack Overflow question "hadoop - Hadoop cluster not running map reduce jobs - scheduler issue": https://stackoverflow.com/questions/47535019/
