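I ran into the following problem when starting a Hama BSP job. The exception occurs while Hama is loading and partitioning the input data, before my own code actually runs. This is a known issue that has been discussed on several sites, but unfortunately no cause has been identified (for example, see here).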

My BSP job runs fine when I run it on only part of the dataset. However, when I run it on the full dataset, the problem appears :(

How can I fix or avoid this problem?

13/11/18 01:19:30 INFO bsp.FileInputFormat: Total input paths to process : 32
13/11/18 01:19:30 INFO bsp.FileInputFormat: Total input paths to process : 32
13/11/18 01:19:30 INFO bsp.BSPJobClient: Running job: job_201311180115_0002
13/11/18 01:19:33 INFO bsp.BSPJobClient: Current supersteps number: 0
13/11/18 01:19:33 INFO bsp.BSPJobClient: Job failed.
13/11/18 01:19:33 ERROR bsp.BSPJobClient: Error partitioning the input path.
java.io.IOException: Runtime partition failed for the job.
    at org.apache.hama.bsp.BSPJobClient.partition(BSPJobClient.java:465)
    at org.apache.hama.bsp.BSPJobClient.submitJobInternal(BSPJobClient.java:333)
    at org.apache.hama.bsp.BSPJobClient.submitJob(BSPJobClient.java:293)
    at org.apache.hama.bsp.BSPJob.submit(BSPJob.java:228)
    at org.apache.hama.bsp.BSPJob.waitForCompletion(BSPJob.java:235)
    at edu.wisc.cs.db.opener.hama.ConnectedEntityBspDriver.main(ConnectedEntityBspDriver.java:183)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hama.util.RunJar.main(RunJar.java:146)

Best answer

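After being stuck on this problem for several hours, I found that the error occurs whenever the number of input files is greater than the number of allowed BSP tasks (in the log above, Hama reports 32 input paths, while the default task limit is 3). I think Hama should fix this bug in the future.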
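To check whether a job is likely to hit this condition before submitting it, you could compare the input file count with the task limit yourself. The following is a minimal, hypothetical pre-flight sketch (the class `PartitionPreflight` and its methods are illustrative, not part of Hama). It assumes a local input directory and a single groom server, and it assumes hidden/marker files (names starting with "_" or ".") are not counted as input, following the usual Hadoop convention. It is only an approximation of the observation in this answer, not Hama's actual partitioning logic.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical pre-flight check: compare the number of input files with
 * bsp.tasks.maximum before submitting a Hama BSP job.
 * Assumes a single groom server and a local (non-HDFS) input directory.
 */
public class PartitionPreflight {

    /** Counts regular files directly inside dir (non-recursive), skipping hidden/marker files. */
    static int countInputFiles(Path dir) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                // Assumed convention: names starting with "_" or "." are not input data.
                if (Files.isRegularFile(p) && !name.startsWith("_") && !name.startsWith(".")) {
                    count++;
                }
            }
        }
        return count;
    }

    /** True when there are more input files than allowed BSP tasks. */
    static boolean exceedsTaskLimit(int inputFiles, int maxTasks) {
        return inputFiles > maxTasks;
    }

    public static void main(String[] args) throws IOException {
        Path inputDir = Paths.get(args.length > 0 ? args[0] : ".");
        // 3 is the default limit mentioned in this answer.
        int maxTasks = args.length > 1 ? Integer.parseInt(args[1]) : 3;
        int files = countInputFiles(inputDir);
        if (exceedsTaskLimit(files, maxTasks)) {
            System.out.println(files + " input files > bsp.tasks.maximum=" + maxTasks
                    + ": partitioning is likely to fail");
        } else {
            System.out.println(files + " input files <= bsp.tasks.maximum=" + maxTasks);
        }
    }
}
```

With the numbers from the log above, `exceedsTaskLimit(32, 3)` is true, which matches the failure.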

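A quick workaround is to increase the maximum number of BSP tasks, which is set by the bsp.tasks.maximum property in the hama-site.xml file. For example, the following setting uses 10 instead of the default of 3: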

<property>
  <name>bsp.tasks.maximum</name>
  <value>10</value>
  <description>The maximum number of BSP tasks that will be run simultaneously
  by a groom server.</description>
</property>
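If you want to confirm which value a given hama-site.xml actually sets, a small sketch like the one below can read it with the JDK's built-in DOM parser. The class `HamaSiteReader` is hypothetical and does not use Hama's own configuration classes; it simply looks for a `<property>` whose `<name>` is bsp.tasks.maximum and falls back to 3 (the default quoted above) when the property is absent.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: read bsp.tasks.maximum from a hama-site.xml stream. */
public class HamaSiteReader {
    // Default value cited in the answer above.
    static final int DEFAULT_MAX_TASKS = 3;

    static int readMaxTasks(InputStream xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            // Return the first matching <property> entry.
            if (names.getLength() > 0 && values.getLength() > 0
                    && "bsp.tasks.maximum".equals(names.item(0).getTextContent().trim())) {
                return Integer.parseInt(values.item(0).getTextContent().trim());
            }
        }
        return DEFAULT_MAX_TASKS; // property not set in this file
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration><property><name>bsp.tasks.maximum</name>"
                + "<value>10</value></property></configuration>";
        int max = readMaxTasks(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("bsp.tasks.maximum = " + max); // prints "bsp.tasks.maximum = 10"
    }
}
```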

Regarding "java - Runtime partition failed for the job in Hama BSP", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20042399/

10-12 19:06