我尝试运行最新版本的apache giraph示例,在快速入门页面(http://giraph.apache.org/quick_start.html)上进行说明。我使用CDH 4.4.0(Hadoop的Cloudera发行版)
我已经将Giraph的依赖关系更新为CDH 4.4.0。一切顺利
当我运行示例时,得到以下输出
-bash-4.1$ hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.1.0- SNAPSHOT-for-hadoop-2.0.0-cdh4.4.0-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsComputation
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/hdfs/input/tiny_graph.txt
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op /user/hdfs/output/shortestpaths -w 1
13/10/02 18:31:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
13/10/02 18:31:58 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
13/10/02 18:31:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
13/10/02 18:31:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/10/02 18:32:00 INFO job.GiraphJob: run: Tracking URL: http://hadoop57:50030/jobdetails.jsp?jobid=job_201310021452_0015
13/10/02 18:32:22 INFO mapred.JobClient: Running job: job_201310021452_0015
13/10/02 18:32:22 INFO mapred.JobClient: Job complete: job_201310021452_0015
13/10/02 18:32:22 INFO mapred.JobClient: Counters: 6
13/10/02 18:32:22 INFO mapred.JobClient: Job Counters
13/10/02 18:32:22 INFO mapred.JobClient: Failed map tasks=1
13/10/02 18:32:22 INFO mapred.JobClient: Launched map tasks=2
13/10/02 18:32:22 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=29054
13/10/02 18:32:22 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
13/10/02 18:32:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/10/02 18:32:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
并且作业日志显示异常:
java.lang.IllegalStateException: run: Caught an unrecoverable exception
java.io.FileNotFoundException: File
_bsp/_defaultZkManagerDir/job_201310021452_0015/_zkServer does not exist.
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File
_bsp/_defaultZkManagerDir/job_201310021452_0015/_zkServer does not exist.
at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:792)
at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java
有时创建
_bsp/_defaultZkManagerDir/job_201310021452_0015/_zkServer
文件,有时不创建。您能否提供任何提示从哪里开始寻找这个问题。
BR
康拉德
最佳答案
看起来Giraph正在开始它自己的动物园管理员 session 。只需尝试将以下内容作为VM参数传递给GiraphRunner。
-Dgiraph.zkList=<zookeeper server address>:<port>
例如
-Dgiraph.zkList=localhost:2181
您的命令将如下所示:
-bash-4.1$ hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.1.0- SNAPSHOT-for-hadoop-2.0.0-cdh4.4.0-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsComputation
-Dgiraph.zkList=localhost:2181
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/hdfs/input/tiny_graph.txt
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op /user/hdfs/output/shortestpaths -w 1
好运..!!
关于apache - Apache Giraph无法在CDH4.4.0上运行,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/19142648/