ClassNotFoundException when running GiraphRunner on a modified SimpleShortestPathsVertex

Problem description

I'm relatively new to Giraph and I'm trying to get my Giraph edit-compile-deploy loop working for our code.
I am able to run various examples inspired by http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/ , but I'm stuck with a ClassNotFoundException when running my modified version of the SimpleShortestPathsVertex Giraph example. I've tried various combinations of -libjars and HADOOP_CLASSPATH, but I'm out of ideas and I'd really appreciate your help. Details follow.

Versions

Hadoop: Hadoop 2.0.0-cdh4.4.0
Giraph: giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar

The PageRankBenchmark runs OK

    $ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
        org.apache.giraph.benchmark.PageRankBenchmark \
        -Dgiraph.zkList=<myhost>:2181 \
        -e 1 -s 3 -v -V 50 -w 1
    ...
    14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
    ...

(full output is below)

The GiraphRunner SimpleShortestPathsVertex also runs OK

    $ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
        org.apache.giraph.GiraphRunner \
        -Dgiraph.zkList=<myhost>:2181 \
        org.apache.giraph.examples.SimpleShortestPathsVertex \
        -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
        -vip ginput/tiny_graph.txt \
        -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
        -op goutput/shortestpathsC2 \
        -ca SimpleShortestPathsVertex.source=2 \
        -w 1
    ...
    14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
    ...

(full output is below)

Bonus: the results are correct:

    $ hadoop fs -cat goutput/shortestpathsC2/p*
    0 1.0
    2 2.0
    1 0.0
    3 1.0
    4 5.0

But my modified version of SimpleShortestPathsVertex gets ClassNotFoundException

The jar containing the modified vertex (KdlSimpleShortestPathsVertex, no package) is OK:

    $ jar -tf ~/kdl_hadoop_play.jar
    META-INF/MANIFEST.MF
    KdlSimpleShortestPathsVertex.class
    META-INF/

But my run pukes:

    $ hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
        org.apache.giraph.GiraphRunner \
        -Dgiraph.zkList=<myhost>:2181 \
        -libjars ~/kdl_hadoop_play.jar \
        KdlSimpleShortestPathsVertex \
        -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
        -vip /user/cornell/ginput/tiny_graph.txt \
        -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
        -op /user/cornell/goutput/shortestpathsC2 \
        -ca KdlSimpleShortestPathsVertex.source=2 \
        -w 1
    Exception in thread "main" java.lang.ClassNotFoundException: KdlSimpleShortestPathsVertex
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at org.apache.giraph.utils.ConfigurationUtils.populateGiraphConfiguration(ConfigurationUtils.java:210)
        at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:147)
        at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

My best guess ...

...after looking around is that maybe GiraphRunner is not processing the -libjars correctly, as hinted at by http://grepalex.com/2013/02/25/hadoop-libjars/ ("Make sure your code is using GenericOptionsParser"). Browsing the Giraph source, I do not see that class accessed.
I tried setting HADOOP_CLASSPATH to my jar, but that didn't solve the problem.

Any help would be awesome!

PageRankBenchmark output

    14/08/01 11:42:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
    14/08/01 11:42:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    14/08/01 11:42:28 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
    14/08/01 11:42:29 INFO mapred.JobClient: Running job: job_201407291058_0015
    14/08/01 11:42:30 INFO mapred.JobClient: map 0% reduce 0%
    14/08/01 11:42:40 INFO mapred.JobClient: map 50% reduce 0%
    14/08/01 11:42:41 INFO mapred.JobClient: map 100% reduce 0%
    14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
    14/08/01 11:42:44 INFO mapred.JobClient: Counters: 39
    14/08/01 11:42:44 INFO mapred.JobClient: File System Counters
    14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of bytes read=0
    14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of bytes written=369846
    14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of read operations=0
    14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of large read operations=0
    14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of write operations=0
    14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of bytes read=88
    14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of bytes written=0
    14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of read operations=2
    14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of large read operations=0
    14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of write operations=1
    14/08/01 11:42:44 INFO mapred.JobClient: Job Counters
    14/08/01 11:42:44 INFO mapred.JobClient: Launched map tasks=2
    14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=15772
    14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
    14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    14/08/01 11:42:44 INFO mapred.JobClient: Map-Reduce Framework
    14/08/01 11:42:44 INFO mapred.JobClient: Map input records=2
    14/08/01 11:42:44 INFO mapred.JobClient: Map output records=0
    14/08/01 11:42:44 INFO mapred.JobClient: Input split bytes=88
    14/08/01 11:42:44 INFO mapred.JobClient: Spilled Records=0
    14/08/01 11:42:44 INFO mapred.JobClient: CPU time spent (ms)=2230
    14/08/01 11:42:44 INFO mapred.JobClient: Physical memory (bytes) snapshot=411357184
    14/08/01 11:42:44 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2428895232
    14/08/01 11:42:44 INFO mapred.JobClient: Total committed heap usage (bytes)=806027264
    14/08/01 11:42:44 INFO mapred.JobClient: Giraph Stats
    14/08/01 11:42:44 INFO mapred.JobClient: Aggregate edges=50
    14/08/01 11:42:44 INFO mapred.JobClient: Aggregate finished vertices=50
    14/08/01 11:42:44 INFO mapred.JobClient: Aggregate vertices=50
    14/08/01 11:42:44 INFO mapred.JobClient: Current master task partition=0
    14/08/01 11:42:44 INFO mapred.JobClient: Current workers=1
    14/08/01 11:42:44 INFO mapred.JobClient: Last checkpointed superstep=0
    14/08/01 11:42:44 INFO mapred.JobClient: Sent messages=0
    14/08/01 11:42:44 INFO mapred.JobClient: Superstep=4
    14/08/01 11:42:44 INFO mapred.JobClient: Giraph Timers
    14/08/01 11:42:44 INFO mapred.JobClient: Input superstep (milliseconds)=238
    14/08/01 11:42:44 INFO mapred.JobClient: Setup (milliseconds)=2903
    14/08/01 11:42:44 INFO mapred.JobClient: Shutdown (milliseconds)=68
    14/08/01 11:42:44 INFO mapred.JobClient: Superstep 0 (milliseconds)=77
    14/08/01 11:42:44 INFO mapred.JobClient: Superstep 1 (milliseconds)=64
    14/08/01 11:42:44 INFO mapred.JobClient: Superstep 2 (milliseconds)=45
    14/08/01 11:42:44 INFO mapred.JobClient: Superstep 3 (milliseconds)=43
    14/08/01 11:42:44 INFO mapred.JobClient: Total (milliseconds)=3442

SimpleShortestPathsVertex output

    14/08/01 11:47:37 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
    14/08/01 11:47:37 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
    14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
    14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
    14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
    14/08/01 11:47:37 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
    14/08/01 11:47:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    14/08/01 11:47:38 INFO mapred.JobClient: Running job: job_201407291058_0017
    14/08/01 11:47:39 INFO mapred.JobClient: map 0% reduce 0%
    14/08/01 11:47:44 INFO mapred.JobClient: map 50% reduce 0%
    14/08/01 11:47:45 INFO mapred.JobClient: map 100% reduce 0%
    14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
    14/08/01 11:47:46 INFO mapred.JobClient: Counters: 39
    14/08/01 11:47:46 INFO mapred.JobClient: File System Counters
    14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of bytes read=0
    14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of bytes written=367068
    14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of read operations=0
    14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of large read operations=0
    14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of write operations=0
    14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of bytes read=200
    14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of bytes written=30
    14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of read operations=5
    14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of large read operations=0
    14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of write operations=2
    14/08/01 11:47:46 INFO mapred.JobClient: Job Counters
    14/08/01 11:47:46 INFO mapred.JobClient: Launched map tasks=2
    14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=8538
    14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
    14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    14/08/01 11:47:46 INFO mapred.JobClient: Map-Reduce Framework
    14/08/01 11:47:46 INFO mapred.JobClient: Map input records=2
    14/08/01 11:47:46 INFO mapred.JobClient: Map output records=0
    14/08/01 11:47:46 INFO mapred.JobClient: Input split bytes=88
    14/08/01 11:47:46 INFO mapred.JobClient: Spilled Records=0
    14/08/01 11:47:46 INFO mapred.JobClient: CPU time spent (ms)=1590
    14/08/01 11:47:46 INFO mapred.JobClient: Physical memory (bytes) snapshot=341344256
    14/08/01 11:47:46 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2363527168
    14/08/01 11:47:46 INFO mapred.JobClient: Total committed heap usage (bytes)=504758272
    14/08/01 11:47:46 INFO mapred.JobClient: Giraph Stats
    14/08/01 11:47:46 INFO mapred.JobClient: Aggregate edges=12
    14/08/01 11:47:46 INFO mapred.JobClient: Aggregate finished vertices=5
    14/08/01 11:47:46 INFO mapred.JobClient: Aggregate vertices=5
    14/08/01 11:47:46 INFO mapred.JobClient: Current master task partition=0
    14/08/01 11:47:46 INFO mapred.JobClient: Current workers=1
    14/08/01 11:47:46 INFO mapred.JobClient: Last checkpointed superstep=0
    14/08/01 11:47:46 INFO mapred.JobClient: Sent messages=0
    14/08/01 11:47:46 INFO mapred.JobClient: Superstep=4
    14/08/01 11:47:46 INFO mapred.JobClient: Giraph Timers
    14/08/01 11:47:46 INFO mapred.JobClient: Input superstep (milliseconds)=181
    14/08/01 11:47:46 INFO mapred.JobClient: Setup (milliseconds)=313
    14/08/01 11:47:46 INFO mapred.JobClient: Shutdown (milliseconds)=128
    14/08/01 11:47:46 INFO mapred.JobClient: Superstep 0 (milliseconds)=57
    14/08/01 11:47:46 INFO mapred.JobClient: Superstep 1 (milliseconds)=54
    14/08/01 11:47:46 INFO mapred.JobClient: Superstep 2 (milliseconds)=36
    14/08/01 11:47:46 INFO mapred.JobClient: Superstep 3 (milliseconds)=35
    14/08/01 11:47:46 INFO mapred.JobClient: Total (milliseconds)=805

Solution

OK, after looking at the hadoop scripts along with the Hadoop and Giraph source, I think I figured it out. The big hint came from "Using the libjars option with Hadoop" along with this line from the output:

The cause appears to be that GiraphRunner uses its own ConfigurationUtils.parseArgs() to get the org.apache.commons.cli.CommandLine instead of using the recommended org.apache.hadoop.util.GenericOptionsParser.getCommandLine(), which honors the 'libjars' option. This led me to fall back on Hadoop's generic classpath-handling tools: CLASSPATH and/or HADOOP_CLASSPATH.
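One thing worth ruling out before relying on HADOOP_CLASSPATH: the stack trace shows Class.forName failing inside GiraphRunner's own argument parsing, that is, in the client JVM started by `hadoop jar`, so every jar has to resolve on the submitting machine. The sketch below is only a sanity check, not part of the original answer; the function name `missing_classpath_entries` is made up for illustration. It reports classpath entries that do not exist on disk.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop or Giraph): print every entry of a
# colon-separated classpath that does not exist on the local file system.
# Returns non-zero if at least one entry is missing.
# Caveat: Java wildcard entries such as /some/dir/* are reported as missing.
missing_classpath_entries() {
  local entries entry status=0
  # Split the single argument on ':' into an array.
  IFS=':' read -r -a entries <<< "$1"
  for entry in "${entries[@]}"; do
    # Skip empty entries produced by leading, trailing, or doubled colons.
    [ -z "$entry" ] && continue
    if [ ! -e "$entry" ]; then
      echo "missing: $entry"
      status=1
    fi
  done
  return "$status"
}

# Check whatever HADOOP_CLASSPATH currently holds (empty if unset).
missing_classpath_entries "${HADOOP_CLASSPATH:-}" \
  || echo "fix the entries above before retrying"
```

If this prints nothing, the entries at least exist; whether the class is actually inside one of them can still be checked with `jar -tf`, as in the question.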
Here's what worked:

1. Set HADOOP_CLASSPATH to include your application jar and the Giraph core jar, using a colon delimiter.
2. Pass -libjars using that same classpath but with a comma delimiter.

For example, on my machine:

    $ export GIRAPH_HOME=/share/apps/giraph
    $ export HADOOP_CLASSPATH=/home/<me>/kdl_hadoop_play.jar:$GIRAPH_HOME/giraph-ex.jar:$HADOOP_CLASSPATH
    $ export LIBJARS=/home/<me>/kdl_hadoop_play.jar,$GIRAPH_HOME/giraph-core.jar
    $ hadoop fs -rm -R goutput/shortestpathsC2
    $ hadoop jar $GIRAPH_HOME/giraph-ex.jar org.apache.giraph.GiraphRunner \
        -Dgiraph.zkList=<myhost>:2181 \
        -libjars ${LIBJARS} \
        KdlSimpleShortestPathsVertex \
        -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
        -vip /user/cornell/ginput/tiny_graph.txt \
        -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
        -op /user/cornell/goutput/shortestpathsC2 \
        -ca SimpleShortestPathsVertex.source=2 \
        -w 1
    ...
    $ hadoop fs -cat goutput/shortestpathsC2/p*

Which gives the expected output and results.

More generally, it would be helpful if the Giraph team changed the code to use the (apparently) more standard parser.

Hope that helps!
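Since the two settings above differ only in their delimiter, one way to keep them in sync is to build both strings from a single list of jars. This is a minimal sketch, not something from the original answer: the helper `join_by` and the paths in `JARS` are placeholders, so substitute your own jar locations.

```shell
#!/usr/bin/env bash
# Join all remaining arguments with a single-character delimiter.
# Usage: join_by DELIM item1 item2 ...
join_by() {
  local IFS="$1"   # "$*" joins arguments with the first character of IFS
  shift
  printf '%s\n' "$*"
}

# Hypothetical jar locations, for illustration only.
JARS=(/home/me/kdl_hadoop_play.jar /share/apps/giraph/giraph-core.jar)

# Colon-separated form for the client-side classpath.
CP="$(join_by : "${JARS[@]}")"
# Comma-separated form for the -libjars option.
LIBJARS="$(join_by , "${JARS[@]}")"

# Prepend to any existing HADOOP_CLASSPATH without leaving a trailing colon.
export HADOOP_CLASSPATH="$CP${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"

echo "HADOOP_CLASSPATH=$HADOOP_CLASSPATH"
echo "LIBJARS=$LIBJARS"
```

`${LIBJARS}` can then be passed to `-libjars` exactly as in the command above. Adding a jar to `JARS` updates both forms at once, which avoids the two lists drifting apart.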
09-15 21:12