Problem description
I am trying to run my PDFWordCount map-reduce program on Hadoop 2.2.0, but I get this error:
13/12/25 23:37:26 INFO mapreduce.Job: Task Id : attempt_1388041362368_0003_m_000009_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class PDFWordCount$MyMap not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class PDFWordCount$MyMap not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
    ... 8 more
It says that my map class is not known. I have a cluster with a namenode and 2 datanodes on 3 VMs.
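For context, the $ in PDFWordCount$MyMap is simply how the JVM names a nested class, so Hadoop is looking for a mapper declared inside PDFWordCount. A minimal sketch of that layout follows; the input key/value types and the method body are assumptions, since the real MyMap is not shown here, and the nested class must be public static or the task JVM cannot instantiate it reflectively.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PDFWordCount {

    // Sketch only: the real mapper consumes records from PDFInputFormat,
    // so its input types may differ from LongWritable/Text.
    // A nested mapper class must be public static so Hadoop can create
    // it by reflection inside the task JVM.
    public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // MyReduce and the main() shown below would live here as well.
}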
My main function is this:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "wordcount");
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(MyMap.class);
    job.setReducerClass(MyReduce.class);
    job.setInputFormatClass(PDFInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setJarByClass(PDFWordCount.class);
    job.waitForCompletion(true);
}
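As a side note, and as an assumption rather than something from the original post: if setJarByClass does not resolve to the jar that actually contains these classes, the jar to ship can also be named explicitly before submission, for example:

// Sketch, assuming the exported jar is called pdf.jar: point the job
// directly at the jar file instead of letting Hadoop look it up from
// the driver class. Whatever jar is set here is what gets distributed
// to the task JVMs, which is where the ClassNotFoundException above
// is thrown.
job.setJar("pdf.jar");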
If I run my jar using this command:
yarn jar myjar.jar PDFWordCount /in /out
it takes /in as the output path and gives me the error above, even though I have job.setJarByClass(PDFWordCount.class); in my main function as you can see.
I have run a simple WordCount project with a main function exactly like this one, and to run it I used
yarn jar wc.jar MyWordCount /in2 /out2
and it ran flawlessly. I can't understand what the problem is!

UPDATE: I tried to move my work from this project into the WordCount project I had used successfully. I created a package, copied the relevant files from the pdfwordcount project into that package, and exported the project (my main was not changed to use PDFInputFormat, so I did nothing except move the java files into the new package). It didn't work. I deleted the files from the other project, but it didn't work. I moved the java files back to the default package, but it still didn't work! What's wrong?!

Recommended answer
I found a way to overcome this problem, even though I couldn't understand what the problem actually was.
When I want to export my java project as a jar file in Eclipse, I have two options:
Extract required libraries into generated JAR
Package required libraries into generated JAR
I don't know exactly what the difference is, or whether it is a big deal or not. I used to choose the second option, but if I choose the first option, I can run my job using this command:
yarn jar pdf.jar /in /out
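One way to see what the two export options actually produce, as a sketch rather than part of the original answer: list the entries of the generated jar and check that PDFWordCount$MyMap.class sits at the top level, and whether library code ended up as nested .jar entries, which an ordinary Java classloader will not search.

import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Sketch: print the .class and .jar entries of an exported jar
// (e.g. pdf.jar) to compare the layouts the two Eclipse options produce.
public class InspectJar {
    public static void main(String[] args) throws Exception {
        try (JarFile jar = new JarFile(args[0])) {
            for (Enumeration<JarEntry> entries = jar.entries(); entries.hasMoreElements();) {
                String name = entries.nextElement().getName();
                // PDFWordCount$MyMap.class should appear as a plain entry;
                // names ending in ".jar" are libraries nested inside the jar.
                if (name.endsWith(".class") || name.endsWith(".jar")) {
                    System.out.println(name);
                }
            }
        }
    }
}

For what it's worth, "Extract required libraries" unpacks the library classes next to your own classes, while "Package required libraries" keeps the library jars nested and adds Eclipse's own jar-in-jar loader to the manifest; comparing the two listings makes that difference visible.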