Problem description
Here is my code. It connects to a Hadoop machine, performs a set of validations, and writes the result to another directory.
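    import java.util.Properties;

    import cascading.flow.FlowDef;
    import cascading.flow.hadoop.HadoopFlowConnector;
    import cascading.pipe.Each;
    import cascading.pipe.Every;
    import cascading.pipe.GroupBy;
    import cascading.pipe.Pipe;
    import cascading.property.AppProps;
    import cascading.scheme.hadoop.TextDelimited;
    import cascading.tap.Tap;
    import cascading.tap.hadoop.Hfs;
    import cascading.tuple.Fields;

    public class Main {
        public static void main(String... strings) {
            // Submit the job as the "root" HDFS user.
            System.setProperty("HADOOP_USER_NAME", "root");
            String in1 = "hdfs://myserver/user/root/adnan/inputfile.txt";
            String out = "hdfs://myserver/user/root/cascading/temp2";
            Properties properties = new Properties();
            AppProps.setApplicationJarClass(properties, Main.class);
            HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);
            // Comma-delimited text with a header row, for both source and sink.
            Tap inTap = new Hfs(new TextDelimited(true, ","), in1);
            Tap outTap = new Hfs(new TextDelimited(true, ","), out);
            Pipe inPipe = new Pipe("in1");
            // BigFilter, MergeGroupAggregator, getGroupByFields and fieldCols are my own code (not shown).
            Each removeErrors = new Each(inPipe, Fields.ALL, new BigFilter());
            GroupBy group = new GroupBy(removeErrors, getGroupByFields(fieldCols));
            Every mergeGroup = new Every(group, Fields.ALL, new MergeGroupAggregator(fieldCols), Fields.RESULTS);
            FlowDef flowDef = FlowDef.flowDef()
                .addSource(inPipe, inTap)
                .addTailSink(mergeGroup, outTap);
            flowConnector.connect(flowDef).complete();
        }
    }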
My job is submitted to the Hadoop machine, and I can see it on the JobTracker. However, the job fails with the exception below.
    cascading.tap.hadoop.io.MultiInputSplit not found
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:348)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:389)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
    Caused by: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.io.MultiInputSplit not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:346)
        ... 7 more
    java.lang.ClassNotFoundException: Class cascading.tap.hadoop.io.MultiInputSplit not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
Note that:
1. I am running this from my Windows machine, and Hadoop is set up on a different machine.
2. I am using the Cloudera distribution of Hadoop, CDH 4.
I found the issue: CDH 4.2 has a compatibility problem with Cascading 2.1. I switched to CDH 4.1 and it worked for me.
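For anyone hitting a similar ClassNotFoundException, a quick first check is whether the class is visible to a given JVM at all. Below is a minimal sketch, not part of Cascading or Hadoop: the class name ClasspathCheck and the method isLoadable are hypothetical helpers made up for illustration. It only inspects the JVM it runs in, so a positive result on the client says nothing about the task JVMs on the cluster unless it is run there with the same classpath.

```java
public class ClasspathCheck {

    /**
     * Returns true if the named class can be located by the class loader
     * that loaded ClasspathCheck. Hypothetical helper for illustration.
     */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError from a missing dependency.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "cascading.tap.hadoop.io.MultiInputSplit";
        System.out.println(name + (isLoadable(name) ? " is loadable" : " is NOT loadable"));
    }
}
```

Running it with the same classpath as the job (for example, the application jar plus its dependencies) shows whether the Cascading classes are packaged where the JVM can see them. In my case the classes were present and the failure came from the CDH version, so this check only helps rule out a packaging problem.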