Handling the "class not found" exception for cascading.tap.hadoop.io.MultiInputSplit when running a Hadoop program with the Cascading framework

Problem description

Here is my code, which connects to a Hadoop machine, performs a set of validations, and writes the results to another directory.

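    import java.util.Properties;

    import cascading.flow.FlowDef;
    import cascading.flow.hadoop.HadoopFlowConnector;
    import cascading.pipe.Each;
    import cascading.pipe.Every;
    import cascading.pipe.GroupBy;
    import cascading.pipe.Pipe;
    import cascading.property.AppProps;
    import cascading.scheme.hadoop.TextDelimited;
    import cascading.tap.Tap;
    import cascading.tap.hadoop.Hfs;
    import cascading.tuple.Fields;

    public class Main {

        public static void main(String... strings) {

            // Run the job as the "root" HDFS user.
            System.setProperty("HADOOP_USER_NAME", "root");
            String in1 = "hdfs://myserver/user/root/adnan/inputfile.txt";
            String out = "hdfs://myserver/user/root/cascading/temp2";

            // Tell Cascading which jar to ship to the cluster.
            Properties properties = new Properties();
            AppProps.setApplicationJarClass(properties, Main.class);
            HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);

            // Comma-delimited text files with a header row.
            Tap inTap = new Hfs(new TextDelimited(true, ","), in1);
            Tap outTap = new Hfs(new TextDelimited(true, ","), out);

            Pipe inPipe = new Pipe("in1");

            // BigFilter, MergeGroupAggregator, getGroupByFields and fieldCols
            // are not shown in the question.
            Each removeErrors = new Each(inPipe, Fields.ALL, new BigFilter());
            GroupBy group = new GroupBy(removeErrors, getGroupByFields(fieldCols));
            Every mergeGroup = new Every(group, Fields.ALL, new MergeGroupAggregator(fieldCols), Fields.RESULTS);

            FlowDef flowDef = FlowDef.flowDef()
                    .addSource(inPipe, inTap)
                    .addTailSink(mergeGroup, outTap);

            flowConnector.connect(flowDef).complete();
        }
    }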

My job is submitted to the Hadoop machine, and I can see it on the JobTracker. However, the job fails with the exception below.

    cascading.tap.hadoop.io.MultiInputSplit not found
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:348)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:389)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
    Caused by: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.io.MultiInputSplit not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:346)
        ... 7 more

    java.lang.ClassNotFoundException: Class cascading.tap.hadoop.io.MultiInputSplit not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)

Note that:
1. I am running this from my Windows machine, and Hadoop is set up on a different box.
2. I am using the Cloudera distribution of Hadoop, CDH 4.

Solution

Found the issue: CDH 4.2 has a compatibility problem with Cascading 2.1. I switched to CDH 4.1 and it worked for me.
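
For anyone hitting a similar ClassNotFoundException, a quick sanity check is to ask the JVM whether it can load the class at all, and from which jar. The sketch below is only an illustration: the class name ClasspathCheck and the method locate are made up for this example and are not part of Cascading or Hadoop. It also inspects only the classpath of the JVM it runs in. The exception above is thrown inside the map task (org.apache.hadoop.mapred.Child), so a positive result on the client does not prove the class reaches the task.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration only; not part of Cascading or Hadoop.
class ClasspathCheck {

    /**
     * Returns the location (jar or directory) a class is loaded from,
     * or null if the class cannot be loaded by this JVM.
     */
    public static String locate(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> cls = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader report no code source.
            return source == null
                    ? "(bootstrap class path)"
                    : source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above.
        String name = args.length > 0
                ? args[0]
                : "cascading.tap.hadoop.io.MultiInputSplit";
        String location = locate(name);
        if (location == null) {
            System.out.println(name + " is NOT on this JVM's classpath");
        } else {
            System.out.println(name + " loaded from " + location);
        }
    }
}
```

Running it with the same classpath as the client program shows whether the Cascading Hadoop jar is visible there. If it is, the next place to look is what actually gets shipped to the cluster and whether the Cascading and CDH versions are compatible, as in the answer above.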
