
Problem description

I would like to run this code which I found in Mahout In Action:

package org.help;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;

public class SeqPrep {

    public static void main(String args[]) throws IOException{

        List<NamedVector> apples = new ArrayList<NamedVector>();

        NamedVector apple;

        apple = new NamedVector(new DenseVector(new double[]{0.11, 510, 1}), "small round green apple");

        apples.add(apple);

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("appledata/apples");

        SequenceFile.Writer writer = new SequenceFile.Writer(fs,  conf, path, Text.class, VectorWritable.class);

        VectorWritable vec = new VectorWritable();
        for(NamedVector vector : apples){
            vec.set(vector);
            writer.append(new Text(vector.getName()), vec);
        }
        writer.close();

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("appledata/apples"), conf);

        Text key = new Text();
        VectorWritable value = new VectorWritable();
        while(reader.next(key, value)){
            System.out.println(key.toString() + " , " + value.get().asFormatString());
        }
        reader.close();

    }

}

I compile it with:

$ javac -classpath :/usr/local/hadoop-1.0.3/hadoop-core-1.0.3.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -d myjavac/ SeqPrep.java

I jar it:

$ jar -cvf SeqPrep.jar -C myjavac/ .

Now I'd like to run it on my local hadoop node. I've tried:

$ hadoop jar SeqPrep.jar org.help.SeqPrep

But I get:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

So I tried using the libjars parameter:

$ hadoop jar SeqPrep.jar org.help.SeqPrep -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT-sources.jar

and got the same problem. I don't know what else to try.

My eventual goal is to be able to read a .csv file on the hadoop fs into a sparse matrix and then multiply it by a random vector.

edit: Looks like Razvan got it (note: see below for another way to do this that does not mess with your hadoop installation). For reference:

$ find /usr/local/hadoop-1.0.3/. |grep mah
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-job.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT.jar

and then:

$ hadoop jar SeqPrep.jar org.help.SeqPrep

small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}

edit: I'm trying to do this without copying the mahout jars into the hadoop lib/

$ rm /usr/local/hadoop-1.0.3/lib/mahout-*

and then of course:

$ hadoop jar SeqPrep.jar org.help.SeqPrep

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

and when I try the mahout job file:

$ hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

If I try to include the .jar file I made:

$ hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: SeqPrep.jar
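
The error above is consistent with how `hadoop jar` appears to read its arguments: the first argument is the jar, and if that jar's manifest names no Main-Class, the next argument is taken as the class to run. The following is a simplified model of that behaviour, not Hadoop's actual source; the class name `RunJarArgsSketch` and the `Parsed` holder are made up for illustration.

```java
import java.util.Arrays;

public class RunJarArgsSketch {

    // Hypothetical holder for the three pieces the launcher extracts.
    public static final class Parsed {
        public final String jarFile;
        public final String mainClass;
        public final String[] appArgs;

        Parsed(String jarFile, String mainClass, String[] appArgs) {
            this.jarFile = jarFile;
            this.mainClass = mainClass;
            this.appArgs = appArgs;
        }
    }

    // args: everything after "hadoop jar".
    // manifestMainClass: the jar's Main-Class attribute, or null if it has none
    // (assumed to be null for the Mahout job jar, since it ran org.help.SeqPrep).
    public static Parsed parse(String[] args, String manifestMainClass) {
        if (args.length < 1) {
            throw new IllegalArgumentException("usage: <jar> [mainClass] args...");
        }
        int next = 0;
        String jarFile = args[next++];
        String mainClass = manifestMainClass;
        if (mainClass == null) {
            if (args.length < 2) {
                throw new IllegalArgumentException("usage: <jar> [mainClass] args...");
            }
            // No Main-Class in the manifest: the second argument is the class name.
            mainClass = args[next++];
        }
        mainClass = mainClass.replace('/', '.');
        String[] appArgs = Arrays.copyOfRange(args, next, args.length);
        return new Parsed(jarFile, mainClass, appArgs);
    }

    public static void main(String[] unused) {
        Parsed p = parse(new String[] {
            "mahout-core-0.8-SNAPSHOT-job.jar", "SeqPrep.jar", "org.help.SeqPrep"
        }, null);
        System.out.println("jar:        " + p.jarFile);
        System.out.println("main class: " + p.mainClass);
        System.out.println("app args:   " + Arrays.toString(p.appArgs));
    }
}
```

Under this model the second jar is never treated as a jar at all: `SeqPrep.jar` becomes the main class name, which matches the `ClassNotFoundException: SeqPrep.jar` above. It would also mean that anything placed after the class name, including the `-libjars` flags tried earlier, is simply handed to `SeqPrep.main` unless the program parses Hadoop's generic options itself.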

edit: Apparently I can only send one jar at a time to hadoop. This means I need to add the class I made into the mahout core job file:

~/mahout/trunk/core/target$ cp mahout-core-0.8-SNAPSHOT-job.jar mahout-core-0.8-SNAPSHOT-job.jar_backup

~/mahout/trunk/core/target$ cp ~/workspace/seqprep/bin/org/help/SeqPrep.class .

~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.class

And then:

~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep

edit: OK, now I can do it without messing with my hadoop installation. In the previous edit I was updating the .jar wrong: the class file has to go into the jar under its package path (org/help/), not at the root. It should be:

~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar org/help/SeqPrep.class

then:

~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}
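
Why the entry path matters can be checked without Hadoop. The sketch below builds two throwaway jars, one with the entry at the root and one under `org/help/`, and asks a `URLClassLoader` whether it can locate the resource for `org.help.SeqPrep`. The class name `JarEntryPathDemo` and its helper methods are made up for illustration, and the entries hold a placeholder byte rather than a real class file, so this only demonstrates the lookup, not class loading itself.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class JarEntryPathDemo {

    // Resource name a class loader looks up for a fully qualified class name.
    public static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Writes a temporary jar holding one entry with a placeholder byte
    // (not valid bytecode; only the entry name matters here).
    public static File jarWithEntry(String entryName) throws IOException {
        File jar = File.createTempFile("entry-demo", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(new byte[] {0});
            out.closeEntry();
        }
        return jar;
    }

    // True if a class loader restricted to this jar can find the class's resource.
    public static boolean canLocate(File jar, String className) throws IOException {
        URL[] urls = {jar.toURI().toURL()};
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            return loader.findResource(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        File atRoot = jarWithEntry("SeqPrep.class");             // like: jar uf ... SeqPrep.class
        File inPackage = jarWithEntry("org/help/SeqPrep.class"); // like: jar uf ... org/help/SeqPrep.class
        System.out.println("entry at root:        " + canLocate(atRoot, "org.help.SeqPrep"));
        System.out.println("entry under org/help: " + canLocate(inPackage, "org.help.SeqPrep"));
    }
}
```

The first check should report `false` and the second `true`, which mirrors the two `jar uf` attempts: the class is only found when its entry name matches the package declared in the source.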

Solution

You need to use the "job" JAR file provided by Mahout, which packages up all of its dependencies, and you need to add your own classes to it too. This is how all the Mahout examples work. You shouldn't put the Mahout jars in the Hadoop lib directory, since that sort of "installs" the program too deeply into Hadoop.

