I don't understand why this works:

import java.util.ArrayList;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class JavaSparkPi {

    public static void main(String[] args) throws Exception {

        SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(list)
                .map(s -> 2 * s)
                .map(s -> 5 * s);

        // Note: (a + b) / 2 is not associative, so the result of this reduce
        // can vary with how the data is partitioned.
        int weirdStuff = dataSet.reduce((a, b) -> (a + b) / 2);
        System.out.println("stuff is " + weirdStuff);
        jsc.stop();
    }
}


and why this doesn't:

import java.util.ArrayList;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class JavaSparkPi {

    // Identical pipeline, but moved out of main into an instance method.
    private void startWorkingOnMicroSpark() {
        SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(list)
                .map(s -> 2 * s)
                .map(s -> 5 * s);

        int weirdStuff = dataSet.reduce((a, b) -> (a + b) / 2);
        System.out.println("weirdStuff is " + weirdStuff);
        jsc.stop();
    }

    public static void main(String[] args) throws Exception {
        JavaSparkPi jsp = new JavaSparkPi();
        jsp.startWorkingOnMicroSpark();
    }
}


I'm developing Spark on EMR. The only difference I can find between the two projects is that one has the Spark code written inside main and the other doesn't. I launch both of them as Spark applications on EMR, passing

    --class JavaSparkPi

as the argument.
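
For reference, a step like that can be submitted with the EMR CLI. This is a sketch, not taken from the original post: the cluster ID is a placeholder, and it assumes the CLI's built-in Spark step type (available on EMR release 4.x and later):

    aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
        --steps Type=Spark,Name=mySparkApp,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--class,JavaSparkPi,s3://mynewbucket/Code/SparkAWS.jar]

EMR expands a step like this into the command-runner.jar / spark-submit invocation shown in the step details below.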

Here is the failed one:

Status: FAILED

Reason:

Log file: s3://mynewbucket/Logs/j-3AKSZXK7FKMX6/steps/s-2MT0SB910U3TE/stderr.gz

Details: Exception in thread "main" org.apache.spark.SparkException: Application application_1501228129826_0003 finished with failed status

JAR location: command-runner.jar

Main class: None

Arguments: spark-submit --deploy-mode cluster --class JavaSparkPi s3://mynewbucket/Code/SparkAWS.jar

Action on failure: Continue


And here is the successful one:

JAR location: command-runner.jar
Main class: None
Arguments: spark-submit --deploy-mode cluster --class JavaSparkPi s3://mynewbucket/Code/SparkAWS.jar
Action on failure: Continue

Best Answer

Put those Spark initialization calls into main.

SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
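
If you still want the job logic factored out of main, a minimal sketch is to create the context in main and hand it to a helper (the name runPipeline and the static helper are illustrative, not from the original post):

    import java.util.ArrayList;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public final class JavaSparkPi {

        // The helper receives an already-initialized context instead of creating its own.
        private static void runPipeline(JavaSparkContext jsc) {
            ArrayList<Integer> list = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                list.add(i);
            }

            JavaRDD<Integer> dataSet = jsc.parallelize(list)
                    .map(s -> 2 * s)
                    .map(s -> 5 * s);

            int weirdStuff = dataSet.reduce((a, b) -> (a + b) / 2);
            System.out.println("weirdStuff is " + weirdStuff);
        }

        public static void main(String[] args) {
            // Initialization lives in main, as suggested above.
            SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
            JavaSparkContext jsc = new JavaSparkContext(sparkConf);
            runPipeline(jsc);
            jsc.stop();
        }
    }

This keeps the entry point that spark-submit drives as a plain static main, while the pipeline itself stays in its own method.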
