I am trying to create a sample test case using Spark with Java, and I have all the required dependencies and the JUnit framework in place.

https://github.com/holdenk/spark-testing-base

spark-testing-base 2.2

import org.apache.spark.sql.*;
import com.holdenkarau.spark.testing.*;
import org.apache.spark.SparkConf;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.Test;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.catalyst.encoders.OuterScopes;

import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

public class SampleJavaDatasetTest extends JavaDatasetSuiteBase implements Serializable {

    SparkSession sparkSession = SparkSession
            .builder()
            .appName("aws-crediting")
            .config("spark.driver.allowMultipleContexts" , "true")
            .master("local")
            .config("spark.some.config.option", "some-value")
            .getOrCreate();

    @Test
    public void testEqualDataFrameWithItSelf() {
        OuterScopes.addOuterScope(this);
        List<BasicMagic> list = Arrays.asList(new BasicMagic("holden", 30),
                new BasicMagic("mahmoud", 23));
        Dataset<BasicMagic> dataset = sparkSession.createDataset(list, Encoders.bean(BasicMagic.class));
        assertDatasetEquals(dataset, dataset);
    }
}
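The test references a `BasicMagic` class that is not shown. For `Encoders.bean` to work, it must be a public JavaBean: a no-argument constructor plus getter/setter pairs for every field. A minimal sketch, where the field names `name` and `age` are assumptions inferred from the constructor arguments in the test:

```java
import java.io.Serializable;

// Hypothetical sketch of the BasicMagic bean used in the test above.
// Encoders.bean requires a public class with a no-arg constructor
// and getters/setters for every field to be encoded.
public class BasicMagic implements Serializable {
    private String name;
    private int age;

    public BasicMagic() {}  // no-arg constructor required by Encoders.bean

    public BasicMagic(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```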


The error I am getting is below. It does not seem to actually be a multiple-contexts problem, since I already enable that option in the configuration.

org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
com.nielsen.engineering.netsight.aws.test.SampleJavaDatasetTest.<init>(SampleJavaDatasetTest.java:28)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:217)
org.junit.runners.BlockJUnit4ClassRunner$1.runReflectiveCall(BlockJUnit4ClassRunner.java:266)
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:263)
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)

Best Answer

There is no need to create a SparkSession inside the JUnit test; the base class has already created one for you. If you need to use it, you can call

sqlContext().sparkSession();

to obtain it.
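Applied to the question's code, that means dropping the `SparkSession` field (whose initializer runs in the test-class constructor and creates a second context, triggering SPARK-2243) and using the session the suite provides. A sketch of the reworked test, assuming the same `BasicMagic` bean:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.catalyst.encoders.OuterScopes;
import com.holdenkarau.spark.testing.JavaDatasetSuiteBase;
import org.junit.Test;

import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

public class SampleJavaDatasetTest extends JavaDatasetSuiteBase implements Serializable {

    @Test
    public void testEqualDataFrameWithItSelf() {
        OuterScopes.addOuterScope(this);
        // Use the session created by JavaDatasetSuiteBase instead of
        // building a second one in a field initializer.
        SparkSession session = sqlContext().sparkSession();
        List<BasicMagic> list = Arrays.asList(
                new BasicMagic("holden", 30),
                new BasicMagic("mahmoud", 23));
        Dataset<BasicMagic> dataset =
                session.createDataset(list, Encoders.bean(BasicMagic.class));
        assertDatasetEquals(dataset, dataset);
    }
}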
