I want to build a Scala application with Maven dependencies for Spark and MongoDB. The Scala version I am using is 2.10. My pom looks like this (irrelevant parts omitted):
<properties>
  <maven.compiler.source>1.6</maven.compiler.source>
  <maven.compiler.target>1.6</maven.compiler.target>
  <encoding>UTF-8</encoding>
  <scala.tools.version>2.10</scala.tools.version>
  <!-- Put the Scala version of the cluster -->
  <scala.version>2.10.5</scala.version>
</properties>

<!-- repository to add org.apache.spark -->
<repositories>
  <repository>
    <id>cloudera-repo-releases</id>
    <url>https://repository.cloudera.com/artifactory/repo/</url>
  </repository>
</repositories>

<build>
  <sourceDirectory>src/main/scala</sourceDirectory>
  <testSourceDirectory>src/test/scala</testSourceDirectory>
  <!-- <pluginManagement> -->
  <plugins>
    <plugin>
      <!-- see http://davidb.github.com/scala-maven-plugin -->
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>3.1.3</version>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
          <configuration>
            <args>
              <arg>-make:transitive</arg>
              <arg>-dependencyfile</arg>
              <arg>${project.build.directory}/.scala_dependencies</arg>
            </args>
          </configuration>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>2.13</version>
      <configuration>
        <useFile>false</useFile>
        <disableXmlReport>true</disableXmlReport>
        <!-- If you have classpath issue like NoDefClassError,... -->
        <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
        <includes>
          <include>**/*Test.*</include>
          <include>**/*Suite.*</include>
        </includes>
      </configuration>
    </plugin>
    <!-- "package" command plugin -->
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.4.1</version>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
  <!-- </pluginManagement> -->
</build>

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
  </dependency>
  <dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.10</artifactId>
    <version>1.1.0</version>
  </dependency>
  <dependency>
    <groupId>org.mongodb.scala</groupId>
    <artifactId>mongo-scala-driver_2.11</artifactId>
    <version>1.1.1</version>
  </dependency>
</dependencies>
When I run

mvn clean assembly:assembly

the following error occurs:

C:\Develop\workspace\SparkApplication>mvn clean assembly:assembly
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building SparkApplication 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ SparkApplication ---
[INFO] Deleting C:\Develop\workspace\SparkApplication\target
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building SparkApplication 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-assembly-plugin:2.4.1:assembly (default-cli) > package @ SparkApplication >>>
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ SparkApplication ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory C:\Develop\workspace\SparkApplication\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ SparkApplication ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- scala-maven-plugin:3.1.3:compile (default) @ SparkApplication ---
[WARNING] Expected all dependencies to require Scala version: 2.10.5
[WARNING] xx.xxx.xxx:SparkApplication:0.0.1-SNAPSHOT requires scala version: 2.10.5
[WARNING] com.twitter:chill_2.10:0.5.0 requires scala version: 2.10.4
[WARNING] Multiple versions of scala libraries detected!
[INFO] C:\Develop\workspace\SparkApplication\src\main\scala:-1: info: compiling
[INFO] Compiling 1 source files to C:\Develop\workspace\SparkApplication\target\classes at 1477993255625
[INFO] No known dependencies. Compiling everything
[ERROR] error: bad symbolic reference. A signature in package.class refers to type compileTimeOnly
[INFO] in package scala.annotation which is not available.
[INFO] It may be completely missing from the current classpath, or the version on
[INFO] the classpath might be incompatible with the version used when compiling package.class.
[ERROR] C:\Develop\workspace\SparkApplication\src\main\scala\com\examples\MainExample.scala:33: error: Reference to method intWrapper in class LowPriorityImplicits should not have survived past type checking,
[ERROR] it should have been processed and eliminated during expansion of an enclosing macro.
[ERROR] val count = sc.parallelize(1 to NUM_SAMPLES).map{i =>
[ERROR] ^
[ERROR] two errors found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.363 s
[INFO] Finished at: 2016-11-01T10:40:58+01:00
[INFO] Final Memory: 20M/353M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.3:compile (default) on project SparkApplication: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1(Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
The error only occurs when I add the

mongo-scala-driver_2.11

dependency. Without that dependency, the jar builds fine. My code is currently the Pi estimation example from the Spark website:

import org.apache.spark.{SparkConf, SparkContext}

val NUM_SAMPLES = 100000 // number of random samples for the estimate

val conf = new SparkConf()
  .setAppName("Cluster Application")
  //.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x * x + y * y < 1) 1 else 0
}.reduce(_ + _)

println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
I also tried adding the following block to each dependency element, because I found it suggested in a GitHub issue, but it did not help:
<exclusions>
  <exclusion>
    <!-- make sure wrong scala version is not pulled in -->
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
  </exclusion>
</exclusions>
How can I fix this? The MongoDB Scala driver appears to be built against Scala 2.11, but Spark needs Scala 2.10.
Best answer
Remove the Mongo Scala Driver dependency (the org.mongodb.scala:mongo-scala-driver_2.11 artifact). It is not compiled against Scala 2.10 and is therefore incompatible with the rest of your build.
The good news is that the MongoDB Spark Connector is a standalone connector. It uses the synchronous Mongo Java Driver, because Spark is designed for CPU-intensive, synchronous tasks. It is written to follow Spark idioms and is all you need to connect MongoDB to Spark.
The Mongo Scala Driver, on the other hand, is idiomatic to modern Scala conventions: all of its I/O is fully asynchronous. That is great for web applications and for improving the scalability of a single machine, but it is not needed here.
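For illustration, here is a minimal sketch of the Pi example writing its result to MongoDB and reading it back through the connector alone, with no Scala driver on the classpath. This is a rough sketch under the assumption of mongo-spark-connector 1.1.x conventions (MongoSpark.save / MongoSpark.load and the spark.mongodb.input.uri / spark.mongodb.output.uri properties); the URIs, the test.pi database/collection names, and the MongoPiApp object name are placeholders of mine, not taken from the question.

import com.mongodb.spark.MongoSpark
import org.apache.spark.{SparkConf, SparkContext}
import org.bson.Document

object MongoPiApp {
  def main(args: Array[String]): Unit = {
    // Placeholder URIs: point these at your own MongoDB deployment.
    val conf = new SparkConf()
      .setAppName("Cluster Application")
      .set("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.pi")
      .set("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.pi")
    val sc = new SparkContext(conf)

    val NUM_SAMPLES = 100000
    val count = sc.parallelize(1 to NUM_SAMPLES).map { _ =>
      val x = Math.random()
      val y = Math.random()
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)

    // Write the result through the connector as a plain org.bson.Document.
    val result = sc.parallelize(Seq(new Document("pi", 4.0 * count / NUM_SAMPLES)))
    MongoSpark.save(result)

    // Read the collection back as an RDD[Document].
    MongoSpark.load(sc).collect().foreach(println)

    sc.stop()
  }
}

With only spark-core_2.10 and mongo-spark-connector_2.10 in the dependencies, every artifact shares the 2.10 Scala binary version and the "Multiple versions of scala libraries detected!" warning and compile errors should disappear.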