Problem Description
The original question: GoogleHadoopFileSystem cannot be cast to org.apache.hadoop.fs.FileSystem.
Short Answer
There are a few short-term options:
- Use Spark 1.3.1 for now.
- In your bdutil deployment, use HDFS as the default filesystem (default_fs=hdfs); you can still specify gs:// paths directly in your jobs, HDFS will just be used for intermediate data and staging files. There are some minor incompatibilities with using raw Hive in this mode, though.
- Use a raw val sqlContext = new org.apache.spark.sql.SQLContext(sc) instead of a HiveContext if you don't need HiveContext features (see the sketch after this list).
- git clone https://github.com/dennishuo/spark and run ./make-distribution.sh --name my-custom-spark --tgz --skip-java-test -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver to get a fresh tarball you can specify in your bdutil's spark_env.sh.
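A minimal sketch of the plain-SQLContext option, assuming a spark-shell session where sc already exists and using a hypothetical bucket name my-bucket:

// Sketch only: a plain SQLContext avoids the isolated Hive classloader entirely.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Read Parquet directly from a gs:// path (Spark 1.4-era API).
val df = sqlContext.parquetFile("gs://my-bucket/some/data.parquet")
df.registerTempTable("my_table")
sqlContext.sql("SELECT COUNT(*) FROM my_table").show()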
Long Answer
We've verified that it only manifests when fs.default.name and fs.defaultFS are set to a gs:// path, regardless of whether the job tries to load a path via parquetFile("gs://...") or parquetFile("hdfs://..."); when fs.default.name and fs.defaultFS are set to an HDFS path, loading data from both HDFS and from GCS works fine. This is also specific to Spark 1.4+ currently, and is not present in Spark 1.3.1 or older.
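To illustrate (a hypothetical spark-shell repro; my-bucket is a placeholder name), the failure depends only on what the default filesystem points at, not on the scheme of the path being read:

// Hypothetical repro sketch. With fs.defaultFS / fs.default.name set to an hdfs:// URI
// in core-site.xml, both of these calls succeed on Spark 1.4.1:
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.parquetFile("hdfs:///tmp/data.parquet")
hiveContext.parquetFile("gs://my-bucket/data.parquet")
// With fs.defaultFS set to a gs:// path instead, either call fails with the
// ClassCastException shown at the end of this answer.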
The regression appears to have been introduced in https://github.com/apache/spark/commit/9ac8393663d759860c67799e000ec072ced76493, which actually fixes a prior related classloading issue, SPARK-8368. While the fix itself is correct for normal cases, there's a method IsolatedClientLoader.isSharedClass used to determine which classloader to use, and it interacts with the aforementioned commit to break GoogleHadoopFileSystem classloading. The following lines in that file include everything under com.google.* as a "shared class" because of Guava and possibly protobuf dependencies, which are indeed loaded as shared libraries, but unfortunately GoogleHadoopFileSystem should be loaded as a "hive class" in this case, just like org.apache.hadoop.hdfs.DistributedFileSystem. We just happen to unluckily share the com.google.* package namespace.
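For illustration, here is a simplified sketch (not the exact Spark source) of the kind of prefix check described above; because everything under com.google.* is treated as shared, the GCS connector class never reaches the "hive" classloader:

// Simplified, illustrative shared-class predicate (not the real Spark code).
def isSharedClass(name: String): Boolean =
  name.startsWith("scala.") ||
  name.startsWith("com.google") ||   // intended for Guava/protobuf, but also matches the GCS connector
  name.startsWith("java.lang.") ||
  name.contains("slf4j") ||
  name.contains("log4j")

isSharedClass("com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")  // true  -> shared classloader
isSharedClass("org.apache.hadoop.hdfs.DistributedFileSystem")           // false -> "hive" classloader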
This can be verified by adding the following line to ${SPARK_INSTALL}/conf/log4j.properties:
log4j.logger.org.apache.spark.sql.hive.client=DEBUG
And the output shows:
...
15/07/20 20:59:14 DEBUG IsolatedClientLoader: hive class: org.apache.hadoop.hdfs.DistributedFileSystem - jar:file:/home/hadoop/spark-install/lib/spark-assembly-1.4.1-hadoop2.6.0.jar!/org/apache/hadoop/hdfs/DistributedFileSystem.class
...
15/07/20 20:59:14 DEBUG IsolatedClientLoader: shared class: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
java.lang.RuntimeException: java.lang.ClassCastException: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem cannot be cast to org.apache.hadoop.fs.FileSystem