Problem Description
I have an Apache Spark cluster (2.2.0) in standalone mode. Until now it has been running with HDFS to store the Parquet files. I'm using the Hive Metastore service of Apache Hive 1.2 for access, with Spark over JDBC through the Thriftserver.
Now I want to use S3 object storage instead of HDFS. I have added the following configuration to my hive-site.xml:
<property>
  <name>fs.s3a.access.key</name>
  <value>access_key</value>
  <description>ProfitBricks Access Key</description>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>secret_key</value>
  <description>ProfitBricks Secret Key</description>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3-de-central.profitbricks.com</value>
  <description>ProfitBricks S3 Object Storage Endpoint</description>
</property>
<property>
  <name>fs.s3a.endpoint.http.port</name>
  <value>80</value>
  <description>ProfitBricks S3 Object Storage Endpoint HTTP Port</description>
</property>
<property>
  <name>fs.s3a.endpoint.https.port</name>
  <value>443</value>
  <description>ProfitBricks S3 Object Storage Endpoint HTTPS Port</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>s3a://dev.spark.my_bucket/parquet/</value>
  <description>ProfitBricks S3 Object Storage Hive Warehouse Location</description>
</property>
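As a side note, the S3A connection itself can be sanity-checked independently of Hive. A minimal check, assuming the same fs.s3a.* settings are also present in core-site.xml (the hadoop CLI does not read hive-site.xml):

# List the warehouse bucket through the S3A connector to confirm that
# the endpoint and credentials are picked up; the bucket name is taken
# from hive.metastore.warehouse.dir above
hadoop fs -ls s3a://dev.spark.my_bucket/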
I have the Hive metastore in a MySQL 5.7 database. I have added the following jar files to the Hive lib folder:
- aws-java-sdk-1.7.4.jar
- hadoop-aws-2.7.3.jar
I have deleted the old Hive metastore schema on MySQL, then started the metastore service with the following command: hive --service metastore &
and I got the following error:
java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/ObjectMapper
at com.amazonaws.util.json.Jackson.<clinit>(Jackson.java:27)
at com.amazonaws.internal.config.InternalConfig.loadfrom(InternalConfig.java:182)
at com.amazonaws.internal.config.InternalConfig.load(InternalConfig.java:199)
at com.amazonaws.internal.config.InternalConfig$Factory.<clinit>(InternalConfig.java:232)
at com.amazonaws.ServiceNameFactory.getServiceName(ServiceNameFactory.java:34)
at com.amazonaws.AmazonWebServiceClient.computeServiceName(AmazonWebServiceClient.java:703)
at com.amazonaws.AmazonWebServiceClient.getServiceNameIntern(AmazonWebServiceClient.java:676)
at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:278)
at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
at com.amazonaws.services.s3.AmazonS3Client.setEndpoint(AmazonS3Client.java:475)
at com.amazonaws.services.s3.AmazonS3Client.init(AmazonS3Client.java:447)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:391)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:371)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.databind.ObjectMapper
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
The missing class belongs to the Jackson library, so I copied the jackson-*.jar files from my spark-2.2.0-bin-hadoop2.7/jars/ folder into the Hive lib folder (the copy command is sketched after the list); they are:
- jackson-annotations-2.6.5.jar
- jackson-core-2.6.5.jar
- jackson-core-asl-1.9.13.jar
- jackson-databind-2.6.5.jar
- jackson-jaxrs-1.9.13.jar
- jackson-mapper-asl-1.9.13.jar
- jackson-module-paranamer-2.6.5.jar
- jackson-module-scala_2.11-2.6.5.jar
- jackson-xc-1.9.13.jar
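The copy itself was nothing special, roughly along these lines (a sketch; SPARK_HOME and HIVE_HOME stand in for the actual install directories):

# Copy the Jackson JARs bundled with Spark into Hive's lib folder
cp $SPARK_HOME/jars/jackson-*.jar $HIVE_HOME/lib/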
But then I got the following error:
2018-01-05 17:51:00,819 ERROR [main]: metastore.HiveMetaStore (HiveMetaStore.java:main(5920)) - Metastore Thrift Server threw an exception...
java.lang.NumberFormatException: For input string: "100M"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1319)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:248)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I think the error here has something to do with some jar version incompatibility, but I'm not able to find the correct versions.
Can anybody help me?
Recommended Answer
- You absolutely cannot mix versions of hadoop-common, hadoop-aws, the AWS S3 SDK, and Jackson beyond what everything expects, or you will see stack traces.
- And it's all open source, so if you download all the source JARs locally, your IDE will help you find what's causing the stack trace. This is what we all do. It's not magic; modern IDEs (IntelliJ IDEA) even have special stack-trace debugging.
This one is coming in because the value of fs.s3a.multipart.size set in hadoop-common's /core-default.xml resource is 100M, which came in with HADOOP-13680 and the range parsing that handles numbers like "100M" instead of 104857600. This stack trace says "Hadoop 2.8+ configuration", yet the code reading the value still goes through the plain Configuration.getLong and Long.parseLong path visible in the trace, which predates that parsing and cannot handle the "M" suffix.
You could try setting the property in your configs to that numeric value, but it's a warning sign that the versions of the JARs are out of sync, and you will probably only get a few lines further before something else breaks.
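For example, a minimal sketch of that override in hive-site.xml, 104857600 being the byte count that "100M" stands for:

<property>
  <name>fs.s3a.multipart.size</name>
  <value>104857600</value>
  <description>Multipart size as a plain byte count, since the Hadoop 2.7 parser cannot handle the "100M" suffix form</description>
</property>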
Fix: make sure that hadoop-common.jar and hadoop-aws.jar are in sync. It looks like you've got the Jackson and AWS ones lined up, though Jackson is complex enough that you can never take that for granted.