This article looks at a problem with Apache Hive on Spark on AWS EMR 5.11.0; the question and recommended answer below may be a useful reference for anyone who runs into the same errors.

Problem description

I am trying to set up Apache Hive on Spark on AWS EMR 5.11.0.

Apache Spark version: 2.2.1
Apache Hive version: 2.3.2
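
For context, a Hive-on-Spark setup is usually attempted with settings along the following lines. This is a minimal sketch with assumed values, since the question does not show the configuration actually used; hive.execution.engine, spark.master, and spark.executor.memory are standard Hive-on-Spark properties, while sample_table is a hypothetical table name:

    # Sketch: run Hive with Spark as the execution engine (illustrative values)
    hive --hiveconf hive.execution.engine=spark \
         --hiveconf spark.master=yarn \
         --hiveconf spark.executor.memory=4g \
         -e "select count(*) from sample_table"

The same properties can also be set in hive-site.xml or per session with set commands.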

The YARN logs show the error below:

18/01/28 21:55:28 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
    at org.apache.hive.spark.client.rpc.RpcConfiguration.<clinit>(RpcConfiguration.java:47)
    at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:134)
    at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:516)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)

hive-server2.log:

2018-01-28T21:56:50,109 ERROR [HiveServer2-Background-Pool: Thread-68([])]: client.SparkClientImpl (SparkClientImpl.java:<init>(112)) - Timed out waiting for client to connect. Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc. Please check YARN or Spark driver's logs for further information.
java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41) ~[netty-all-4.0.52.Final.jar:4.0.52.Final]
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:109) ~[hive-exec-2.3.2-amzn-0.jar:2.3.2-amzn-0]
    at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) ~[hive-exec-2.3.2-amzn-0.jar:2.3.2-amzn-0]
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:101) ~[hive-exec-2.3.2-amzn-0.jar:2.3.2-amzn-0]
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:97) ~[hive-exec-2.3.2-amzn-0.jar:2.3.2-amzn-0]
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:73) ~[hive-exec-2.3.2-amzn-0.jar:2.3.2-amzn-0]
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62) ~[hive-exec-2.3.2-amzn-0.jar:2.3.2-amzn-0]
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115) ~[hive-exec-2.3.2-amzn-0.jar:2.3.2-amzn-0]
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:126) ~[hive-exec-2.3.2-amzn-0.jar:2.3.2-amzn-0]

Also:

2018-01-28T21:56:50,110 ERROR [HiveServer2-Background-Pool: Thread-68([])]: spark.SparkTask (SessionState.java:printError(1126)) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:64)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:126)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:103)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255)
    at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
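
For reference, the full YARN application log that these HiveServer2 messages point to can be pulled with the YARN CLI, roughly as follows (the application id is a placeholder for the id of the failed Spark application, visible in the ResourceManager UI):

    # Sketch: fetch the aggregated YARN logs for the failed application
    yarn logs -applicationId application_1517169000000_0001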

Could anyone point out what I may be missing in the configuration?

Recommended answer

Sorry, but Hive on Spark is not yet supported on EMR. I have not tried it myself yet, but I think the likely cause of your errors might be a mismatch between the version of Spark supported on EMR and the version of Spark upon which Hive depends. The last time I checked, Hive did not support Spark 2.x when running Hive on Spark. Given that your first error is a NoSuchFieldError, it seems like a version mismatch is the most likely cause. The timeout error may be a red herring.
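
One way to check for such a mismatch is to compare the Spark version installed on the cluster with the Spark- and Hive-related jars that end up on each other's classpaths. A rough sketch, assuming the default EMR install locations (/usr/lib/spark and /usr/lib/hive are typical on EMR but not guaranteed):

    # Spark version shipped with the EMR release
    spark-submit --version

    # Spark-related jars on Hive's classpath, and Hive-related jars bundled
    # with Spark; a version here that differs from the cluster's Spark or
    # Hive version would explain a NoSuchFieldError at runtime
    ls /usr/lib/hive/lib | grep -i spark
    ls /usr/lib/spark/jars | grep -i hive

In general, a NoSuchFieldError means one jar was compiled against a class member that is missing from the class actually loaded at runtime. That also fits the red-herring reading of the timeout: the remote driver dies during startup, so HiveServer2 never sees the client connect.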

That concludes this article on Apache Hive on Spark on AWS EMR 5.11.0. Hopefully the recommended answer above is helpful.
