Problem description
Below is the simple code to create a HIVE table and load data into it.
import java.util.Properties
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext

// Build a local SparkContext and a HiveContext on top of it
val conf = new SparkConf().setAppName("HIVE_Test").setMaster("local").set("spark.executor.memory","1g").set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._  // import implicits only after sqlContext is defined

// Create the Hive table and load the local file into it
sqlContext.sql("CREATE TABLE test_amit_hive12(VND_ID INT,VND_NM STRING,VND_SHORT_NM STRING,VND_ADR_LN_1_TXT STRING,VND_ADR_LN_2_TXT STRING,VND_CITY_CD STRING,VND_ZIP_CD INT,LOAD_TS FLOAT,UPDT_TS FLOAT, PROMO_STTS_CD STRING, VND_STTS_CD STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'path_to/amitesh/part.txt' INTO TABLE test_amit_hive12")
sys.exit()
I have 2 queries:
1) In the CREATE TABLE statement, I have hard-coded the table name, but how would the code understand what delimiter the file uses? When we create a HIVE table through the HIVE prompt, we write the following lines:
FIELDS TERMINATED BY ‘’
LINES TERMINATED BY ‘’
So, don't we need to do that while working with Spark/Scala?
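For reference, one way to carry those clauses over is to put them directly in the CREATE TABLE statement passed to HiveContext.sql. The sketch below is illustrative only: it assumes the file is comma-delimited with one record per line and stored as plain text; the table and column names are taken from the question.

// Illustrative sketch: assumes a comma-delimited text file, one record per line.
// Triple quotes keep '\n' as a literal for the Hive parser.
sqlContext.sql("""
  CREATE TABLE test_amit_hive12(
    VND_ID INT, VND_NM STRING, VND_SHORT_NM STRING,
    VND_ADR_LN_1_TXT STRING, VND_ADR_LN_2_TXT STRING,
    VND_CITY_CD STRING, VND_ZIP_CD INT,
    LOAD_TS FLOAT, UPDT_TS FLOAT,
    PROMO_STTS_CD STRING, VND_STTS_CD STRING)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  STORED AS TEXTFILE
""")
sqlContext.sql("LOAD DATA LOCAL INPATH 'path_to/amitesh/part.txt' INTO TABLE test_amit_hive12")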
2) While executing the code through spark-shell, I am getting the error below:
ERROR KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
res1: org.apache.spark.sql.DataFrame = [result: string]
I found a post on Stack Overflow, but it was unanswered. On another website, I found that it is a bug in Hadoop 2.7.1. I checked mine; I have 2.7.2. So what are the chances of the bug existing in my version? I am using IBM's BigInsights. The following are my version details:
Hadoop 2.7.2-IBM-12
However, is there anyone who could help me resolve this issue? I will need very strong proof to demonstrate to my manager that this is a bug.
Below is one of the links where people say the error is a bug:
https://talendexpert.com/talend-spark-error/
Recommended answer
A bit late, but does this solve your problem?
I got the same error, but it was not really a problem for me. After the error the code ran just fine. Sometimes it pops up and sometimes it doesn't, so maybe it is connected to the executor nodes on our cluster that are involved in the particular Spark job.
It is not directly related to the Hadoop version, but rather to the Spark version you run.
The bug and solution are reported here: https://issues.apache.org/jira/browse/SPARK-20594.
That is, upgrading to Spark 2.2.0 will probably solve this issue.
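As a quick sanity check (a sketch, not part of the original answer), you can confirm which Spark version the shell is actually running before deciding whether an upgrade is needed:

// Print the running Spark version from spark-shell
println(sc.version)
// On a Spark 2.x shell the SparkSession is also available:
// println(spark.version)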