This article describes how to handle the problem of being unable to view a Hive table's data in Spark after the table has been updated outside Spark. It should be a useful reference for anyone who runs into the same issue.

Problem Description


Case: I have a table HiveTest, which is an ORC table with transactional set to true. I loaded it in the spark shell and viewed the data:

var rdd = objHiveContext.sql("select * from HiveTest")
rdd.show()

--- Able to view data

Now I went to my Hive shell (or Ambari) and updated the table, for example:

hive> update HiveTest set name='test'   -- done and successful
hive> select * from HiveTest            -- able to view the updated data

Now when I come back to Spark and run the query again, I cannot view any data except the column names:

scala> var rdd1 = objHiveContext.sql("select * from HiveTest")
scala> rdd1.show()

-- This time only the column names are printed; the data does not come back

Issue 2: I am unable to update from Spark SQL. When I run scala> objHiveContext.sql("update HiveTest set name='test'") I get the error below:

org.apache.spark.sql.AnalysisException:
Unsupported language features in query: INSERT INTO HiveTest values(1,'sudhir','Software',1,'IT')
TOK_QUERY 0, 0,17, 0
  TOK_FROM 0, -1,17, 0
    TOK_VIRTUAL_TABLE 0, -1,17, 0
      TOK_VIRTUAL_TABREF 0, -1,-1, 0
        TOK_ANONYMOUS 0, -1,-1, 0
      TOK_VALUES_TABLE 1, 6,17, 28
        TOK_VALUE_ROW 1, 7,17, 28
          1 1, 8,8, 28
          'sudhir' 1, 10,10, 30
          'Software' 1, 12,12, 39
          1 1, 14,14, 50
          'IT' 1, 16,16, 52
  TOK_INSERT 1, 0,-1, 12
    TOK_INSERT_INTO 1, 0,4, 12
      TOK_TAB 1, 4,4, 12
        TOK_TABNAME 1, 4,4, 12
          HiveTest 1, 4,4, 12
    TOK_SELECT 0, -1,-1, 0
      TOK_SELEXPR 0, -1,-1, 0
        TOK_ALLCOLREF 0, -1,-1, 0

scala.NotImplementedError: No parse rules for:
 TOK_VIRTUAL_TABLE 0, -1,17, 0
  TOK_VIRTUAL_TABREF 0, -1,-1, 0
    TOK_ANONYMOUS 0, -1,-1, 0
  TOK_VALUES_TABLE 1, 6,17, 28
    TOK_VALUE_ROW 1, 7,17, 28
      1 1, 8,8, 28
      'sudhir' 1, 10,10, 30
      'Software' 1, 12,12, 39
      1 1, 14,14, 50
      'IT' 1, 16,16, 52

org.apache.spark.sql.hive.HiveQl$.nodeToRelation(HiveQl.scala:1235)

This error is for the INSERT INTO statement; the UPDATE statement fails with the same sort of error.

Solution

Have you tried objHiveContext.refreshTable("HiveTest")?

Spark SQL aggressively caches Hive metastore data.

If an update happens outside of Spark SQL, you might experience some unexpected results as Spark SQL's version of the Hive metastore is out of date.
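To make the fix concrete, here is a minimal sketch of that workflow in the spark-shell, assuming a HiveContext named objHiveContext as in the question:

// The table was changed outside Spark (Hive CLI / Ambari), so Spark SQL's
// cached metadata for it is stale. Refresh it before querying again.
objHiveContext.refreshTable("HiveTest")

// Re-run the query; the rows written by the external update should now appear.
val rdd1 = objHiveContext.sql("select * from HiveTest")
rdd1.show()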

Here's some more info:

http://spark.apache.org/docs/latest/sql-programming-guide.html#metadata-refreshing

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.hive.HiveContext

The docs mostly mention Parquet, but this likely applies to ORC and other file formats.

With JSON, for example, if you add new files into a directory outside of Spark SQL, you'll need to call hiveContext.refreshTable() within Spark SQL to see the new data.
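As for Issue 2: the HiveQl parser in this version of Spark has no parse rule for INSERT INTO ... VALUES, and UPDATE requires Hive ACID support that Spark could not use at the time. A commonly used workaround, sketched below as an assumption rather than something from the original answer, is to stage the new rows as a temporary table and issue INSERT INTO ... SELECT, which the parser does accept. The case class and column names are hypothetical, inferred from the values in the error message.

// Hypothetical schema, guessed from the values (1,'sudhir','Software',1,'IT').
case class HiveTestRow(id: Int, name: String, dept: String, grade: Int, unit: String)

// Stage the new rows as a temporary table.
val newRows = objHiveContext.createDataFrame(Seq(
  HiveTestRow(1, "sudhir", "Software", 1, "IT")))
newRows.registerTempTable("new_rows")

// INSERT INTO ... SELECT is parsed by HiveQl, unlike INSERT INTO ... VALUES.
// Note: this only appends data files; it does not make UPDATE work on a
// transactional ORC table, which Spark of this era could not write to.
objHiveContext.sql("INSERT INTO TABLE HiveTest SELECT * FROM new_rows")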

That concludes this article on being unable to view a Hive table's data in Spark after an update. We hope the answer above is helpful.
