This article explains how to handle the problem where Spark can access Hive tables from pyspark but not from spark-submit. We hope it serves as a useful reference for anyone running into the same issue.

Problem Description

So, when running from pyspark I would type in (without specifying any contexts):

df_openings_latest = sqlContext.sql('select * from experian_int_openings_latest_orc')

... and it works fine.

However, when I run my script from spark-submit, like

spark-submit script.py

and put the following in it:

from pyspark.sql import SQLContext
from pyspark import SparkConf, SparkContext
conf = SparkConf().setAppName('inc_dd_openings')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

df_openings_latest = sqlContext.sql('select * from experian_int_openings_latest_orc')

it gives me an error:

pyspark.sql.utils.AnalysisException: u'Table not found:experian_int_openings_latest_orc;'

So it doesn't see my table.

What am I doing wrong? Please help.

P.S. Spark version is 1.6 running on Amazon EMR.

Recommended Answer

Spark 2.x

The same problem may occur in Spark 2.x if SparkSession has been created without enabling Hive support.
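For Spark 2.x, a minimal sketch of creating a session with Hive support enabled might look like this (reusing the app name and table from the question):

from pyspark.sql import SparkSession

# Without .enableHiveSupport(), the session cannot see
# tables registered in the Hive metastore.
spark = SparkSession.builder \
    .appName('inc_dd_openings') \
    .enableHiveSupport() \
    .getOrCreate()

df_openings_latest = spark.sql('select * from experian_int_openings_latest_orc')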

Spark 1.x

It is pretty simple. When you use the PySpark shell, and Spark has been built with Hive support, the default SQLContext implementation (the one available as sqlContext) is a HiveContext.

In your standalone application, you use a plain SQLContext, which doesn't provide Hive capabilities.

Assuming the rest of the configuration is correct, just replace:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

with:

from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)
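Putting it together, a minimal sketch of the corrected Spark 1.x script could look like this (keeping the same app name and table from the question):

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf().setAppName('inc_dd_openings')
sc = SparkContext(conf=conf)

# HiveContext talks to the Hive metastore, so tables
# defined there become visible to SQL queries.
sqlContext = HiveContext(sc)

df_openings_latest = sqlContext.sql('select * from experian_int_openings_latest_orc')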

That concludes this article on Spark being able to access Hive tables from pyspark but not from spark-submit. We hope the recommended answer helps.
