Problem Description
So, when running from pyspark I would type in (without specifying any contexts):
df_openings_latest = sqlContext.sql('select * from experian_int_openings_latest_orc')
... and it works fine.
However, when I run my script from spark-submit, like
spark-submit script.py
I put the following in the script itself:
from pyspark.sql import SQLContext
from pyspark import SparkConf, SparkContext
conf = SparkConf().setAppName('inc_dd_openings')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
df_openings_latest = sqlContext.sql('select * from experian_int_openings_latest_orc')
But it gives me an error:
pyspark.sql.utils.AnalysisException: u'Table not found:experian_int_openings_latest_orc;'
So it doesn't see my table.
What am I doing wrong? Please help.
P.S. Spark version is 1.6 running on Amazon EMR.
Recommended Answer
Spark 2.x
The same problem may occur in Spark 2.x if the SparkSession has been created without enabling Hive support.
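In Spark 2.x the entry point would be created roughly like this (a minimal sketch reusing the app and table names from the question; enableHiveSupport() is what makes Hive metastore tables visible):
from pyspark.sql import SparkSession

# Build a SparkSession with Hive support so tables in the Hive metastore can be resolved
spark = SparkSession.builder \
    .appName('inc_dd_openings') \
    .enableHiveSupport() \
    .getOrCreate()

df_openings_latest = spark.sql('select * from experian_int_openings_latest_orc')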
Spark 1.x
It is pretty simple. When you use the PySpark shell and Spark has been built with Hive support, the default SQLContext implementation (the one available as sqlContext) is a HiveContext.
In your standalone application you use a plain SQLContext, which doesn't provide Hive capabilities.
Assuming the rest of the configuration is correct, just replace:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
with
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
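Put together, the corrected standalone script would look roughly like this (a sketch reusing the names from the question):
# script.py - run with: spark-submit script.py
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf().setAppName('inc_dd_openings')
sc = SparkContext(conf=conf)

# HiveContext can resolve tables registered in the Hive metastore
sqlContext = HiveContext(sc)

df_openings_latest = sqlContext.sql('select * from experian_int_openings_latest_orc')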