Start HiveThriftServer programmatically in Python

Problem description

In the spark-shell (Scala), we import org.apache.spark.sql.hive.thriftserver._ to start the Hive Thrift server programmatically for a particular Hive context, calling HiveThriftServer2.startWithContext(hiveContext) to expose the temp tables registered in that session.

How can we do the same in Python? Is there a Python package or API for importing HiveThriftServer? Any other thoughts or recommendations are appreciated.

We have used PySpark to create a DataFrame.

Thanks,

Ravi Narayanan

Recommended answer

You can import it through the py4j Java gateway. The following code worked on Spark 2.0.2, and temp tables registered in the Python script could then be queried through beeline.

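from py4j.java_gateway import java_import
from pyspark.sql import SparkSession

# Placeholder values (not in the original answer); set these for your environment.
app_name = "thrift-server-example"
master = "local[*]"

# singleSession makes JDBC clients share this session, so they can see its temp views.
spark = (SparkSession
         .builder
         .appName(app_name)
         .master(master)
         .enableHiveSupport()
         .config('spark.sql.hive.thriftServer.singleSession', True)
         .getOrCreate())
sc = spark.sparkContext
sc.setLogLevel('INFO')

# Kept from the original answer, where the import string is empty; it is called
# only after sc exists. The call below uses the fully qualified class name.
java_import(sc._gateway.jvm, "")

# Start the Thrift server in the JVM, passing the JVM-side SQLContext that backs this PySpark session.
sc._gateway.jvm.org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.startWithContext(spark._jwrapped)

# Persistent Hive table, recorded in the metastore.
spark.sql('CREATE TABLE myTable')
data_file = "path to csv file with data"
dataframe = spark.read.option("header", "true").csv(data_file).cache()
# Temporary view, visible only inside this Spark session.
dataframe.createOrReplaceTempView("myTempView")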

Then go to beeline to check that it started correctly:

in terminal> $SPARK_HOME/bin/beeline
beeline> !connect jdbc:hive2://localhost:10000
beeline> show tables;

It should show the tables and temp tables/views created in Python, including "myTable" and "myTempView" above. Temporary views are visible only from the same Spark session.
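If beeline cannot connect at all, it can help to first confirm that something is listening on the Thrift port. The following is a minimal sketch using only the Python standard library; the helper names thrift_port_open and jdbc_url are made up for illustration, and localhost:10000 is assumed only because it is the endpoint used in the beeline command above.

```python
import socket


def jdbc_url(host="localhost", port=10000):
    """Build the JDBC URL that beeline's !connect expects."""
    return "jdbc:hive2://{}:{}".format(host, port)


def thrift_port_open(host="localhost", port=10000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that a process is listening on the port; it does not
    prove that the listener is the Hive Thrift server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

Calling thrift_port_open() with no arguments checks the same endpoint as the beeline command. A False result suggests the server did not start or is bound to a different host or port.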

(See the answer "Avoid starting HiveThriftServer2 with created context programmatically".
NOTE: Hive tables can be accessed even if the Thrift server is started from the terminal and connected to the same metastore. Temp views cannot, because they exist only in the Spark session and are not written to the metastore.)
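Because temp views exist only in the live session, a standalone script (as opposed to an interactive pyspark shell) presumably has to keep running for as long as clients need to query them. A minimal sketch of one way to do that is below; keep_driver_alive is a hypothetical helper, not part of PySpark or of the original answer.

```python
import threading


def keep_driver_alive(stop_event, poll_seconds=1.0):
    """Block until stop_event is set or the user presses Ctrl+C.

    Returns True if stopped through the event, False if interrupted.
    """
    try:
        # Event.wait returns False on timeout, so this loops until the event is set.
        while not stop_event.wait(poll_seconds):
            pass
    except KeyboardInterrupt:
        return False
    return True


# Assumed usage at the end of the script, after startWithContext and
# createOrReplaceTempView have been called:
#     keep_driver_alive(threading.Event())
#     spark.stop()
```

An Event is used rather than a bare sleep loop so that another thread can end the wait cleanly by calling stop_event.set().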
