Running Hive queries through a Python client

Problem description

I have Hive 0.8 installed on a Hadoop cluster running on AWS EMR.

I am trying to do some data QA, which involves running a Hive query and fetching the results into Python, where some more logic lives.

Currently, this is achieved by sending a Hive query as a jobflow step, dumping the results to local storage on the master node, SCP-ing them to my local machine, and then loading the file with Python and parsing the results. All in all, not a very fun process.

Ideally, I would be able to do this in a fashion similar to:

conn = hive.connect(ip, port, user, pw)
cursor = conn.cursor()
cursor.execute(query)
rs = cursor.fetchall()

It seems that this is supposedly possible. Hive's documentation says that it supports this, and there is another SO question that looks like it's doing what I'd like to do.

However, I'm having trouble finding documentation. In particular, I haven't been able to figure out where to obtain the packages used in these examples. It would be immensely helpful if anyone could provide detailed instructions on how to get the Python client working, but failing that, it would help just to know where to obtain these packages.

Solution

If you build Hive from source, the modules will be located here (relative to the hive-trunk directory):

./build/dist/lib/py

You should be able to access the modules if you include that path in your PYTHONPATH environment variable, or if you add that path to sys.path in your script with the sys module.
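As a rough sketch of the sys-module route: the checkout location `~/hive-trunk` and the helper name `add_hive_modules_to_path` are made up for illustration, so adjust them to your own setup.

```python
import os
import sys

# Hypothetical checkout location (an assumption); point this at your hive-trunk.
HIVE_TRUNK = os.path.expanduser("~/hive-trunk")
HIVE_PY_LIB = os.path.join(HIVE_TRUNK, "build", "dist", "lib", "py")


def add_hive_modules_to_path(lib_dir=HIVE_PY_LIB):
    """Append lib_dir to sys.path if it is not already there; return the absolute path."""
    lib_dir = os.path.abspath(lib_dir)
    if lib_dir not in sys.path:
        sys.path.append(lib_dir)
    return lib_dir


# Call this before importing any of the Hive Python modules.
add_hive_modules_to_path()
```

Setting PYTHONPATH in the shell has the same effect without touching the script; the in-script version is just handier when you can't control the environment the script runs in.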

Also note that there is no longer a module named 'hive'. In the example code you linked, 'hive' should be replaced with 'hive_service'.
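With that rename applied, a Thrift-based client along the lines of the Hive wiki example might look roughly like the sketch below. It assumes the `thrift` package and the generated `hive_service` modules are importable, and that a Hive Thrift server is listening on the host and port you pass in. The function name `run_hive_query` is my own, not part of Hive, and the modules generated by Hive builds of this era may only run under Python 2.

```python
def run_hive_query(host, port, query):
    """Run one HiveQL statement over Thrift and return whatever fetchAll() yields."""
    # Imported lazily so this sketch can be loaded even where the Hive
    # modules are not on the path (assumption: they are on it at call time).
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hive_service import ThriftHive

    transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
    client = ThriftHive.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()
    try:
        client.execute(query)
        return client.fetchAll()
    finally:
        # Close the connection even if the query fails.
        transport.close()
```

You would call it as something like `rows = run_hive_query("localhost", 10000, "SHOW TABLES")`, where the host and port are placeholders for wherever your Hive server is actually running.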

