Problem description
I am new to Hive, MapReduce and Hadoop. I am using Putty to connect to a Hive table and access the records in it. What I did: I opened Putty, typed vip.name.com as the host name and then clicked Open. Then I entered my username and password, followed by a few commands to get to the Hive SQL prompt. Below is the list of what I did:
$ bash
bash-3.00$ hive
Hive history file=/tmp/rkost/hive_job_log_rkost_201207010451_1212680168.txt
hive> set mapred.job.queue.name=mdhi-technology;
hive> select * from table LIMIT 1;
So my question is:
Is there any other way I can do the same thing in a SQL client such as SQL Developer or SQuirreL SQL Client instead of from the command prompt? And if there is, what is the step-by-step process, given that in my example I log in to vip.name.com from Putty?
And if I need to do the same thing through a JDBC program on my Windows machine, how can I do it? That is, how can a JDBC program access Hive tables and get the results back? I know how to do this with Oracle tables; my only confusion is that I use the hostname vip.name.com to log into Putty. I hope the question is clear. Any suggestion will be appreciated.
In short, my question is: can I do the same thing in any SQL client instead of logging in through Putty?
Update:
I tried doing it the way Mark suggested, but I always get:
Hive: Could not establish connection to vip.host.com:10000/default: java.net.ConnectionException: Connection timed out: connect
What you are doing with Putty is SSH'ing into a machine that has Hive installed and set up, and then issuing Hive queries from the Hive command line. That is one way of issuing Hive queries. There are other ways that don't require SSH; the one you probably need is a connection via JDBC.
Here is an article that describes how to connect to a Hive installation on Amazon's EMR cluster using SQuirreL via JDBC. The article might appear to be Amazon-specific, but it's not. As long as you have the Hive server running on one of the nodes of the cluster, and no firewall blocks the connection between the client machine and the one running Hive, you should be able to connect.
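Before configuring a SQL client, it can help to check whether the Hive server port is reachable from the Windows machine at all; a "Connection timed out" like the one in the update often points to a firewall or an unreachable host rather than to Hive itself. The following is a minimal Java sketch using only java.net. The class name PortProbe is hypothetical, and the default host vip.name.com and port 10000 are simply taken from the question.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

class PortProbe {
    /** Tries a plain TCP connect to host:port and returns a short diagnosis. */
    static String diagnose(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return "open"; // something is listening and reachable
        } catch (UnknownHostException e) {
            return "unknown host"; // the name does not resolve from this machine
        } catch (SocketTimeoutException e) {
            return "timed out"; // no reply in time: often a firewall dropping packets
        } catch (ConnectException e) {
            // Usually "Connection refused": host reachable, nothing listening on the port
            return "connect failed: " + e.getMessage();
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions from the question; pass host and port as arguments.
        String host = args.length > 0 ? args[0] : "vip.name.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10000;
        System.out.println(host + ":" + port + " -> " + diagnose(host, port, 5000));
    }
}
```

If this reports "timed out" for the Hive host, no SQL client or JDBC program on that machine will get through either, and the network path (firewall, VPN, or a gateway-only login host) needs to be sorted out first.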
A couple of things you might want to keep in mind about that link:
- You can ignore step 3, where it asks you to create an SSH tunnel, unless you are using EMR.
- The port that you enter in your connection URI might be different in your case. Replace localhost with the fully qualified domain name of the machine that Hive is running on. To find out which port the Hive server is listening on, you can look in the Hive server nanny log file in the log directory (whose location depends on your installation), or run a simple netstat -a command. I believe 10000 is the default port number, so it might make sense to try 10000 directly.
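A minimal Java sketch of the JDBC route, replaying the CLI session from the question (set the queue, then run the query) against a URI built from the host's fully qualified name and port 10000. It assumes the older HiveServer (v1) driver, org.apache.hadoop.hive.jdbc.HiveDriver with the jdbc:hive:// scheme, which matches Hive versions of this question's era; HiveServer2 installations use org.apache.hive.jdbc.HiveDriver and jdbc:hive2:// instead. The Hive JDBC jar and its Hadoop dependencies must be on the classpath. The class name HiveJdbcExample and the table name your_table are hypothetical, and it is an assumption that your server accepts the set command over JDBC as the CLI does.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

class HiveJdbcExample {
    // Assumption: HiveServer (v1) driver. For HiveServer2 use
    // "org.apache.hive.jdbc.HiveDriver" and the "jdbc:hive2://" scheme.
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

    /** Builds a URI such as jdbc:hive://vip.name.com:10000/default. */
    static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Use the fully qualified name of the machine running the Hive server, not localhost.
        String host = args.length > 0 ? args[0] : "vip.name.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10000;
        String url = hiveUrl(host, port, "default");

        try {
            Class.forName(DRIVER); // loads the driver so DriverManager can find it
        } catch (ClassNotFoundException e) {
            System.err.println("Hive JDBC driver not on the classpath: " + DRIVER);
            return;
        }

        // HiveServer (v1) has no authentication, so empty credentials are passed.
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement()) {
            // Same queue setting as in the Hive CLI session (assumed to be accepted over JDBC).
            stmt.execute("set mapred.job.queue.name=mdhi-technology");
            // "your_table" is a hypothetical table name.
            try (ResultSet rs = stmt.executeQuery("select * from your_table LIMIT 1")) {
                ResultSetMetaData meta = rs.getMetaData();
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        if (i > 1) row.append('\t');
                        row.append(rs.getString(i));
                    }
                    System.out.println(row);
                }
            }
        } catch (SQLException e) {
            // A connection timeout here usually means the host/port is not reachable.
            System.err.println("Hive query failed: " + e.getMessage());
        }
    }
}
```

For example, run it as java HiveJdbcExample vip.name.com 10000 with the driver jars on the classpath. The same driver class, jars and URL are what you would enter when defining a Hive driver and alias in SQuirreL.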