Problem description
I have an Ubuntu desktop with RStudio installed. I also have a remote Hadoop cluster running on CentOS that I hope to connect to from RStudio. From my understanding this is a viable approach, but can someone please confirm?
RStudio itself will not let you connect to Hadoop, but you can use the Hadoop Streaming API to submit your Hadoop jobs.
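To illustrate what the streaming API expects: a streaming mapper and reducer are ordinary programs that read lines from stdin and write tab-separated `key<TAB>value` lines to stdout, and Hadoop sorts the mapper output by key before the reducer sees it. The sketch below is a hypothetical word-count example in Python, written only for illustration (it is not the code of any R package). The `run_locally` helper is an assumption that stands in for the cluster's shuffle/sort step with a plain `sorted`.

```python
import sys
from itertools import groupby


def mapper(lines):
    # Hypothetical word-count mapper: emit one "word\t1" record per word.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    # Streaming delivers reducer input sorted by key, so equal keys are adjacent.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_locally(lines):
    # Local stand-in for map -> shuffle/sort -> reduce on the cluster.
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # On a cluster the same script would run in one role per task,
    # e.g. "python3 wordcount.py map" or "python3 wordcount.py reduce".
    role = sys.argv[1] if len(sys.argv) > 1 else "local"
    if role == "map":
        for record in mapper(sys.stdin):
            print(record)
    elif role == "reduce":
        for record in reducer(sys.stdin):
            print(record)
    else:
        for record in run_locally(["hadoop streaming with r", "r and hadoop"]):
            print(record)
```

Because the only contract is stdin/stdout text, the same idea works for an R script run with `Rscript`. This is what makes it possible to drive Hadoop from R without RStudio connecting to the cluster directly.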
There are a few packages to help you get started. I have used rmr to run MapReduce jobs on a Hadoop cluster through the streaming API. They can be found here:
https://github.com/RevolutionAnalytics/RHadoop/wiki
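Packages like rmr handle job submission for you. For orientation, a raw streaming job is launched with `hadoop jar` against the streaming jar. The sketch below assembles such a command in Python. It assumes that a Hadoop client is installed and configured to point at the remote cluster. The jar path, the HDFS paths and the `mapper.py`/`reducer.py` scripts are all hypothetical, and the invocation rmr uses internally may differ.

```python
import shlex
import subprocess

# Hypothetical location; find hadoop-streaming*.jar on your own installation.
STREAMING_JAR = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar"


def streaming_command(input_dir, output_dir, mapper, reducer, files=()):
    cmd = ["hadoop", "jar", STREAMING_JAR]
    # Generic options such as -files (ship scripts to the tasks) must
    # come before the streaming-specific options.
    if files:
        cmd += ["-files", ",".join(files)]
    cmd += ["-input", input_dir, "-output", output_dir,
            "-mapper", mapper, "-reducer", reducer]
    return cmd


def submit(cmd, dry_run=True):
    # With dry_run=True, only print the shell-quoted command and return 0.
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    cmd = streaming_command("/user/me/input", "/user/me/output",
                            "python3 mapper.py", "python3 reducer.py",
                            files=["mapper.py", "reducer.py"])
    submit(cmd)  # dry run: prints the command instead of running it
```

Keeping `dry_run=True` as the default lets you inspect the command before anything touches the cluster.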
There is also the RHIPE package, which lets you communicate with HDFS from inside your R scripts.
http://www.datadr.org/doc/functions.html