Problem Description
In Google Cloud Dataproc, how can I access the Spark or Hadoop job history servers? I want to be able to look at my job history details when I run jobs.
Answer
To do this, you will need to create an SSH tunnel to the cluster and then use a SOCKS proxy with your browser. This is because, while the web interfaces are running on the cluster, firewall rules prevent anyone from connecting to them directly (for security).
To access the Spark or Hadoop job history server, you will first need to create an SSH tunnel to the master node of your cluster:
gcloud compute ssh --zone=<master-host-zone> \
    --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n" <master-host-name>
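As a concrete illustration, here is what the command would look like for a hypothetical cluster named my-cluster whose master node runs in zone us-central1-a (Dataproc names the master node by appending -m to the cluster name). The -D 1080 flag opens a SOCKS proxy on local port 1080, -N tells SSH not to run a remote command, and -n stops SSH from reading stdin:

gcloud compute ssh --zone=us-central1-a \
    --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n" my-cluster-m

The command will appear to hang; that is expected, since it is only holding the tunnel open. Leave it running for as long as you need the proxy.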
Once you have the SSH tunnel in place, you need to configure a browser to use the SOCKS proxy. Assuming you're using Chrome and know the path to the Chrome executable on your system, you can launch Chrome with the SOCKS proxy using:
<Google Chrome executable path> \
--proxy-server="socks5://localhost:1080" \
--host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
--user-data-dir=/tmp/
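For example, on a typical Linux install where the Chrome binary lives at /usr/bin/google-chrome (an assumed path; on macOS it is usually /Applications/Google Chrome.app/Contents/MacOS/Google Chrome), this becomes:

/usr/bin/google-chrome \
    --proxy-server="socks5://localhost:1080" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir=/tmp/

The --host-resolver-rules flag forces hostname resolution to go through the proxy as well, so the cluster's internal hostnames resolve on the master node rather than on your machine, and --user-data-dir starts Chrome with a separate profile so the proxy settings don't affect your regular browsing session.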
The full details on how to do this can be found in the Google Cloud Dataproc documentation on connecting to cluster web interfaces.
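With Chrome running through the proxy, you can then browse to the web UIs by master hostname. Assuming the standard default ports (which, to my knowledge, Dataproc uses), the YARN ResourceManager listens on 8088, the Hadoop (MapReduce) job history server on 19888, and the Spark history server on 18080. Using the hypothetical master node from above:

http://my-cluster-m:8088/     # YARN ResourceManager
http://my-cluster-m:19888/    # Hadoop MapReduce job history server
http://my-cluster-m:18080/    # Spark history server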