How can I access the Spark and Hadoop job history in Dataproc?


Question


In Google Cloud Dataproc, how can I access the Spark or Hadoop job history servers? I want to be able to view the history details of the jobs I run.

Recommended answer


To do this, you need to create an SSH tunnel to the cluster and then configure your browser to use it as a SOCKS proxy. This is necessary because, although the web interfaces are running on the cluster, firewall rules block outside connections to them (for security).


To access the Spark or Hadoop job history server, you will first need to create an SSH tunnel to the master node of your cluster:

gcloud compute ssh --zone=<master-host-zone> \
  --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n" <master-host-name>
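As a sketch of how the placeholders above can be filled in: on a standard (single-master) cluster, Dataproc names the master VM "<cluster-name>-m" (high-availability clusters use "-m-0", "-m-1" and "-m-2" instead). The helper names, arguments and example values below are hypothetical, not part of gcloud; the ssh flags are the same ones used in the command above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of gcloud.
# Assumes a standard (single-master) Dataproc cluster, whose master VM
# is named "<cluster-name>-m".

# Print the master node name for a standard cluster.
dataproc_master_name() {
  printf '%s-m\n' "$1"
}

# Open a SOCKS tunnel to the master node; blocks until you press Ctrl+C.
#   $1 = cluster name, $2 = zone of the master VM, $3 = local port (default 1080)
# -D <port>: dynamic (SOCKS) forwarding on localhost:<port>
# -N: run no remote command; -n: read stdin from /dev/null
open_socks_tunnel() {
  cluster="$1"
  zone="$2"
  port="${3:-1080}"
  gcloud compute ssh --zone="$zone" \
    --ssh-flag="-D ${port}" --ssh-flag="-N" --ssh-flag="-n" \
    "$(dataproc_master_name "$cluster")"
}
```

For example, after sourcing the file, run open_socks_tunnel my-cluster us-central1-a in its own terminal and leave it open; the proxy only works while that command is running.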


Once the SSH tunnel is in place, you need to configure your browser to use it as a SOCKS proxy. Assuming you are using Chrome and know the path to its executable on your system, you can launch Chrome with the SOCKS proxy like this:

<Google Chrome executable path> \
  --proxy-server="socks5://localhost:1080" \
  --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
  --user-data-dir=/tmp/
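With the tunnel open and Chrome running through the proxy, the history servers are reached via the master node's hostname on their default ports: 18080 for the Spark History Server and 19888 for the Hadoop MapReduce JobHistory server. The sketch below assumes those default ports, the tunnel listening on localhost:1080 as in the SSH command above, and a CHROME variable (an assumption) holding your Chrome executable path; the function names are hypothetical. It also uses a per-host profile directory under /tmp instead of /tmp/ itself.

```shell
#!/bin/sh
# Sketch only: function names and the CHROME variable are hypothetical.
# Assumes the SOCKS tunnel is already listening on localhost:1080.

# Print the history-server URLs for a master host, using the default ports:
# 18080 = Spark History Server, 19888 = MapReduce JobHistory server.
history_server_urls() {
  printf 'http://%s:18080\n' "$1"
  printf 'http://%s:19888\n' "$1"
}

# Launch Chrome through the proxy and open both history servers.
#   $1 = master host name, e.g. "my-cluster-m"
# Falls back to "google-chrome" on PATH if CHROME is unset.
open_history_servers() {
  master="$1"
  # The URLs contain no spaces or glob characters, so the unquoted
  # command substitution splits them into separate arguments safely.
  "${CHROME:-google-chrome}" \
    --proxy-server="socks5://localhost:1080" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir="/tmp/${master}" \
    $(history_server_urls "$master")
}
```

For example, open_history_servers my-cluster-m opens both pages. A dedicated --user-data-dir matters because an already-running Chrome instance using the same profile would ignore the proxy flags; the host-resolver rule makes Chrome resolve the master's hostname through the proxy rather than locally.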


Full details on how to do this are in the Dataproc documentation on cluster web interfaces.
