Spark SPARK_PUBLIC_DNS 和 SPARK_LOCAL_IP 在带有 docker 容器的独立集群上

本文介绍了Spark SPARK_PUBLIC_DNS 和 SPARK_LOCAL_IP 在带有 docker 容器的独立集群上的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

到目前为止，我只在 Linux 机器和 VM(桥接网络)上运行 Spark，但现在我对使用更多计算机作为从机很感兴趣.在计算机上分发 Spark Slave Docker 容器并让它们自动将自己连接到硬编码的 Spark master ip 会很方便.这已经很短了，但我在从容器上配置正确的 SPARK_LOCAL_IP(或 start-slave.sh 的 --host 参数)时遇到问题.

So far I have run Spark only on Linux machines and VMs (bridged networking) but now I am interesting on utilizing more computers as slaves. It would be handy to distribute a Spark Slave Docker container on computers and having them automatically connecting themselves to a hard-coded Spark master ip. This short of works already but I am having trouble configuring the right SPARK_LOCAL_IP (or --host parameter for start-slave.sh) on slave containers.

我认为我正确配置了 SPARK_PUBLIC_DNS 环境变量以匹配主机的网络可访问 ip(来自 10.0.x.x 地址空间)，至少它显示在 Spark 主 Web UI 上并且所有机器都可以访问.

I think I correctly configured the SPARK_PUBLIC_DNS env variable to match the host machine's network-accessible ip (from 10.0.x.x address space), at least it is shown on Spark master web UI and accessible by all machines.

我还按照 http://sometechshit.blogspot.ru/2015/04/running-spark-standalone-cluster-in.html，但在我的情况下，Spark master 在另一台机器上运行，而不是在 Docker 内部.我正在从网络中的另一台机器启动 Spark 作业，可能也运行一个从机本身.

I have also set SPARK_WORKER_OPTS and Docker port forwards as instructed at http://sometechshit.blogspot.ru/2015/04/running-spark-standalone-cluster-in.html, but in my case the Spark master is running on an other machine and not inside Docker. I am launching Spark jobs from an other machine within the network, possibly also running a slave itself.

我尝试过的事情:

根本不配置 SPARK_LOCAL_IP，slave 绑定到容器的 ip(如 172.17.0.45)，无法从 master 或驱动程序连接，计算在大多数时间仍然有效，但并非总是如此
绑定到 0.0.0.0，slave 与 master 通信并建立一些连接但它死了，另一个 slave 出现并消失，它们继续像这样循环
绑定到主机 ip，启动失败，因为该 ip 在容器内不可见，但由于配置了端口转发，其他人可以访问

我想知道为什么在连接到从站时没有使用配置的 SPARK_PUBLIC_DNS?我以为 SPARK_LOCAL_IP 只会影响本地绑定，而不会透露给外部计算机.

I wonder why isn't the configured SPARK_PUBLIC_DNS being used when connecting to slaves? I thought SPARK_LOCAL_IP would only affect on local binding but not being revealed to external computers.

在 https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html 他们指示将 SPARK_LOCAL_IP 设置为驱动程序、主进程和工作进程的集群可寻址主机名"，这是唯一的选择吗?我会避免额外的 DNS 配置，只使用 ips 来配置计算机之间的流量.或者有什么简单的方法可以实现这一目标?

At https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html they instruct to "set SPARK_LOCAL_IP to a cluster-addressable hostname for the driver, master, and worker processes", is this the only option? I would avoid the extra DNS configuration and just use ips to configure traffic between computers. Or is there an easy way to achieve this?

总结当前设置:

Master 在 Linux 上运行(虚拟机在 Windows 上的 VirtualBox 上使用桥接网络)
驱动程序从其他 Windows 计算机提交作业，效果很好
用于启动从站的 Docker 映像作为已保存"的 .tar.gz 文件分发，加载(curl xyz | gunzip | docker load)并在网络中的其他机器上启动，具有带有私有/公共 ip 配置的探针

在带有

Spark SPARK_PUBLIC_DNS 和 SPARK_LOCAL_IP 在带有 docker 容器的独立集群上

问题描述

推荐答案