Problem description
I ran a Spark cluster of 12 nodes (8 GB of memory and 8 cores each) for some tests.
I'm trying to figure out why the data locality levels in the "map" stage of a simple wordcount app are all ANY. The 14 GB dataset is stored in HDFS.
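
For reference, a minimal sketch of such a wordcount app (Scala assumed; the object name and the input/output paths are placeholders, since the original code is not shown):

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("WordCount"))

        // textFile splits the input into partitions (typically one per HDFS
        // block); the scheduler tries to run each map task on a node holding
        // the corresponding block, which is what NODE_LOCAL means.
        val lines = sc.textFile("hdfs:///data/input")    // placeholder path

        lines.flatMap(_.split("\\s+"))                   // lines -> words
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .saveAsTextFile("hdfs:///data/output")         // placeholder path

        sc.stop()
      }
    }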
Recommended answer

I encountered the same problem today. This is my situation:
My cluster has 9 workers (each hosting one executor by default). When I set --total-executor-cores 9, the locality level is NODE_LOCAL, but when I set --total-executor-cores below 9, e.g. --total-executor-cores 7, the locality level becomes ANY, and the total time cost is about 10x that of the NODE_LOCAL run. You can give it a try.
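
A sketch of the two submissions being compared, assuming a standalone cluster (the master URL, class, and jar names are placeholders):

    # 9 total cores across 9 workers: every worker runs a one-core executor,
    # so each HDFS block has an executor on its node and map tasks can be
    # scheduled NODE_LOCAL.
    spark-submit --master spark://master:7077 --class WordCount \
      --total-executor-cores 9 wordcount.jar

    # 7 total cores: two workers get no executor, so blocks stored on those
    # nodes must be read remotely and their tasks fall back to ANY.
    spark-submit --master spark://master:7077 --class WordCount \
      --total-executor-cores 7 wordcount.jar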