Problem description
I ran a Spark cluster of 12 nodes (8 GB of memory and 8 cores each) for some tests.
I'm trying to figure out why the data locality levels in the "map" stage of a simple wordcount app are all ANY. The 14 GB dataset is stored in HDFS.
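
For reference, a minimal sketch of such a wordcount app (Scala assumed; the object name and the input/output paths are placeholders, since the original code is not shown):

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("WordCount"))

        // textFile splits the input into partitions (typically one per HDFS
        // block); the scheduler tries to run each map task on a node holding
        // the corresponding block, which is what NODE_LOCAL means.
        val lines = sc.textFile("hdfs:///data/input")    // placeholder path

        lines.flatMap(_.split("\\s+"))                   // lines -> words
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .saveAsTextFile("hdfs:///data/output")         // placeholder path

        sc.stop()
      }
    }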
Recommended answer

I encountered the same problem today. This is my situation:
My cluster has 9 workers (each hosting one executor by default). When I set --total-executor-cores 9, the locality level is NODE_LOCAL, but when I set --total-executor-cores below 9, e.g. --total-executor-cores 7, the locality level becomes ANY, and the total time cost is about 10x that of the NODE_LOCAL run. You can give it a try.
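
A sketch of the two submissions being compared, assuming a standalone cluster (the master URL, class, and jar names are placeholders):

    # 9 total cores across 9 workers: every worker runs a one-core executor,
    # so each HDFS block has an executor on its node and map tasks can be
    # scheduled NODE_LOCAL.
    spark-submit --master spark://master:7077 --class WordCount \
      --total-executor-cores 9 wordcount.jar

    # 7 total cores: two workers get no executor, so blocks stored on those
    # nodes must be read remotely and their tasks fall back to ANY.
    spark-submit --master spark://master:7077 --class WordCount \
      --total-executor-cores 7 wordcount.jar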