This article looks at Apache Spark on YARN: how the number of cores relates to the number of executors. It may be a useful reference for anyone tuning the same settings.

Problem description

I'm trying to understand the relationship of the number of cores and the number of executors when running a Spark job on YARN.

The test environment is as follows:


  • Number of data nodes: 3

  • Data node machine spec:

    • CPU: Core i7-4790 (# of cores: 4, # of threads: 8)

    • RAM: 32GB (8GB x 4)

    • HDD: 8TB (2TB x 4)

    • Network: 1GB

  • Spark version: 1.0.0

  • Hadoop version: 2.4.0 (Hortonworks HDP 2.1)

  • Spark job flow: sc.textFile -> filter -> map -> filter -> mapToPair -> reduceByKey -> map -> saveAsTextFile
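
    The question does not show the actual code, but the operator names (mapToPair in particular) suggest the Java RDD API. Below is a minimal sketch of what such a pipeline might look like; only the sequence of operators comes from the question, while the paths, predicates and key extraction are hypothetical placeholders.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class JobFlowSketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("cores-vs-executors-test");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<String> lines = sc.textFile("hdfs:///input/data.txt");       // hypothetical input path

            JavaPairRDD<String, Long> counts = lines
                    .filter(line -> !line.isEmpty())                              // first filter (placeholder predicate)
                    .map(String::trim)                                            // map (placeholder transform)
                    .filter(line -> line.contains("\t"))                          // second filter (placeholder predicate)
                    .mapToPair(line -> new Tuple2<String, Long>(line.split("\t")[0], 1L))  // build (key, 1) pairs
                    .reduceByKey((a, b) -> a + b);                                // the shuffle happens here

            counts.map(pair -> pair._1() + "\t" + pair._2())                      // flatten pairs back into text lines
                  .saveAsTextFile("hdfs:///output/result");                       // hypothetical output path

            sc.stop();
        }
    }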

    Input data


    • Type: single text file

    • Size: 165GB

    • Number of lines: 454,568,833

    Output


    • Number of lines after the second filter: 310,640,717

    • Number of lines of the result file: 99,848,268

    • Size of the result file: 41GB

    The job was run with the following configurations (an example spark-submit invocation is sketched after the list):


    1. --master yarn-client --executor-memory 19G --executor-cores 7 --num-executors 3 (one executor per data node, using as many cores as possible)

    2. --master yarn-client --executor-memory 19G --executor-cores 4 --num-executors 3 (# of cores reduced)

    3. --master yarn-client --executor-memory 4G --executor-cores 2 --num-executors 12 (fewer cores, more executors)
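
    For context, flags like the above are normally passed to spark-submit; the question only lists the resource-related flags. A sketch of what the full launch command for configuration 1 might look like, where the main class, application jar and input path are hypothetical:

    # only the --master / --num-executors / --executor-cores / --executor-memory values come from the question
    spark-submit --class com.example.MyTextJob \
      --master yarn-client \
      --num-executors 3 --executor-cores 7 --executor-memory 19G \
      my-text-job.jar hdfs:///input/data.txt

    Configurations 2 and 3 differ only in those three resource flags.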

    Elapsed times:


    1. 50 min 15 sec

    2. 55 min 48 sec

    3. 31 min 23 sec

    To my surprise, (3) was much faster.
    I thought that (1) would be faster, since there would be less inter-executor communication when shuffling.
    Although the # of cores of (1) is fewer than that of (3), the # of cores is not the key factor, since (2) did perform well.

    (The following was added after pwilmot's answer.)

    For information, the performance monitor screen captures are as follows:


    • Ganglia data node summary for (1) - the job started at 04:37


    • Ganglia data node summary for (3) - the job started at 19:47. Please ignore the graphs before that time.

    The graph roughly divides into 2 sections:


    • First: from the start to reduceByKey: CPU intensive, no network activity

    • Second: after reduceByKey: CPU usage drops, network I/O is done

    As the graph shows, (1) can use as much CPU power as it was given. So it might not be a problem of the number of threads.

    How can this result be explained?

    Recommended answer

    I haven't played with these settings myself, so this is just speculation, but if we think about this issue as normal cores and threads in a distributed system, then in your cluster you can use up to 12 cores (4 * 3 machines) and 24 threads (8 * 3 machines). In your first two examples you are giving your job a fair number of cores (potential computation space), but the number of threads (jobs) to run on those cores is so limited that you aren't able to use much of the processing power allocated, and thus the job is slower even though more computation resources are allocated.
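
    To make that arithmetic concrete, here is a small back-of-the-envelope sketch. It assumes the usual Spark default of one task per executor core (spark.task.cpus = 1), so the number of concurrent task slots is num-executors * executor-cores; the hardware numbers come from the test environment above.

    public class SlotMath {
        public static void main(String[] args) {
            int machines = 3, coresPerMachine = 4, threadsPerMachine = 8;

            // total hardware available across the three data nodes
            System.out.println("physical cores:   " + machines * coresPerMachine);    // 12
            System.out.println("hardware threads: " + machines * threadsPerMachine);  // 24

            // concurrent task slots (and total executor memory) requested by each configuration
            System.out.println("(1)  3 executors * 7 cores = " + (3 * 7)  + " slots, 3 * 19G = 57G");
            System.out.println("(2)  3 executors * 4 cores = " + (3 * 4)  + " slots, 3 * 19G = 57G");
            System.out.println("(3) 12 executors * 2 cores = " + (12 * 2) + " slots, 12 * 4G = 48G");
        }
    }

    Configuration 3 is the only one whose 24 task slots match the 24 hardware threads of the cluster, while configuration 2 requests only half of them.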

    You mention that your concern was in the shuffle step - while it is nice to limit the overhead of the shuffle step, it is generally much more important to utilize the parallelization of the cluster. Think about the extreme case - a single-threaded program with zero shuffle.

    This concludes the article on Apache Spark: the number of cores vs. the number of executors. Hopefully the recommended answer above is helpful.
