Question
I am learning Spark on AWS EMR. In the process, I am trying to understand the difference between the number of executors (--num-executors) and the number of executor cores (--executor-cores). Can anyone explain it?
Also, when I try to submit the following job, I get an error:
spark-submit --deploy-mode cluster --master yarn --num-executors 1 --executor-cores 5 --executor-memory 1g -–conf spark.yarn.submit.waitAppCompletion=false wordcount.py s3://test/spark-example/input/input.txt s3://test/spark-example/output21
Error: Unrecognized option: -–conf
Recommended answer
The number of executors is the number of distinct YARN containers (think processes/JVMs) that will execute your application.
The number of executor cores is the number of threads you get inside each executor (container).
So the parallelism (number of concurrent threads/tasks running) of your Spark application is #executors x #executor-cores. If you have 10 executors and 5 executor cores, you will have (hopefully) 50 tasks running at the same time.
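The arithmetic above can be written as a small Python sketch. The function name `max_concurrent_tasks` and its `task_cpus` parameter are made up for illustration; `task_cpus` stands in for Spark's `spark.task.cpus` setting, which is assumed here to be at its default of 1. The result is only an upper bound, since the cluster may not grant every requested container.

```python
def max_concurrent_tasks(num_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Upper bound on the number of tasks that can run at the same time.

    num_executors  -- value passed as --num-executors
    executor_cores -- value passed as --executor-cores
    task_cpus      -- cores reserved per task (assumed default: 1)
    """
    # Each executor can run as many tasks as it has free task slots.
    slots_per_executor = executor_cores // task_cpus
    return num_executors * slots_per_executor


# The example from the answer: 10 executors with 5 cores each.
print(max_concurrent_tasks(10, 5))

# The settings from the question: 1 executor with 5 cores.
print(max_concurrent_tasks(1, 5))
```

With the settings from the question (1 executor, 5 cores), at most 5 tasks would run concurrently under these assumptions.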
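As for the "Unrecognized option" error in the question: the option as posted appears to start with an ASCII hyphen followed by an en dash (U+2013) rather than two ASCII hyphens, which often happens when a command is copied from a web page or word processor. If that is the cause, retyping the option as `--conf` with two plain hyphens should fix it. A minimal Python sketch for spotting such characters in a command string follows; the helper name `find_suspicious_dashes` is hypothetical.

```python
import unicodedata


def find_suspicious_dashes(command: str):
    """Return (index, character, Unicode name) for each non-ASCII dash in command."""
    hits = []
    for i, ch in enumerate(command):
        # Unicode category "Pd" is dash punctuation; the plain ASCII hyphen is fine.
        if ch != "-" and unicodedata.category(ch) == "Pd":
            hits.append((i, ch, unicodedata.name(ch)))
    return hits


# Shortened version of the command from the question, with the en dash written as an escape.
bad = "spark-submit --num-executors 1 -\u2013conf spark.yarn.submit.waitAppCompletion=false"
for index, char, name in find_suspicious_dashes(bad):
    print(f"position {index}: {name}")
```

An empty result means the command contains no dash-like characters other than the ASCII hyphen.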