Spark - How many executors and cores are allocated to my Spark job

This article explains how to find out how many executors and cores are allocated to a Spark job. It should be a useful reference for anyone facing the same problem; interested readers can follow along below.

Problem description



Spark's architecture revolves entirely around the concept of executors and cores. I would like to see, in practice, how many executors and cores are running for my Spark application in a cluster.

I was trying to use the snippet below in my application, but with no luck.

val conf = new SparkConf().setAppName("ExecutorTestJob")
val sc = new SparkContext(conf)
conf.get("spark.executor.instances")
conf.get("spark.executor.cores")

Is there any way to get those values using the SparkContext object, the SparkConf object, etc.?

Solution

Scala (programmatic way):

getExecutorStorageStatus and getExecutorMemoryStatus both return the executors including the driver, as in the example snippet below.

/** Method that just returns the current active/registered executors
  * excluding the driver.
  * @param sc The spark context to retrieve registered executors.
  * @return a list of executors each in the form of host:port.
  */
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
  val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
  val driverHost: String = sc.getConf.get("spark.driver.host")
  allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
}

sc.getConf.getInt("spark.executor.instances", 1)
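For example, here is a minimal usage sketch that combines the helper above with the conf lookup (the app name is just a placeholder assumption):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf().setAppName("ExecutorCountExample") // placeholder app name
  val sc = new SparkContext(conf)

  // Registered executors, excluding the driver (uses the helper defined above)
  val executors = currentActiveExecutors(sc)
  println(s"Active executors (${executors.size}): ${executors.mkString(", ")}")

  // Requested executor instances, falling back to 1 when the property is not set
  println(s"spark.executor.instances = ${sc.getConf.getInt("spark.executor.instances", 1)}")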

Similarly, you can get all the properties and print them as shown below; that will include the cores information as well.

sc.getConf.getAll.mkString("\n")

OR

sc.getConf.toDebugString

In most cases, spark.executor.cores holds the cores per executor, and spark.driver.cores is the value the driver should have.
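As a rough sketch of reading those two properties from the conf (the fallback value of 1 is an assumption here; the real defaults depend on the cluster manager and deploy mode):

  // Cores per executor and cores for the driver, read from the conf
  val executorCores = sc.getConf.getInt("spark.executor.cores", 1) // fallback is an assumption
  val driverCores   = sc.getConf.getInt("spark.driver.cores", 1)   // fallback is an assumption
  println(s"spark.executor.cores = $executorCores, spark.driver.cores = $driverCores")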

Python:

The getExecutorStorageStatus and getExecutorMemoryStatus methods above are not implemented in the Python API.

EDIT: But they can be accessed using the Py4J bindings exposed from the SparkSession.

sc._jsc.sc().getExecutorMemoryStatus()
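A minimal PySpark sketch along those lines, assuming the Scala map returned through Py4J exposes its size() method and includes the driver (the app name is a placeholder):

  from pyspark import SparkConf, SparkContext

  conf = SparkConf().setAppName("ExecutorCountExample")  # placeholder app name
  sc = SparkContext(conf=conf)

  # getExecutorMemoryStatus reports one entry per host:port, including the driver,
  # so subtract 1 to count only the executors.
  status = sc._jsc.sc().getExecutorMemoryStatus()
  print("Active executors:", status.size() - 1)

  # The requested values can also be read from the conf when they were set explicitly.
  print("spark.executor.instances =", sc.getConf().get("spark.executor.instances", "not set"))
  print("spark.executor.cores =", sc.getConf().get("spark.executor.cores", "not set"))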

This concludes the article on Spark - how many executors and cores are allocated to my Spark job. We hope the recommended answer is helpful, and we appreciate your continued support!
