Question
What is the meaning of the title "Locality Level" and its 5 statuses: data local --> process local --> node local --> rack local --> any?
Recommended answer
As far as I know, the locality level indicates which type of data access a task performed. When a node finishes all its work and its CPU becomes idle, Spark may decide to start other pending tasks on it, even if they have to fetch their data from somewhere else. Ideally, all your tasks should be process local, since that level has the lowest data access latency.
You can configure the wait time before moving to other locality levels using:
spark.locality.wait
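As a rough illustration of where this setting goes, here is a minimal sketch that assumes the application builds its own SparkConf. The application name and the wait values are made up for the example. spark.locality.wait.process, spark.locality.wait.node and spark.locality.wait.rack are the per-level overrides, and each falls back to spark.locality.wait when unset.

```scala
import org.apache.spark.SparkConf

// Illustrative values only; tune them for your own workload.
val conf = new SparkConf()
  .setAppName("locality-wait-example")     // hypothetical application name
  .set("spark.locality.wait", "3s")        // wait applied before each fallback step
  .set("spark.locality.wait.node", "1s")   // optional override for the NODE_LOCAL step
  .set("spark.locality.wait.rack", "500ms") // optional override for the RACK_LOCAL step
```

A longer wait favors local tasks at the cost of idle executors; a wait of 0 makes Spark fall back to less local levels immediately.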
More information about the parameters can be found in
As for the different levels PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, and ANY, I think the methods findTask and findSpeculativeTask in org.apache.spark.scheduler.TaskSetManager illustrate how Spark chooses tasks based on their locality level. It first checks for PROCESS_LOCAL tasks, whose data is in the same executor process that will run them. If there are none, it checks for NODE_LOCAL tasks, whose data is on the same node: either in another executor on that node, or something that has to be read from a system such as HDFS, a cache, etc. RACK_LOCAL means that the data is on another node in the same rack, so it has to be transferred over the network before execution. Finally, ANY takes any pending task that can run on the current node, wherever its data is.
/**
 * Dequeue a pending task for a given node and return its index and locality level.
 * Only search for tasks matching the given locality constraint.
 */
private def findTask(execId: String, host: String, locality: TaskLocality.Value)
  : Option[(Int, TaskLocality.Value)] =
{
  for (index <- findTaskFromList(execId, getPendingTasksForExecutor(execId))) {
    return Some((index, TaskLocality.PROCESS_LOCAL))
  }

  if (TaskLocality.isAllowed(locality, TaskLocality.NODE_LOCAL)) {
    for (index <- findTaskFromList(execId, getPendingTasksForHost(host))) {
      return Some((index, TaskLocality.NODE_LOCAL))
    }
  }

  if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) {
    for {
      rack <- sched.getRackForHost(host)
      index <- findTaskFromList(execId, getPendingTasksForRack(rack))
    } {
      return Some((index, TaskLocality.RACK_LOCAL))
    }
  }

  // Look for no-pref tasks after rack-local tasks since they can run anywhere.
  for (index <- findTaskFromList(execId, pendingTasksWithNoPrefs)) {
    return Some((index, TaskLocality.PROCESS_LOCAL))
  }

  if (TaskLocality.isAllowed(locality, TaskLocality.ANY)) {
    for (index <- findTaskFromList(execId, allPendingTasks)) {
      return Some((index, TaskLocality.ANY))
    }
  }

  // Finally, if all else has failed, find a speculative task
  findSpeculativeTask(execId, host, locality)
}
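To see the lookup order of findTask without the scheduler around it, here is a small self-contained sketch. It is not Spark's implementation: the Task case class, levelFor, and the way data location is modeled are hypothetical simplifications, and no-preference and speculative tasks are left out. It only imitates the idea that the most local allowed level wins.

```scala
object LocalitySketch {
  // Levels ordered from most to least local, using the same names as above.
  object Locality extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
    // A level is allowed when it is at least as local as the constraint.
    def isAllowed(constraint: Value, level: Value): Boolean = level <= constraint
  }

  // Hypothetical pending task: where its input data currently lives.
  final case class Task(id: Int, execId: String, host: String, rack: String)

  // Classify how local a task's data is to the executor making the offer.
  def levelFor(task: Task, execId: String, host: String, rack: String): Locality.Value =
    if (task.execId == execId) Locality.PROCESS_LOCAL
    else if (task.host == host) Locality.NODE_LOCAL
    else if (task.rack == rack) Locality.RACK_LOCAL
    else Locality.ANY

  // Return the id and level of the most local pending task that the
  // constraint permits, or None when nothing qualifies.
  def findTask(pending: Seq[Task], execId: String, host: String, rack: String,
               maxLocality: Locality.Value): Option[(Int, Locality.Value)] =
    pending
      .map(t => (t.id, levelFor(t, execId, host, rack)))
      .filter { case (_, level) => Locality.isAllowed(maxLocality, level) }
      .sortBy { case (_, level) => level.id } // stable sort: most local first
      .headOption
}
```

Calling findTask with a stricter maxLocality (for example RACK_LOCAL) returns None when only remote data is pending, which corresponds to the executor waiting for spark.locality.wait before the constraint is relaxed.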