
Question

What exactly does spark.sql.shuffle.partitions refer to? Is it the number of partitions that results from a wide transformation, or does it describe some intermediate partitioning that happens before the result partitions of the wide transformation are produced?

As I understand it, a wide transformation looks like this:

Parent RDDs -> shuffle files -> child RDDs

What does the spark.sql.shuffle.partitions parameter refer to here? The shuffle files, the child RDDs, or something else that I have overlooked?

Recommended answer

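This is already covered in the official documentation, which describes the setting as the number of partitions to use when shuffling data for joins or aggregations.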

In other words, it is the number of partitions of the child Dataset.
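
To make the "parent -> shuffle files -> child" picture concrete, here is a toy model in plain Python. It is not Spark code and not how Spark implements shuffles internally; the function name shuffle, the parameter num_shuffle_partitions, and the sample data are all made up for illustration. The parameter only stands in for the role spark.sql.shuffle.partitions plays: it fixes how many buckets each map task writes, and therefore how many partitions the child ends up with, regardless of how many partitions the parent had.

```python
from collections import defaultdict


def shuffle(parent_partitions, num_shuffle_partitions):
    """Toy model of a wide transformation (group values by key).

    num_shuffle_partitions is a hypothetical stand-in for the role of
    spark.sql.shuffle.partitions; this is a sketch, not Spark's implementation.
    """
    # Map side: each parent partition splits its records into
    # num_shuffle_partitions buckets (standing in for shuffle files).
    shuffle_files = []
    for records in parent_partitions:
        buckets = [[] for _ in range(num_shuffle_partitions)]
        for key, value in records:
            buckets[hash(key) % num_shuffle_partitions].append((key, value))
        shuffle_files.append(buckets)

    # Reduce side: child partition i reads bucket i from every map output.
    child_partitions = []
    for i in range(num_shuffle_partitions):
        grouped = defaultdict(list)
        for buckets in shuffle_files:
            for key, value in buckets[i]:
                grouped[key].append(value)
        child_partitions.append(dict(grouped))
    return child_partitions


# Three parent partitions with integer keys (made-up sample data).
parents = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")], [(2, "e"), (4, "f")]]
children = shuffle(parents, num_shuffle_partitions=4)

# The child has as many partitions as the setting, not as many as the parent.
print(len(parents), len(children))
```

In this model the parent has three partitions, yet the child has exactly as many partitions as the setting asks for. One caveat for real clusters: when adaptive query execution (spark.sql.adaptive.enabled) is on, Spark may coalesce shuffle partitions at runtime, so the observed count can be lower than the configured value.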

