Question
What exactly does spark.sql.shuffle.partitions refer to? Is it the number of partitions that result from a wide transformation, or something that happens in between, such as an intermediate partitioning step before the result partitions of the wide transformation?
Because, as I understand it, a wide transformation looks like this:
Parents RDDs -> shuffle files -> Child RDDs
What does the spark.sql.shuffle.partitions parameter refer to here: the shuffle files, the child RDDs, or something else that I have overlooked?
Recommended answer
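This is already covered in the official documentation:
spark.sql.shuffle.partitions (default: 200): Configures the number of partitions to use when shuffling data for joins or aggregations.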
In other words, it is the number of partitions of the child Dataset.
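To make the "Parent RDDs -> shuffle files -> Child RDDs" picture concrete, here is a toy, pure-Python model of a shuffle. It is only a sketch and not Spark code: the function names (bucket_of, map_side, reduce_side, shuffle_sum) are hypothetical, and the CRC32-based hash is a stand-in for Spark's real partitioner. The point it illustrates is that a single number plays both roles from the question: every map task splits its output into that many buckets, and there is exactly one child partition per bucket.

```python
import zlib


def bucket_of(key: str, num_partitions: int) -> int:
    # Deterministic stand-in for a hash partitioner (NOT Spark's actual hash).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def map_side(parent_partition, num_shuffle_partitions):
    """One map task: split its records into num_shuffle_partitions buckets."""
    buckets = [[] for _ in range(num_shuffle_partitions)]
    for key, value in parent_partition:
        buckets[bucket_of(key, num_shuffle_partitions)].append((key, value))
    return buckets


def reduce_side(all_map_outputs, num_shuffle_partitions):
    """Child partition i reads bucket i from every map output and sums by key."""
    child_partitions = []
    for i in range(num_shuffle_partitions):
        totals = {}
        for buckets in all_map_outputs:
            for key, value in buckets[i]:
                totals[key] = totals.get(key, 0) + value
        child_partitions.append(totals)
    return child_partitions


def shuffle_sum(parent_partitions, num_shuffle_partitions):
    # Parent partitions -> bucketed map outputs ("shuffle files") -> child partitions.
    map_outputs = [map_side(p, num_shuffle_partitions) for p in parent_partitions]
    return reduce_side(map_outputs, num_shuffle_partitions)


# Three parent partitions of (key, value) records; illustrative data only.
parents = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("d", 6)],
]
children = shuffle_sum(parents, num_shuffle_partitions=4)
print(len(parents), "parent partitions ->", len(children), "child partitions")
```

The number of child partitions depends only on num_shuffle_partitions, not on how many parent partitions there were. In Spark itself, a comparable check would be to set the value with spark.conf.set("spark.sql.shuffle.partitions", ...) and inspect df.groupBy(...).count().rdd.getNumPartitions(). Note that in recent Spark versions adaptive query execution may coalesce post-shuffle partitions, so the observed count can be lower than the configured value.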