Problem description
I have used sc.broadcast for lookup files to improve performance. I also learned that there is a function called broadcast in Spark SQL functions.
What is the difference between the two?
Which one should I use for broadcasting reference/lookup tables?
If you want to achieve a broadcast join in Spark SQL, you should use the broadcast function (combined with the desired spark.sql.autoBroadcastJoinThreshold configuration). It will:
- Mark the given relation for broadcasting.
- Adjust the SQL execution plan.
- When the output relation is evaluated, take care of collecting the data, broadcasting it, and applying the correct join mechanism.
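SparkContext.broadcast is used to handle local objects, that is, plain values that live on the driver. It can be used alongside Spark DataFrames (for example, inside a UDF), but it does not by itself turn a join into a broadcast join.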