This article covers the difference between sc.broadcast and the broadcast function in spark.sql.

Problem description

I have been using sc.broadcast on lookup files to improve performance.

I also learned that the Spark SQL functions include a function called broadcast.

What is the difference between the two?

Which one should I use to broadcast reference/lookup tables?

Solution

If you want to achieve a broadcast join in Spark SQL, you should use the broadcast function (combined with the desired spark.sql.autoBroadcastJoinThreshold configuration). It will:

  • Mark the given relation for broadcasting.
  • Adjust the SQL execution plan.
  • When the output relation is evaluated, take care of collecting the data, broadcasting it, and applying the correct join mechanism.

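SparkContext.broadcast is used to handle local objects: it ships a value from the driver to the executors as a read-only broadcast variable. It can still be used together with Spark DataFrames, for example by reading the broadcast value inside a UDF, but it does not change how a join is planned.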

