Problem description
When applying a broadcast variable with collectAsMap(), not all the values are included in the broadcast variable. For example:
val emp = sc.textFile("...text1.txt").map(line => (line.split("\t")(3),line.split("\t")(1))).distinct()
val emp_new = sc.textFile("...text2.txt").map(line => (line.split("\t")(3),line.split("\t")(1))).distinct()
emp_new.foreach(println)
val emp_newBC = sc.broadcast(emp_new.collectAsMap())
println(emp_newBC.value)
When I checked the values within emp_newBC, I saw that not all the data from emp_new appear. What am I missing?
Thanks in advance.
Recommended answer
The problem is that emp_new is an RDD of (key, value) tuples, while emp_newBC is a broadcast Map. collectAsMap() does not return a multimap: when several tuples share the same key (here, field 3 of each line), only one value per key is preserved, so the broadcast map ends up with fewer entries than emp_new. If you want to get back a list of all tuples, use
// Broadcasts an Array[(String, String)] holding every tuple, duplicate keys included
val emp_newBC = sc.broadcast(emp_new.collect())
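The effect described above can be checked without a cluster. The following is a minimal plain-Scala sketch, assuming that Scala's `Seq.toMap` is a fair stand-in for `collectAsMap()` in the one respect that matters here: only one value per key survives. The object name `BroadcastMapDemo`, the helpers `asMap` and `asMultiMap`, and the sample records are hypothetical, chosen only for illustration.

```scala
// Plain-Scala sketch (no Spark needed) of why converting pairs to a Map loses rows.
object BroadcastMapDemo {
  // Hypothetical sample data standing in for emp_new: (field 3, field 1) pairs.
  // Two rows share the key "D10".
  val pairs: Seq[(String, String)] = Seq(
    ("D10", "alice"),
    ("D10", "bob"),
    ("D20", "carol")
  )

  // Analogous to collectAsMap(): duplicate keys collapse to a single entry.
  def asMap(ps: Seq[(String, String)]): Map[String, String] = ps.toMap

  // Keeps a Map for key lookups but retains every value for each key.
  def asMultiMap(ps: Seq[(String, String)]): Map[String, Seq[String]] =
    ps.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  def main(args: Array[String]): Unit = {
    println(pairs.size)               // 3 tuples, as collect() would keep
    println(asMap(pairs).size)        // 2: one of the "D10" rows is gone
    println(asMultiMap(pairs)("D10")) // both "D10" values are still present
  }
}
```

If a map is still wanted for lookups, a Spark analogue of `asMultiMap` would be something like `sc.broadcast(emp_new.groupByKey().collectAsMap())`, which broadcasts a map from each key to an `Iterable` of all its values; treat this as an untested suggestion rather than part of the original answer.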