What is the difference between the Spark JavaRDD methods collect() and collectAsync()?
Question
I am exploring the Spark 2.0 Java API and have a question about the difference between collect() and collectAsync(), both of which are available on JavaRDD.
Recommended answer
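collect():
It returns all of the elements in this RDD to the driver. The Spark documentation describes the result as an array; in the Java API the return type is a java.util.List. The call blocks until the job completes.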
// Assumes sc is an existing JavaSparkContext.
List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> rdd = sc.parallelize(data, 1);
// collect() runs the job and blocks the calling thread until every element
// has been copied to the driver.
List<Integer> result = rdd.collect();
collectAsync():
The asynchronous version of collect. It returns a JavaFutureAction, which extends java.util.concurrent.Future, for retrieving all of the elements in this RDD once the job has finished.
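// Assumes sc is an existing JavaSparkContext; JavaFutureAction is in org.apache.spark.api.java.
List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> rdd = sc.parallelize(data, 1);
// collectAsync() submits the job and returns immediately with a future, not the data.
JavaFutureAction<List<Integer>> future = rdd.collectAsync();
// get() blocks until the job finishes, then returns the elements collected to the driver.
// It can throw InterruptedException or ExecutionException.
List<Integer> result = future.get();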
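The blocking versus non-blocking behaviour described above can be tried without a Spark cluster. The following is only an analogy built on plain java.util.concurrent, not Spark code: slowCollect and slowCollectAsync are hypothetical stand-ins for rdd.collect() and rdd.collectAsync(), and the 200 ms sleep is an assumed job latency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CollectVsCollectAsyncSketch {

    // Hypothetical stand-in for rdd.collect(): the caller is blocked
    // until the simulated job has produced all elements.
    static List<Integer> slowCollect() throws InterruptedException {
        Thread.sleep(200); // assumed job latency, for illustration only
        return Arrays.asList(1, 2, 3, 4, 5);
    }

    // Hypothetical stand-in for rdd.collectAsync(): the job is started on
    // another thread and a Future is handed back right away.
    static Future<List<Integer>> slowCollectAsync(ExecutorService pool) {
        Callable<List<Integer>> job = CollectVsCollectAsyncSketch::slowCollect;
        return pool.submit(job);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Blocking style: nothing else happens on this thread until the data is back.
            List<Integer> blocking = slowCollect();
            System.out.println("blocking result: " + blocking);

            // Async style: the job is already running while this thread continues.
            Future<List<Integer>> future = slowCollectAsync(pool);
            System.out.println("done right after submit? " + future.isDone());

            // get() is the point where the caller finally waits for the result.
            List<Integer> async = future.get();
            System.out.println("async result: " + async);
        } finally {
            pool.shutdown();
        }
    }
}
```

In both styles the full result ends up in the caller's memory; the difference is only whether the calling thread waits during the job or can do other work until it calls get().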