What is the difference between the Spark JavaRDD methods collect() and collectAsync()?

Question

I am exploring the Spark 2.0 Java API and have a question about the difference between collect() and collectAsync(), both of which are available on JavaRDD.

Recommended answer

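collect():

In the Java API it returns a java.util.List containing all of the elements in this RDD (the Scala API returns an array). The call blocks until the job finishes, and the whole result has to fit in the driver's memory.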

List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> rdd = sc.parallelize(data, 1);
List<Integer> result = rdd.collect();
// collect() blocks the calling thread until the job finishes;
// all elements of the RDD are then available on the driver.


collectAsync():

The asynchronous version of collect(). It returns a JavaFutureAction, which implements java.util.concurrent.Future, for retrieving a list containing all of the elements in this RDD.

List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> rdd = sc.parallelize(data, 1);
JavaFutureAction<List<Integer>> future = rdd.collectAsync();
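// Returns immediately with a future; the job is submitted and runs in the background.

List<Integer> result = future.get();
// get() blocks until the job finishes and returns the collected elements;
// it can throw InterruptedException or ExecutionException.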
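
Since JavaFutureAction implements java.util.concurrent.Future, the calling pattern is the same as for any other Future. The sketch below deliberately does not use Spark: it uses a CompletableFuture as a stand-in for the job so that it runs with the JDK alone. The class AsyncCollectSketch and the helpers slowCollect, blockingCollect and asyncCollect are hypothetical names made up for this illustration, and the 200 ms sleep is an assumed job latency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AsyncCollectSketch {

    // Hypothetical stand-in for a Spark job: pretends to gather elements slowly.
    public static List<Integer> slowCollect() {
        try {
            Thread.sleep(200); // simulated job latency (assumed value)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Arrays.asList(1, 2, 3, 4, 5);
    }

    // Analogue of collect(): the caller is blocked for the whole job.
    public static List<Integer> blockingCollect() {
        return slowCollect();
    }

    // Analogue of collectAsync(): the work starts right away on another thread
    // and the caller gets a Future back immediately.
    public static Future<List<Integer>> asyncCollect() {
        return CompletableFuture.supplyAsync(AsyncCollectSketch::slowCollect);
    }

    public static void main(String[] args)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<List<Integer>> future = asyncCollect();

        // The caller is free to do unrelated work while the job runs.
        int otherWork = 0;
        for (int i = 1; i <= 10; i++) {
            otherWork += i;
        }

        // get() with a timeout blocks only until the result is ready
        // or the timeout expires, whichever comes first.
        List<Integer> result = future.get(5, TimeUnit.SECONDS);
        System.out.println("other work = " + otherWork + ", result = " + result);
    }
}
```

With a real JavaFutureAction the same Future methods apply, for example get with a timeout, isDone, and cancel, so the asynchronous variant is mainly useful when the driver has other work to do, or wants to bound or abort the wait, while the job runs.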


06-27 23:19