Initializing an RDD to empty
Question
I have an RDD declared as follows:
JavaPairRDD<String, List<String>> existingRDD;
Now I need to initialize existingRDD to an empty RDD so that, when I get the actual RDDs, I can union them with existingRDD. How do I initialize existingRDD to an empty RDD instead of initializing it to null? Here is my code:
JavaPairRDD<String, List<String>> existingRDD;
if (ai.get() % 10 == 0)
{
    existingRDD.saveAsNewAPIHadoopFile("s3://manthan-impala-test/kinesis-dump/" + startTime + "/" + k + "/" + System.currentTimeMillis() + "/",
        NullWritable.class, Text.class, TextOutputFormat.class); // on worker failure this will get overwritten
}
else
{
    existingRDD.union(rdd);
}
Recommended answer
To create an empty RDD in Java, you just need to do the following:
// Get an RDD that has no partitions or elements.
JavaSparkContext jsc;
...
JavaRDD<T> emptyRDD = jsc.emptyRDD();
I trust you know how to use generics; if not, for your case you'll need:
JavaRDD<Tuple2<String,List<String>>> emptyRDD = jsc.emptyRDD();
// Wrap the empty JavaRDD of tuples (emptyRDD, not existingRDD) as a pair RDD.
JavaPairRDD<String,List<String>> emptyPairRDD = JavaPairRDD.fromJavaRDD(
emptyRDD
);
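As a rough illustration of how the empty pair RDD could serve as the starting point for the unions described in the question, here is a minimal sketch. The class name EmptyPairRddUnion, the helper accumulate, the sample data, and the local[1] master are all assumptions made for the example and are not part of the original code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class EmptyPairRddUnion {

    // Hypothetical helper: start from an empty pair RDD and union every batch into it.
    static JavaPairRDD<String, List<String>> accumulate(
            JavaSparkContext jsc,
            List<JavaPairRDD<String, List<String>>> batches) {
        JavaRDD<Tuple2<String, List<String>>> emptyRDD = jsc.emptyRDD();
        JavaPairRDD<String, List<String>> existingRDD = JavaPairRDD.fromJavaRDD(emptyRDD);
        for (JavaPairRDD<String, List<String>> rdd : batches) {
            // union returns a new RDD, so the result has to be assigned back.
            existingRDD = existingRDD.union(rdd);
        }
        return existingRDD;
    }

    public static void main(String[] args) {
        // Assumed configuration: a local master, chosen only to keep the sketch self-contained.
        SparkConf conf = new SparkConf().setAppName("empty-rdd-sketch").setMaster("local[1]");
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            List<String> values = Arrays.asList("a", "b");
            List<Tuple2<String, List<String>>> data =
                    Collections.singletonList(new Tuple2<>("k1", values));
            JavaPairRDD<String, List<String>> batch = jsc.parallelizePairs(data);

            JavaPairRDD<String, List<String>> all = accumulate(jsc, Arrays.asList(batch, batch));
            System.out.println("records after union: " + all.count());
        }
    }
}
```

Because RDDs are immutable, a bare call such as existingRDD.union(rdd) leaves existingRDD unchanged; the sketch therefore reassigns the result on every iteration.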
You can also use the mapToPair method to convert your JavaRDD to a JavaPairRDD.
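The mapToPair route might look like the sketch below. It assumes the empty JavaRDD holds String keys and that each key should map to an empty list; the class name EmptyRddMapToPair, the helper emptyPairs, and the local[1] master are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

public class EmptyRddMapToPair {

    // Hypothetical helper: build an empty JavaPairRDD by mapping an empty JavaRDD of keys.
    static JavaPairRDD<String, List<String>> emptyPairs(JavaSparkContext jsc) {
        JavaRDD<String> emptyKeys = jsc.emptyRDD();
        // The function is never invoked here because the RDD has no elements,
        // but it fixes the key and value types of the resulting pair RDD.
        PairFunction<String, String, List<String>> toPair =
                key -> new Tuple2<>(key, new ArrayList<String>());
        return emptyKeys.mapToPair(toPair);
    }

    public static void main(String[] args) {
        // Assumed configuration: a local master, used only to keep the sketch runnable on its own.
        SparkConf conf = new SparkConf().setAppName("map-to-pair-sketch").setMaster("local[1]");
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, List<String>> existingRDD = emptyPairs(jsc);
            System.out.println("is empty: " + existingRDD.isEmpty());
        }
    }
}
```

For a purely empty starting value, JavaPairRDD.fromJavaRDD is the more direct choice; mapToPair is mainly useful when an existing JavaRDD of another element type has to be reshaped into pairs.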
In Scala:
val sc: SparkContext = ???
...
val emptyRDD = sc.emptyRDD
// emptyRDD: org.apache.spark.rdd.EmptyRDD[Nothing] = EmptyRDD[1] at ...