1) For Categories
twitter handle , categories , sub_categories
handle , Products , MakeUp
handle , Health, MakeUp
handle2 , Services , Face
handle3 , Marketing , Soap
JavaPairRDD<String, Ontologies> categoryPairRDD
2) For Twitter
Twitter handle , twitter_post , twitter_likes
handle , "Iphone" , 10
handle2 , "Samsung" , 20
JavaPairRDD<String ,Twitter> twitterPairRDD
JavaPairRDD<String, Tuple2<Iterable<Ontologies>, Iterable<Twitter>>> grouped = categoryPairRDD
.cogroup(twitterPairRDD);
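How should I iterate over the cogrouped values so that, for each key, the values are printed when a matching object is found and nulls are printed otherwise?
For example, handle3 exists in my category PairRDD but not in the Twitter RDD, so for key handle3 the output should be: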
handle3 , Marketing , Soap , null , null
The final output should be:
handle , Products , Makeup , Iphone , 10
handle , Health , Makeup , Iphone , 10
handle2 , Services , Face , Samsung , 20
handle3 , Marketing, Soap , null , null
Best answer
I managed to find a solution:
JavaPairRDD<String, Tuple2<Ontologies, Optional<Twitter>>> left = ontologiesPair.leftOuterJoin(twitterPairRDD);
left.foreach(new VoidFunction<Tuple2<String, Tuple2<Ontologies, Optional<Twitter>>>>() {
    @Override
    public void call(Tuple2<String, Tuple2<Ontologies, Optional<Twitter>>> arg0) throws Exception {
        // leftOuterJoin wraps the right side in an Optional; it is absent when the key has no tweet,
        // so check isPresent() instead of relying on an exception.
        Optional<Twitter> maybeTweet = arg0._2()._2();
        if (maybeTweet.isPresent()) {
            Twitter tweet = maybeTweet.get();
            // print values from the tuple, i.e. arg0._2()._1() and the tweet object
        } else {
            Twitter tweet = new Twitter("", -1);
            // print values from arg0._2()._1() and the empty tweet object
        }
    }
});
But I would still like to see any answer that uses cogroup.
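For the cogroup variant, the per-key work could look roughly like the sketch below. It is plain Java with no Spark dependency, so it only illustrates the iteration logic: for one key it takes the two Iterables that cogroup would hand back and emits one line per category, padded with nulls when there is no tweet. The class `CogroupRows`, the helper `expand`, and the use of `String[]` in place of the `Ontologies` and `Twitter` objects are all assumptions made for illustration, not part of the original code or of Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in for the per-key work on a cogrouped pair:
// one key, all of its category rows, all of its tweet rows.
class CogroupRows {

    // Hypothetical helper (not a Spark API): expands one cogrouped entry into output lines.
    // Each category is {category, sub_category}; each tweet is {post, likes}.
    static List<String> expand(String key, Iterable<String[]> categories, Iterable<String[]> tweets) {
        List<String> rows = new ArrayList<>();
        for (String[] category : categories) {
            boolean matched = false;
            for (String[] tweet : tweets) {
                matched = true;
                rows.add(String.join(" , ", key, category[0], category[1], tweet[0], tweet[1]));
            }
            if (!matched) {
                // No tweet for this key: keep the category row and pad with nulls.
                rows.add(String.join(" , ", key, category[0], category[1], "null", "null"));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> handleCategories = Arrays.asList(
                new String[] {"Products", "MakeUp"},
                new String[] {"Health", "MakeUp"});
        List<String[]> handleTweets = Collections.singletonList(new String[] {"Iphone", "10"});
        expand("handle", handleCategories, handleTweets).forEach(System.out::println);

        List<String[]> handle3Categories = Collections.singletonList(new String[] {"Marketing", "Soap"});
        expand("handle3", handle3Categories, Collections.<String[]>emptyList())
                .forEach(System.out::println);
    }
}
```

With Spark, the same loop would presumably sit inside a flatMap over `grouped`, reading the categories from `t._2()._1()` and the tweets from `t._2()._2()` for each tuple `t`. A key that has tweets but no categories produces no rows here, which matches the left-join behaviour of the accepted solution.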