1) For Categories

twitter handle , categories , sub_categories

handle         , Products   , MakeUp
handle         , Health     , MakeUp
handle2        , Services   , Face
handle3        , Marketing  , Soap

JavaPairRDD<String ,Category> categoryPairRDD

2) For Twitter

twitter handle , twitter_post , twitter_likes

handle         , "Iphone"     , 10
handle2        , "Samsung"    , 20


JavaPairRDD<String ,Twitter>  twitterPairRDD


JavaPairRDD<String, Tuple2<Iterable<Ontologies>, Iterable<Twitter>>> grouped = categoryPairRDD
           .cogroup(twitterPairRDD);
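
To make the shape of `grouped` concrete, here is a minimal plain-Java sketch that mimics what cogroup produces for the sample data above. It is not Spark code: the class `CogroupShape` and its `cogroup` helper are hypothetical names, and plain strings stand in for the category and Twitter objects. The point it illustrates is that a key present on only one side still appears in the result, paired with an empty collection rather than null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that imitates cogroup with plain collections (not Spark itself).
final class CogroupShape {

    // Groups (key, value) pairs from two inputs by key.
    // For each key, index 0 holds the left values and index 1 the right values.
    static Map<String, List<List<String>>> cogroup(List<String[]> left, List<String[]> right) {
        Map<String, List<List<String>>> out = new TreeMap<>();
        addSide(out, left, 0);
        addSide(out, right, 1);
        return out;
    }

    private static void addSide(Map<String, List<List<String>>> out, List<String[]> pairs, int side) {
        for (String[] kv : pairs) {
            List<List<String>> sides = out.get(kv[0]);
            if (sides == null) {
                // First time this key is seen: start with two empty sides.
                sides = Arrays.<List<String>>asList(new ArrayList<String>(), new ArrayList<String>());
                out.put(kv[0], sides);
            }
            sides.get(side).add(kv[1]);
        }
    }

    public static void main(String[] args) {
        List<String[]> categories = Arrays.asList(
                new String[] {"handle", "Products"},
                new String[] {"handle", "Health"},
                new String[] {"handle2", "Services"},
                new String[] {"handle3", "Marketing"});
        List<String[]> tweets = Arrays.asList(
                new String[] {"handle", "Iphone"},
                new String[] {"handle2", "Samsung"});
        // handle3 has one category and an empty list on the Twitter side.
        System.out.println(cogroup(categories, tweets));
    }
}
```

In this sketch, handle3 maps to one category and an empty Twitter list, which is the case that needs the null padding.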


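How should I iterate over the cogrouped values so that, for each key, the values are printed when a matching object is found and null is printed otherwise?

For example, handle3 exists in my category PairRDD but not in the Twitter RDD, so the output for key handle3 should be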

handle3 , Marketing , Soap , null , null


The final output should be

handle  , Products  , MakeUp , Iphone  , 10
handle  , Health    , MakeUp , Iphone  , 10
handle2 , Services  , Face   , Samsung , 20
handle3 , Marketing , Soap   , null    , null

Best answer

I managed to get a solution:

// Optional is Guava's com.google.common.base.Optional in Spark 1.x
// and org.apache.spark.api.java.Optional in Spark 2.x; both offer isPresent() and get().
JavaPairRDD<String, Tuple2<Ontologies, Optional<Twitter>>> left = ontologiesPair.leftOuterJoin(twitterPairRDD);

left.foreach(new VoidFunction<Tuple2<String, Tuple2<Ontologies, Optional<Twitter>>>>() {

    @Override
    public void call(Tuple2<String, Tuple2<Ontologies, Optional<Twitter>>> arg0) throws Exception {
        Optional<Twitter> tweet = arg0._2()._2();
        if (tweet.isPresent()) {
            // Key exists on both sides: print values from arg0._2()._1() and tweet.get()
        } else {
            // Key has no tweet (e.g. handle3): fall back to an empty Twitter object
            Twitter empty = new Twitter("", -1);
            // Print values from arg0._2()._1() and the empty tweet object
        }
    }
});


But I would still like to see an answer that uses cogroup.
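
One possible way to approach the cogroup version, as a minimal sketch rather than a tested Spark job: cogroup hands each key an Iterable of categories and an Iterable of tweets, and either side may be empty. The helper below emits one row per category/tweet pair and pads the Twitter columns with null when the tweet side is empty. The class name `CogroupRows`, the method `formatRows`, and the fields of the nested `Category` and `Twitter` classes are all assumptions, since the original classes are not shown. It uses only plain Java collections so it can be run without a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; the nested classes stand in for the question's own types.
final class CogroupRows {

    // Assumed shape of the category value; field names are illustrative.
    static final class Category {
        final String category;
        final String subCategory;
        Category(String category, String subCategory) {
            this.category = category;
            this.subCategory = subCategory;
        }
    }

    // Assumed shape of the Twitter value; field names are illustrative.
    static final class Twitter {
        final String post;
        final int likes;
        Twitter(String post, int likes) {
            this.post = post;
            this.likes = likes;
        }
    }

    // Builds the output rows for one key from the two sides of a cogroup.
    static List<String> formatRows(String handle, Iterable<Category> categories, Iterable<Twitter> tweets) {
        List<String> rows = new ArrayList<>();
        for (Category c : categories) {
            String prefix = handle + " , " + c.category + " , " + c.subCategory;
            boolean matched = false;
            for (Twitter t : tweets) {
                matched = true;
                rows.add(prefix + " , " + t.post + " , " + t.likes);
            }
            if (!matched) {
                // No tweet for this key: pad the Twitter columns with null.
                rows.add(prefix + " , null , null");
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Twitter> noTweets = Collections.emptyList();
        for (String row : formatRows("handle",
                Arrays.asList(new Category("Products", "MakeUp"), new Category("Health", "MakeUp")),
                Arrays.asList(new Twitter("Iphone", 10)))) {
            System.out.println(row);
        }
        for (String row : formatRows("handle3",
                Arrays.asList(new Category("Marketing", "Soap")), noTweets)) {
            System.out.println(row);
        }
    }
}
```

With Spark 2.x, where flatMap expects an Iterator, this could presumably be wired in as `grouped.flatMap(t -> CogroupRows.formatRows(t._1(), t._2()._1(), t._2()._2()).iterator())`. Keys that appear only on the Twitter side produce no rows here, which matches the left outer join behaviour of the accepted solution.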

09-28 00:13