


In my Hadoop project, I am reading a text file in which each line contains a number of names. The first name on a line is the username, and the rest are that user's friends.

In the map function I create (username, friend) pairs. Each pair gets a key of the form "Key[name1][name2]", where name1 and name2 are the username and the friend's name in alphabetical order.

Normally, after reading the line for userA and the line for userB, where each has the other in their friends list, I would get 2 identical keys with different values, in this case KeyUserAUserB : "UserA,UserB" and KeyUserAUserB : "UserB,UserA".

What I expect in the reduce function is to get, at some point, KeyUserAUserB as the key with both "UserA,UserB" and "UserB,UserA" as values, so the values iterator would have 2 elements. However, in the reduce function I get KeyUserAUserB twice, each time with a single value. This is not what I expect from Hadoop.
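To make the keying scheme concrete, here is a minimal plain-Java sketch of what the map step emits for one input line. It does not use the Hadoop API; the class name `FriendPairs`, the helpers `pairKey` and `mapLine`, and the assumption that names are separated by whitespace are all hypothetical and not taken from my actual job.

```java
import java.util.ArrayList;
import java.util.List;

final class FriendPairs {
    // Hypothetical helper: "Key" followed by the two names in alphabetical order.
    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? "Key" + a + b : "Key" + b + a;
    }

    // Emulates what map() would emit for one line of the form "user friend1 friend2 ...".
    // Assumption: names are separated by whitespace.
    static List<String[]> mapLine(String line) {
        String[] names = line.trim().split("\\s+");
        List<String[]> out = new ArrayList<>();
        for (int i = 1; i < names.length; i++) {
            // Key is order-independent; value keeps the (username, friend) order.
            out.add(new String[] { pairKey(names[0], names[i]), names[0] + "," + names[i] });
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : mapLine("UserA UserB")) {
            System.out.println(kv[0] + " : " + kv[1]);
        }
        for (String[] kv : mapLine("UserB UserA")) {
            System.out.println(kv[0] + " : " + kv[1]);
        }
    }
}
```

Both lines produce the key KeyUserAUserB, with the values "UserA,UserB" and "UserB,UserA" respectively, which is why I expected a single reduce call with two values.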

I also noticed that my userlogs contain 4 "m" folders, and the first 2 of them hold the logs that helped me identify the above. In both "m" logs, the output (System.out) of the map function is interleaved with the output of the reduce function. I don't know whether that has anything to do with the anomaly, but I expected the reduce output to stay in the "r" folder. Also, for the above example, one log entry for KeyUserAUserB is printed in one "m" log file and the other in the second "m" log file. Although in some cases KeyUserAUserB does reach the reducer with both values, I found at least one case where it never does (and those 2 key-value pairs with the identical key reside in different "m" log files).

Another thing I noticed: the output collected from the reduce function is not sent directly to the output file, but is passed again as input to the same reduce function.

What do you think about this behavior? What could be causing it?


Finally: the whole unexpected behavior was caused by using the reducer class as the combiner class. After commenting out that line, everything worked as expected.
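The reason this matters is that a combiner runs on the map side, over the output of a single map task, and whatever it emits is then shuffled to the real reduce step. Hadoop may also run it zero, one, or several times. The following plain-Java simulation is only a sketch of that effect and does not use the Hadoop API; `CombinerDemo`, `runJob`, `sampleMapOutputs`, and the "emit only when there are exactly two values" reduce logic are hypothetical stand-ins, not my actual reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CombinerDemo {
    // Hypothetical reduce logic: emit the key only when both users listed each other.
    static List<String[]> reduce(String key, List<String> values) {
        List<String[]> out = new ArrayList<>();
        if (values.size() == 2) {
            out.add(new String[] { key, "mutual" });
        }
        return out;
    }

    // Groups (key, value) records by key, like the sort/group phase does.
    static Map<String, List<String>> group(List<String[]> records) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : records) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // Simulates one job; each inner list is the output of one map task.
    static List<String[]> runJob(List<List<String[]>> mapOutputs, boolean reducerAsCombiner) {
        List<String[]> shuffled = new ArrayList<>();
        for (List<String[]> oneMapTask : mapOutputs) {
            if (reducerAsCombiner) {
                // The combiner only sees the records of this one map task.
                for (Map.Entry<String, List<String>> e : group(oneMapTask).entrySet()) {
                    shuffled.addAll(reduce(e.getKey(), e.getValue()));
                }
            } else {
                shuffled.addAll(oneMapTask);
            }
        }
        // The real reduce step sees everything that survived the map side.
        List<String[]> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : group(shuffled).entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    // Two map tasks, each holding one half of the KeyUserAUserB pair.
    static List<List<String[]>> sampleMapOutputs() {
        List<String[]> task1 = new ArrayList<>();
        task1.add(new String[] { "KeyUserAUserB", "UserA,UserB" });
        List<String[]> task2 = new ArrayList<>();
        task2.add(new String[] { "KeyUserAUserB", "UserB,UserA" });
        List<List<String[]>> tasks = new ArrayList<>();
        tasks.add(task1);
        tasks.add(task2);
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(runJob(sampleMapOutputs(), true).size());  // prints 0
        System.out.println(runJob(sampleMapOutputs(), false).size()); // prints 1
    }
}
```

In this sketch the reduce logic, when reused as a combiner, is called once per map task with a single value for KeyUserAUserB, which matches the reduce output I saw in the "m" logs. A class is only safe to reuse as a combiner if applying it to partial groups and then again to its own output gives the same result as applying it once to the full group.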


08-24 02:31