Problem description
I'm testing a simple MapReduce application, but I'm stuck trying to understand what happens when I iterate over the input values of a reduce call.
This is the piece of code that behaves strangely:
public void reduce(Text key, Iterable<E> values, Context context)
        throws IOException, InterruptedException {
    Iterator<E> statesIter = values.iterator();
    E first = statesIter.next();
    while (statesIter.hasNext()) {
        E state = statesIter.next();
        // Prints a different string on each pass, even though first is never reassigned
        System.out.println(first.toString());
        // some other stuff
    }
    // some other stuff
}
Nothing looks unusual here, except that each println invocation prints a different string. Every time I call the next() method, the contents of the object referenced by first change.
Why does this happen?
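It's somewhat counter-intuitive, but it is documented in the API docs: Hadoop reuses the key and value objects it passes to reduce. The iterator hands back the same object on every next() call and only refills its contents, which is why first appears to change. If you want to keep a value around, you should clone it.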
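To make the effect concrete, here is a minimal sketch that reproduces it without Hadoop on the classpath. MutableValue, ReusingIterator and ReuseDemo are hypothetical stand-ins written for this illustration, not Hadoop classes; they only mimic an iterator that refills a single shared instance. In a real reducer the copy step would more likely be something like new Text(value) for Text values, or WritableUtils.clone(value, context.getConfiguration()) for other Writable types.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a mutable Hadoop value type such as Text.
final class MutableValue {
    private String content = "";

    MutableValue() {}

    // Copy constructor: takes a snapshot of the other value's current contents.
    MutableValue(MutableValue other) {
        this.content = other.content;
    }

    void set(String content) {
        this.content = content;
    }

    @Override
    public String toString() {
        return content;
    }
}

// Hypothetical iterator that mimics the reuse: every next() refills
// the same shared instance and returns it again.
final class ReusingIterator implements Iterator<MutableValue> {
    private final Iterator<String> source;
    private final MutableValue shared = new MutableValue();

    ReusingIterator(List<String> data) {
        this.source = data.iterator();
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public MutableValue next() {
        shared.set(source.next());
        return shared;
    }
}

final class ReuseDemo {
    // Keeps only a reference to the first value, as in the question.
    static String firstByReference(List<String> data) {
        Iterator<MutableValue> it = new ReusingIterator(data);
        MutableValue first = it.next();
        while (it.hasNext()) {
            it.next(); // overwrites the object that first points to
        }
        return first.toString();
    }

    // Copies the first value before advancing the iterator.
    static String firstByCopy(List<String> data) {
        Iterator<MutableValue> it = new ReusingIterator(data);
        MutableValue first = new MutableValue(it.next());
        while (it.hasNext()) {
            it.next(); // the copy is unaffected
        }
        return first.toString();
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        System.out.println(firstByReference(data)); // prints "c"
        System.out.println(firstByCopy(data));      // prints "a"
    }
}
```

The only difference between the two methods is the copy made before the loop, and that is the change the reduce method in the question needs as well.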