Problem description
I receive an iterator as an argument and I would like to iterate over the values twice.
public void reduce(Pair<String,String> key, Iterator<IntWritable> values,
Context context)
Is it possible? How? The signature is imposed by the framework I am using (namely Hadoop).
-- edit --
In the end, the real signature of the reduce method takes an Iterable. I was misled by this wiki page (which is actually the only non-deprecated, but wrong, example of wordcount I found).
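An Iterable parameter does not by itself guarantee a second pass. The interface only requires an iterator() method and does not promise that each call starts over. Below is a minimal sketch. OneShotIterable and count are hypothetical names, and the one-shot iterable is an assumed stand-in for a single-pass source such as a framework's value stream. It shows a second for-each loop finding nothing:

```java
import java.util.Arrays;
import java.util.Iterator;

final class OneShotDemo {
    // Hypothetical Iterable that hands back the same single-pass iterator
    // on every call to iterator(), instead of a fresh one.
    static final class OneShotIterable<T> implements Iterable<T> {
        private final Iterator<T> it;

        OneShotIterable(Iterator<T> it) {
            this.it = it;
        }

        @Override
        public Iterator<T> iterator() {
            return it; // same object every time
        }
    }

    // Counts elements with a for-each loop, like a reducer body would.
    static <T> int count(Iterable<T> values) {
        int n = 0;
        for (T ignored : values) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Iterable<Integer> values =
                new OneShotIterable<>(Arrays.asList(1, 2, 3).iterator());
        System.out.println(count(values)); // 3
        System.out.println(count(values)); // 0: the shared iterator is exhausted
    }
}
```

If the values must be read twice, the safe assumption is still the one made below: cache them during the first loop.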
If you want to iterate again, you have to cache the values from the iterator. At least the first iteration and the caching can be combined:
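// imports needed by this snippet
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.io.IntWritable;

Iterator<IntWritable> it = getIterator();
List<IntWritable> cache = new ArrayList<IntWritable>();
// first loop and caching
while (it.hasNext()) {
    IntWritable value = it.next();
    doSomethingWithValue(value);
    // Hadoop may reuse one IntWritable instance for every value,
    // so cache a copy rather than the reference itself
    cache.add(new IntWritable(value.get()));
}
// second loop, over the cached copies
for (IntWritable value : cache) {
    doSomethingElseThatCantBeDoneInFirstLoop(value);
}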
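Here is a self-contained version of the cache-then-loop pattern, as a sketch with no Hadoop dependency. MutableInt and ReusingIterator are hypothetical stand-ins. The iterator deliberately returns one shared object, mimicking the object reuse Hadoop is generally reported to apply to Writable values, which is why each cached entry should be a copy. The second loop computes each value's share of the total, an example of work that needs the first pass to finish.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hadoop-free sketch of "first loop and caching".
// MutableInt is a hypothetical stand-in for IntWritable, and ReusingIterator
// is a hypothetical iterator that returns the same object for every value.
final class CacheTwice {
    static final class MutableInt {
        int value;

        MutableInt(int value) {
            this.value = value;
        }
    }

    // Overwrites one shared MutableInt with each successive value.
    static final class ReusingIterator implements Iterator<MutableInt> {
        private final int[] data;
        private final MutableInt shared = new MutableInt(0);
        private int pos = 0;

        ReusingIterator(int... data) {
            this.data = data;
        }

        @Override
        public boolean hasNext() {
            return pos < data.length;
        }

        @Override
        public MutableInt next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            shared.value = data[pos++];
            return shared;
        }
    }

    // First loop: sum the values and cache a copy of each one.
    // Second loop: each value's share of the total, which needs the full sum.
    static List<Double> shares(Iterator<MutableInt> it) {
        List<MutableInt> cache = new ArrayList<>();
        int sum = 0;
        while (it.hasNext()) {
            MutableInt v = it.next();
            sum += v.value;
            cache.add(new MutableInt(v.value)); // copy, because v is reused
        }
        List<Double> result = new ArrayList<>();
        for (MutableInt v : cache) {
            result.add((double) v.value / sum);
        }
        return result;
    }

    // The pitfall: caching the reference stores the same object N times.
    static List<Integer> cachedByReference(Iterator<MutableInt> it) {
        List<MutableInt> cache = new ArrayList<>();
        while (it.hasNext()) {
            cache.add(it.next());
        }
        List<Integer> values = new ArrayList<>();
        for (MutableInt v : cache) {
            values.add(v.value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(shares(new ReusingIterator(1, 3)));               // [0.25, 0.75]
        System.out.println(cachedByReference(new ReusingIterator(1, 2, 3))); // [3, 3, 3]
    }
}
```

cachedByReference shows what happens without the copy. Every cached entry is the same object, so all of them read the last value.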
(Just adding an answer with code, even though you mentioned this solution in your own comment ;) )
Why it's impossible without caching: an Iterator is just something that implements an interface, and nothing requires the Iterator object to actually store its values. To iterate twice, you would have to either reset the iterator (not possible) or clone it (again, not possible).
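You can observe this single-pass behavior with the JDK alone. In this sketch (drain is a made-up helper name), the second loop over the same Iterator sees no elements, because the interface offers no way to rewind:

```java
import java.util.Arrays;
import java.util.Iterator;

// With the JDK alone: an exhausted iterator stays exhausted.
final class SinglePass {
    // Consumes the iterator and returns how many elements it produced.
    static int drain(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Iterator<Integer> it = Arrays.asList(1, 2, 3).iterator();
        System.out.println(drain(it)); // 3 on the first pass
        System.out.println(drain(it)); // 0 on the second: there is no reset()
    }
}
```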
To give an example of an iterator where cloning or resetting wouldn't make any sense:
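import java.util.Iterator;
import java.util.NoSuchElementException;

// Produces 10 random numbers without storing any of them,
// so there is nothing that could be reset or cloned.
public class Randoms implements Iterator<Double> {
    private int counter = 10;

    @Override
    public boolean hasNext() {
        return counter > 0;
    }

    @Override
    public Double next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        counter--;
        return Math.random(); // a new value on every call
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("delete not supported");
    }
}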
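As a usage sketch, caching is also what makes an iterator like Randoms replayable. The block below repeats a corrected Randoms as a nested class. Its next() returns Double and decrements counter, and remove() is left to the Java 8 default, which throws UnsupportedOperationException. The block drains the iterator into a list once; cache is a hypothetical helper name. After that the iterator is used up, but the list can be traversed as often as needed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class RandomsDemo {
    // Corrected copy of Randoms: ten random doubles, none of them stored.
    static final class Randoms implements Iterator<Double> {
        private int counter = 10;

        @Override
        public boolean hasNext() {
            return counter > 0;
        }

        @Override
        public Double next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            counter--;
            return Math.random();
        }
    }

    // Drains the iterator into a list that can be traversed any number of times.
    static <T> List<T> cache(Iterator<T> it) {
        List<T> list = new ArrayList<>();
        while (it.hasNext()) {
            list.add(it.next());
        }
        return list;
    }

    public static void main(String[] args) {
        Randoms randoms = new Randoms();
        List<Double> cached = cache(randoms);
        System.out.println(randoms.hasNext()); // false: the iterator is used up
        // Both passes over the cache see exactly the same values.
        double first = 0, second = 0;
        for (double d : cached) first += d;
        for (double d : cached) second += d;
        System.out.println(cached.size() + " values, passes equal: " + (first == second));
    }
}
```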