Question
Is there a way to generate permutations with MapReduce?
Input file:
1 title1
2 title2
3 title3
My goal:
1,2 title1,title2
1,3 title1,title3
2,3 title2,title3
Recommended answer
Since the file will have n input lines, the permutations should have on the order of n^2 outputs (every line paired with every line). It makes sense to have n tasks each perform n of those operations. I believe you could do this (assuming a single input file):
Put your input file into the DistributedCache so that it is accessible read-only to your Mappers/Reducers. Make an input split on each line of the file (as in WordCount), so that each map call receives one line (e.g. title1 in your example). Then read the lines out of the file in the DistributedCache and emit your key/value pairs, with the key being your input line and the values being each line of the cached file.
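To make the data flow concrete, here is a minimal plain-Java sketch of the step described above, with no Hadoop involved. The class and method names (MapStepSketch, mapLine, runAll) are made up for illustration, and an in-memory list stands in for the cached file. It only shows that n map calls, each doing n emits, produce n^2 key/value pairs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class MapStepSketch {
    // Stand-in for one map() call: pairs the call's own input line
    // with every line of the cached copy of the file.
    static List<Map.Entry<String, String>> mapLine(String inputLine, List<String> cachedLines) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String line : cachedLines) {
            out.add(new SimpleEntry<>(inputLine, line));
        }
        return out;
    }

    // Stand-in for the whole job: one map() call per input line.
    static List<Map.Entry<String, String>> runAll(List<String> lines) {
        List<Map.Entry<String, String>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(mapLine(line, lines));
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 title1", "2 title2", "3 title3");
        // Prints one key/value pair per line, 3 x 3 = 9 pairs in total.
        for (Map.Entry<String, String> kv : runAll(lines)) {
            System.out.println(kv.getKey() + "\t" + kv.getValue());
        }
    }
}
```

Note that this full cross product includes each line paired with itself and both orderings of every pair, so it is a superset of the three pairs asked for in the question.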
In this model, you should only need a Map step.
Something like:
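import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class PermuteMapper
        extends Mapper<Object, Text, Text, Text> {
    private static final String IN_FILENAME = "file.txt";

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String inputLine = value.toString();
        // set the property mapred.cache.files in your
        // configuration for the file to be available
        Path[] cachedPaths =
                DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (cachedPaths != null && cachedPaths.length > 0
                && cachedPaths[0].getName().equals(IN_FILENAME)) {
            // function defined elsewhere
            String[] cachedLines = getLinesFromPath(cachedPaths[0]);
            // pair the current input line with every cached line
            for (String line : cachedLines) {
                context.write(new Text(inputLine), new Text(line));
            }
        }
    }
}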
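The mapper above emits every line against every cached line, which includes self-pairs and mirrored duplicates. If the goal is exactly the three pairs shown in the question, one possible adaptation is to keep only pairs whose left id is smaller than the right id. The following is a plain-Java sketch of that filter, not part of the original answer: the class and method names (PairFilterSketch, pairsFor) are hypothetical, and it assumes every line has the form "<integer id> <title>" separated by whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairFilterSketch {
    // Assumption: each line is "<integer id> <title>", separated by whitespace.
    static List<String> pairsFor(String inputLine, List<String> cachedLines) {
        String[] left = inputLine.trim().split("\\s+", 2);
        int leftId = Integer.parseInt(left[0]);
        List<String> out = new ArrayList<>();
        for (String line : cachedLines) {
            String[] right = line.trim().split("\\s+", 2);
            int rightId = Integer.parseInt(right[0]);
            // Keep only leftId < rightId: drops self-pairs and mirrored duplicates.
            if (leftId < rightId) {
                out.add(left[0] + "," + right[0] + " " + left[1] + "," + right[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 title1", "2 title2", "3 title3");
        // One pairsFor() call per input line, mirroring one map() call per line.
        for (String line : lines) {
            for (String pair : pairsFor(line, lines)) {
                System.out.println(pair);
            }
        }
    }
}
```

Run on the sample input, this prints the three lines listed under the goal in the question. In the mapper itself, the same comparison would sit inside the loop over cachedLines, just before the write call.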