Question
I am new to MapReduce and would like to know how to compute word length frequency with it. I already have the code for word count, but I want to count by word length instead. This is what I have so far:
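import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCount {
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // Emit (word, 1) for every whitespace-separated token.
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
    // Reducer and driver are not shown in the question.
}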
Thanks.
Recommended answer
For word length frequency, the token returned by tokenizer.nextToken() should not be emitted as the key; the length of that string should be emitted instead. The following one-line change to the mapper is sufficient:
word.set(String.valueOf(tokenizer.nextToken().length()));
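Looking more closely, the Mapper output key no longer needs to be Text, although that works. With Text keys the lengths are compared as strings, so 10 sorts before 2 in the output. An IntWritable key is the better choice (the reducer's input key type and the job's output key class, which the question does not show, have to change to match):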
public static class Map extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private IntWritable wordLength = new IntWritable();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Emit (word length, 1) so the reducer can sum the counts per length.
            wordLength.set(tokenizer.nextToken().length());
            context.write(wordLength, one);
        }
    }
}
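The mapper only emits (length, 1) pairs; the per-length totals come from a reducer that sums them, exactly as in word count. To see what the whole job should produce before running it on a cluster, the same logic can be sketched in plain Java with no Hadoop dependency. The class name WordLengthLocal and the method lengthFrequency are made up for this illustration and are not part of any Hadoop API.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the job; it does not use Hadoop.
class WordLengthLocal {
    // Mirrors the map step (one (length, 1) pair per token) and the
    // reduce step (sum of the ones for each length) in a single pass.
    static Map<Integer, Integer> lengthFrequency(Iterable<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>(); // keys kept in numeric order
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                int length = tokenizer.nextToken().length();
                counts.merge(length, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts =
                lengthFrequency(Arrays.asList("the quick brown fox", "a lazy dog"));
        // One "length<TAB>count" line per distinct word length.
        counts.forEach((length, count) -> System.out.println(length + "\t" + count));
    }
}
```

For the two sample lines this prints lengths 1, 3, 4 and 5 with counts 1, 3, 1 and 2, which is the shape of output to expect from the real job.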
Although most MapReduce examples use StringTokenizer, String.split is cleaner and is the approach the Java documentation recommends for new code, so consider changing the tokenizing accordingly.
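A minimal sketch of the String.split variant, in plain Java so it can be tried without Hadoop. The class name SplitTokens and the method wordLengths are hypothetical; in the mapper, the loop body would call wordLength.set(token.length()) and context.write(wordLength, one) instead of collecting into a list. One detail differs from StringTokenizer: split can return empty strings, so the line is trimmed first and empty tokens are skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing split-based tokenizing; not Hadoop API.
class SplitTokens {
    static List<Integer> wordLengths(String line) {
        List<Integer> lengths = new ArrayList<>();
        // trim() avoids an empty first token caused by leading whitespace;
        // "\\s+" treats any run of whitespace as one separator.
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) { // a blank line still yields one empty token
                lengths.add(token.length());
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(wordLengths("  the quick  fox "));
    }
}
```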