I'm stuck on finding the 25 most common words in a text file.
I have a rough idea of how to do this with a TreeMap, but I'm not sure how to go about it:
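public static List<String> commonElements(WordStream words) {
    TreeMap<String, Integer> map = new TreeMap<String, Integer>();
    for (String w : words) {
        w = w.toLowerCase();
        // get() returns null for a word that has not been seen yet,
        // so unboxing it straight into an int would throw a NullPointerException
        Integer count = map.get(w);
        // store 1 for a new word, otherwise increment its count
        map.put(w, count == null ? 1 : count + 1);
    }
    // Not solved yet: pick the 25 entries with the highest counts and return their words
    throw new UnsupportedOperationException("top-25 selection not implemented yet");
}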
The method is supposed to return a list of the 25 most common words in the text file.
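One way the TreeMap idea could be finished is to count into the map, copy its entries into a list, sort that list by count (highest first), and keep the first 25 keys. This is a minimal sketch, not the asker's code: it assumes the asker's `WordStream` can be iterated as an `Iterable<String>` (the for-each loop above requires that), so it takes `Iterable<String>` directly to compile on its own. The class `TopWords` and method `topWords` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TopWords {
    // Hypothetical helper: Iterable<String> stands in for the asker's WordStream type
    public static List<String> topWords(Iterable<String> words, int n) {
        // TreeMap keeps the words in alphabetical order
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            // merge() stores 1 for a new word or adds 1 to the existing count
            counts.merge(w.toLowerCase(), 1, Integer::sum);
        }
        // Copy the entries and sort them by count, highest first.
        // List.sort is stable, so words with equal counts stay in alphabetical order.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        // Keep at most n words
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Help", "at", "Stackoverflow", "help", "you", "Help", "stackoverflow");
        System.out.println(topWords(words, 25)); // prints [help, stackoverflow, at, you]
    }
}
```

Calling `topWords(words, 25)` with the asker's word source would give the list the method above is meant to return; relying on the stable sort over the TreeMap's alphabetical order is just one way to make ties deterministic.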
Best answer
Example code for the string "Stackoverflow could help you. Help Help at Stackoverflow.":
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCount {
    public static void main(String[] args) {
        String sentence = "Stackoverflow could help you. Help Help at Stackoverflow.";
        // split on every non-word character; consecutive delimiters such as ". " yield empty strings
        Stream<String> wordStream = Pattern.compile("\\W").splitAsStream(sentence);
        Map<String, Integer> unsortedMap = new HashMap<>();
        // for each word, count how many times it occurs in the word stream
        wordStream.forEach(wordReal -> {
            String word = wordReal.toLowerCase();
            if (!word.isEmpty()) {
                // insert 1 for a new word, otherwise add 1 to its current count
                unsortedMap.merge(word, 1, Integer::sum);
            }
        });
        // sort the entries by count, descending; ties are broken alphabetically so the order is deterministic
        Map<String, Integer> sortedMap =
            unsortedMap.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .collect(Collectors.toMap(Entry::getKey, Entry::getValue,
                        (e1, e2) -> e1, LinkedHashMap::new));
        // print each word and its count; this is where the output could be cut off after 25 entries
        for (Map.Entry<String, Integer> entry : sortedMap.entrySet()) {
            System.out.println("Word : `" + entry.getKey() + "` Count : " + entry.getValue());
        }
    }
}
Output
Word : `help` Count : 3
Word : `stackoverflow` Count : 2
Word : `at` Count : 1
Word : `could` Count : 1
Word : `you` Count : 1
If you only want 25 results, stop printing after the 25th entry, or remove every entry after the first 25.
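The cut-off can also happen inside the stream pipeline itself with `Stream.limit`, which avoids building the full sorted map first. This is a minimal sketch assuming Java 11+ (for `Files.readString`); the class `WordCountTop` and the methods `topWords` and `topWordsInFile` are hypothetical names. Unlike the answer, it splits on runs of non-word characters (`\\W+`) and counts with `Collectors.groupingBy` and `Collectors.counting()` instead of a manual `HashMap`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WordCountTop {
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Return the n most frequent words in text, most frequent first (ties broken alphabetically)
    public static List<String> topWords(String text, int n) {
        Map<String, Long> counts = NON_WORD.splitAsStream(text.toLowerCase())
                .filter(w -> !w.isEmpty()) // a leading delimiter produces one empty string
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(n) // keep only the first n entries
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Hypothetical helper: read a whole text file (Java 11+) and return its 25 most frequent words
    public static List<String> topWordsInFile(Path file) throws IOException {
        return topWords(Files.readString(file), 25);
    }

    public static void main(String[] args) {
        String sentence = "Stackoverflow could help you. Help Help at Stackoverflow.";
        System.out.println(topWords(sentence, 2)); // prints [help, stackoverflow]
    }
}
```

For the original question, `topWordsInFile(Path.of("input.txt"))` (the file name is only an example) would return at most 25 words; `limit` simply returns fewer when the text has fewer distinct words.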
Source: "java - Stuck finding the 25 most common words in a text file", a similar question on Stack Overflow: https://stackoverflow.com/questions/35988025/