I'm stuck on finding the 25 most common words in a text file.

I have a vague idea of how to do this with a TreeMap, but I'm not sure:

public static String CommonElements(WordStream words){
    TreeMap<String, Integer> Map = new TreeMap<String, Integer>();
    for(String w: words){
        w = w.toLowerCase();
        int token = Map.get(w);
        if(token != 0){
            Map.put(w,token);



        }
    }
}


The method is supposed to return a list of the 25 most common words in the text file.

Best answer



Example code for the string "Stackoverflow could help you. Help Help at Stackoverflow.":

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCount {


    public static void main(String[] args) {
        String sentence = "Stackoverflow could help you. Help Help at Stackoverflow.";
        Stream<String> wordStream = Pattern.compile("\\W").splitAsStream(sentence);
        HashMap<String,Integer> unsortedMap = new HashMap<String,Integer>();
        // for each word, count how many times it occurs in the stream
        wordStream.forEach((wordReal) -> {
            String word = wordReal.toLowerCase();
            // splitting on "\\W" yields empty strings between adjacent separators; skip them
            if (!word.isEmpty()) {
                // start at 1 for a new word, otherwise add 1 to the existing count
                unsortedMap.merge(word, 1, Integer::sum);
            }
        });
        // sort the entries by count, descending, and collect them into an insertion-ordered map
        Map<String, Integer> sortedMap =
             unsortedMap.entrySet().stream()
            .sorted(Map.Entry.comparingByValue((v1,v2)->v2.compareTo(v1)))
            .collect(Collectors.toMap(Entry::getKey, Entry::getValue,
                                      (e1, e2) -> e1, LinkedHashMap::new));

        // print each word with its count; stop after 25 entries here if only the top 25 are wanted
        for (Map.Entry<String, Integer> entry : sortedMap.entrySet()) {
            System.out.println("Word : `" + entry.getKey() + "` Count : " + entry.getValue());
        }
    }

}


Output

Word : `help` Count : 3
Word : `stackoverflow` Count : 2
Word : `at` Count : 1
Word : `could` Count : 1
Word : `you` Count : 1


If you only want 25 results, either stop printing after the first 25 entries or drop every entry after the 25th.
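
One way to do that limiting, and to return a list as the question asks, is to cut the sorted stream off with limit(). The following is only a sketch under a few assumptions: the class name TopWords and the method topWords are made up for illustration, the words arrive as a Stream<String> rather than the asker's WordStream type, and ties in count are broken alphabetically so the result is deterministic (the original answer does not specify a tie order).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TopWords {

    // Hypothetical helper: returns up to `limit` words, most frequent first.
    // Ties in count are ordered alphabetically (an assumption, not from the original answer).
    public static List<String> topWords(Stream<String> words, int limit) {
        // count occurrences of each lower-cased, non-empty word
        Map<String, Long> counts = words
                .map(String::toLowerCase)
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                // highest count first, then alphabetical by word
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                // keep only the first `limit` entries
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String sentence = "Stackoverflow could help you. Help Help at Stackoverflow.";
        Stream<String> wordStream = Pattern.compile("\\W+").splitAsStream(sentence);
        // the sample has only 5 distinct words, so a limit of 25 returns all of them
        System.out.println(topWords(wordStream, 25));
    }
}
```

For the sample sentence this prints [help, stackoverflow, at, could, you]. With a real text file the same method should work on any stream of words, for example one built by splitting the lines of the file.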

Regarding "java - Stuck on finding the 25 most common words in a text file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35988025/

10-15 03:45