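In a project, I am trying to query the tweets of a particular user handle, find the most common word in that user's tweets, and also return how often that word occurs.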

Here is my code:

  public String mostPopularWord()
  {
     this.removeCommonEnglishWords();
     this.sortAndRemoveEmpties();

     Map<String, Integer> termsCount = new HashMap<>();
     for(String term : terms)
     {
        Integer c = termsCount.get(term);
        if(c==null)
           c = new Integer(0);
        c++;
        termsCount.put(term, c);
     }
     Map.Entry<String,Integer> mostRepeated = null;
     for(Map.Entry<String, Integer> curr: termsCount.entrySet())
     {
         if(mostRepeated == null || mostRepeated.getValue()<curr.getValue())
             mostRepeated = curr;
     }

     //frequencyMax = termsCount.get(mostRepeated.getKey());

     try
     {
        frequencyMax = termsCount.get(mostRepeated.getKey());
        return mostRepeated.getKey();
     }
     catch (NullPointerException e)
     {
        System.out.println("Cannot find most popular word from the tweets.");
     }

     return "";
  }


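I also thought it would help to show the code for the first two methods that I call in the method above. They are all in the same class, which declares the following fields: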

  private Twitter twitter;
  private PrintStream consolePrint;
  private List<Status> statuses;
  private List<String> terms;
  private String popularWord;
  private int frequencyMax;

  @SuppressWarnings("unchecked")
  public void sortAndRemoveEmpties()
  {
     Collections.sort(terms);
     terms.removeAll(Arrays.asList("", null));
  }

  private void removeCommonEnglishWords()
  {
     Scanner sc = null;

     try
     {
        sc = new Scanner(new File("commonWords.txt"));
     }
     catch(Exception e)
     {
        System.out.println("The file is not found");
     }

     List<String> commonWords = new ArrayList<String>();
     int count = 0;
     while(sc.hasNextLine())
     {
        count++;
        commonWords.add(sc.nextLine());
     }

     Iterator<String> termIt = terms.iterator();
     while(termIt.hasNext())
     {
        String term = termIt.next();
        for(String word : commonWords)
           if(term.equalsIgnoreCase(word))
              termIt.remove();
     }
  }


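I apologize for the long code snippets. What is frustrating is that, even though my removeCommonEnglishWords() method is apparently correct (it was discussed in another post), mostPopularWord() still returns "the" when I run it. That word is clearly in my list of common English words and was supposed to be removed from the terms list. What might I be doing wrong?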

Update 1:
Here is the link to the commonWords file:
https://drive.google.com/file/d/1VKNI-b883uQhfKLVg-L8QHgPTLNb22uS/view?usp=sharing

Update 2: One thing I noticed while debugging is that
while(sc.hasNext())
in removeCommonEnglishWords() is skipped entirely. I cannot figure out why.
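
A note on the skipped loop: if the file could not be opened, sc would stay null and sc.hasNextLine() would throw a NullPointerException instead of skipping the loop. A skipped loop therefore suggests that the Scanner did open a file but found nothing to read. Two plausible causes, neither confirmed by the post, are that the relative path resolved to an empty file in the working directory, or that a decoding error occurred, which Scanner swallows and only reports through ioException(). The sketch below is one way to check both. The names CommonWordsCheck and readWithScanner are hypothetical, the UTF-8 charset is an assumption, and a temporary file stands in for commonWords.txt.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CommonWordsCheck {

    // Reads every line with a Scanner, then rethrows any I/O error
    // that the Scanner swallowed while reading.
    static List<String> readWithScanner(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        // Assumption: the word list is UTF-8 encoded.
        try (Scanner sc = new Scanner(file, "UTF-8")) {
            while (sc.hasNextLine()) {
                lines.add(sc.nextLine().trim());
            }
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for commonWords.txt.
        Path path = Files.createTempFile("commonWords", ".txt");
        Files.write(path, List.of("the", "and", "of"), StandardCharsets.UTF_8);
        File file = path.toFile();

        // Shows which file was actually opened and whether it is empty.
        System.out.println(file.getAbsolutePath() + " length=" + file.length());
        System.out.println(readWithScanner(file));
    }
}
```

If the printed path is not the file you expected, or the length is 0, the problem is the location of the file rather than the removal logic.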

Best answer

It might be simpler if you use streams, like this:

String mostPopularWord() {
    return terms.stream()
            .collect(Collectors.groupingBy(s -> s, Collectors.counting()))
            .entrySet().stream()
            .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
            .findFirst()
            .map(Map.Entry::getKey)
            .orElse("");
}
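
The method above returns only the word, while the question also asks for its frequency and expects common words to be excluded. A minimal sketch of one way to cover both follows. It assumes the common words are held in a Set of lowercase strings, and the names PopularWord and mostPopular are hypothetical, not part of the original class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class PopularWord {

    // Returns the most frequent term that is not a common word, together
    // with its count, or an empty Optional if no term remains.
    // Assumption: commonWords contains lowercase entries only.
    static Optional<Map.Entry<String, Long>> mostPopular(List<String> terms,
                                                         Set<String> commonWords) {
        return terms.stream()
                .filter(t -> t != null && !t.isEmpty())      // drop nulls and empties
                .map(t -> t.toLowerCase(Locale.ROOT))        // count case-insensitively
                .filter(t -> !commonWords.contains(t))       // drop common words
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .max(Map.Entry.comparingByValue());          // highest count wins
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("The", "java", "the", "", "Java", "streams", null, "java");
        Set<String> common = Set.of("the", "and", "of");
        mostPopular(terms, common).ifPresent(e ->
                System.out.println(e.getKey() + " appears " + e.getValue() + " times"));
        // prints "java appears 3 times"
    }
}
```

Using max instead of sorting the whole entry set avoids ordering entries that are never read. When several words share the highest count, which one is returned is unspecified.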

Regarding "java - finding the most popular word in a person's tweets", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59801373/
