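I have created two HashMaps containing the strings from two separate txt files.
Now I am trying to compare the two HashMaps and count how many duplicate values the files contain. For example, if file1 and file2 both contain the string "hello" twice, my console should print: hello occurs 2 times.
Here is my first HashMap: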
List<String> word_list = new ArrayList<>();
// Load the words into word_list here
while (INPUT_TEXT1.hasNext()) {
    String input_word = INPUT_TEXT1.next();
    word_list.add(input_word);
}
INPUT_TEXT1.close();
String regexPattern = "[^a-zA-Z]";
int index = 0;
// Strip non-letter characters and lowercase every word in place
for (String s : word_list) {
    word_list.set(index++, s.replaceAll(regexPattern, "").toLowerCase());
}
// Find the unique words in the list
String[] uniqueWords = word_list.stream().distinct()
        .toArray(size -> new String[size]);
Map<String, Integer> wordsMap = new HashMap<>();
int frequency = 0;
// Load the words into the map, with each unique word as key and its frequency as value
for (String uniqueWord : uniqueWords) {
    frequency = Collections.frequency(word_list, uniqueWord);
    System.out.println(uniqueWord + " occurred " + frequency + " times");
    wordsMap.put(uniqueWord, frequency);
}
// Sort the words by descending frequency (the map's values) and keep the top 5
Stream<Entry<String, Integer>> topWords = wordsMap.entrySet().stream()
        .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()).limit(5);
// Print the top 5 words to the console
System.out.println("Top 5 Words:::");
topWords.forEach(System.out::println);
System.out.println("\n\n");
Here is my second HashMap:
List<String> wordList = new ArrayList<>();
// Load the words into wordList here
while (INPUT_TEXT2.hasNext()) {
    String input_word1 = INPUT_TEXT2.next();
    wordList.add(input_word1);
}
INPUT_TEXT2.close();
String regex = "[^a-zA-Z]";
int index1 = 0;
// Strip non-letter characters and lowercase every word in place
for (String s : wordList) {
    wordList.set(index1++, s.replaceAll(regex, "").toLowerCase());
}
String[] uniqueWords1 = wordList.stream().distinct()
        .toArray(size -> new String[size]);
Map<String, Integer> wordsMap1 = new HashMap<>();
// Load the words into the map, with each unique word as key and its frequency as value
for (String uniqueWord : uniqueWords1) {
    frequency = Collections.frequency(wordList, uniqueWord);
    System.out.println(uniqueWord + " occurred " + frequency + " times");
    // Store into wordsMap1 (the second map), not wordsMap
    wordsMap1.put(uniqueWord, frequency);
}
// Sort the words by descending frequency (the map's values)
Stream<Entry<String, Integer>> topWords1 = wordsMap1.entrySet().stream()
        .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()).limit(6);
Here is my original approach for finding the duplicate values:
boolean val = wordsMap.keySet().containsAll(wordsMap1.keySet());
for (Entry<String, Integer> str : wordsMap.entrySet()) {
    System.out.println("================= " + str.getKey());
    if (wordsMap1.containsKey(str.getKey())) {
        System.out.println("Map2 Contains Map 1 Key");
    }
}
System.out.println("================= " + val);
Does anyone have other suggestions for this? Thanks.
Edit
How can I count the number of occurrences of each individual value?
Best answer
I think your code works as well. If your goal is to find a better way to perform the last check, you could try the following:
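// Copy the first map's keys so that retainAll does not modify wordsMap itself
Set<String> keySetMap1 = new HashSet<>(wordsMap.keySet());
Set<String> keySet2 = wordsMap1.keySet();
// Keep only the keys that also appear in the second map
keySetMap1.retainAll(keySet2);
keySetMap1.forEach(x -> System.out.println("Map2 Contains Map 1 Key: " + x));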
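For the follow-up about counting occurrences, here is a minimal sketch of one possible approach, not necessarily what the asker wants. The question does not say whether the printed count should come from file1, file2, or a combination, so this version simply reports both counts for every word found in both maps. The class name `CommonWordCounts`, the helper `commonCounts`, and the sample data are hypothetical and stand in for `wordsMap` and `wordsMap1`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class CommonWordCounts {

    // Hypothetical helper: for every word present in both maps, returns the
    // pair {count in first map, count in second map}. A TreeMap is used only
    // so that the words come out in alphabetical order.
    public static Map<String, int[]> commonCounts(Map<String, Integer> first,
                                                  Map<String, Integer> second) {
        Map<String, int[]> common = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : first.entrySet()) {
            Integer other = second.get(entry.getKey());
            if (other != null) {
                common.put(entry.getKey(), new int[] {entry.getValue(), other});
            }
        }
        return common;
    }

    public static void main(String[] args) {
        // Assumed sample data in place of the maps built from the two files
        Map<String, Integer> wordsMap = new HashMap<>();
        wordsMap.put("hello", 2);
        wordsMap.put("world", 1);
        Map<String, Integer> wordsMap1 = new HashMap<>();
        wordsMap1.put("hello", 2);
        wordsMap1.put("java", 3);

        // Only "hello" appears in both maps, so only it is printed
        commonCounts(wordsMap, wordsMap1).forEach((word, counts) ->
                System.out.println(word + " occurs " + counts[0]
                        + " times in file1 and " + counts[1] + " times in file2"));
    }
}
```

If a single number per word is wanted instead, `Math.min(counts[0], counts[1])` or the sum of the two counts could be printed, depending on what "duplicate" is meant to cover.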