Splitting strings in a Java stream?


Question

I have an assignment in which we read text files and count the occurrences of each word (ignoring punctuation). We don't have to use streams, but I want to practice using them.

So far I can read a text file, put each line in a string, and collect all the strings into a list like this:

try (Stream<String> p = Files.lines(FOLDER_OF_TEXT_FILES)) {
    list = p.map(line -> line.replaceAll("[^A-Za-z0-9 ]", ""))
            .collect(Collectors.toList());
}

However, this only turns each line into one String, so each element of the list is a whole line rather than a word. Is there a way, using streams, to make each element a single word, for example with String's split method and a regex? Or do I have to handle this outside the stream?

Recommended answer

You can use Pattern.splitAsStream to split the text efficiently and discard the punctuation around the whitespace in the same step, then collect the words into a map of occurrence counts:

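import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
// also needs: java.nio.file.Files, java.nio.file.Path, java.util.Map,
// java.util.function.Function, java.util.regex.Pattern

// Separator: whitespace together with any punctuation touching it
Pattern splitter = Pattern.compile("(\\W*\\s+\\W*)+");
// Files.readString requires Java 11 or later
String fileStr = Files.readString(Path.of(FOLDER_OF_TEXT_FILES));

Map<String, Long> collect = splitter.splitAsStream(fileStr)
        .collect(groupingBy(Function.identity(), counting()));

System.out.println(collect);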
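
If the file is too large to read in one go, the same idea might be applied line by line, which is closer to the Files.lines code in the question. The sketch below is an illustration, not part of the original answer: the class name WordCountByLine, the helper countWords and the sample lines are hypothetical. It also assumes that punctuation left at the start or end of a line should be stripped and that empty tokens should be dropped. Counting stays case-sensitive, as in the answer.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountByLine {
    // Same separator pattern as in the answer
    private static final Pattern SPLITTER = Pattern.compile("(\\W*\\s+\\W*)+");

    // Hypothetical helper: counts the words found in any stream of lines
    static Map<String, Long> countWords(Stream<String> lines) {
        return lines
                // flatMap turns each line into several elements, one per token
                .flatMap(SPLITTER::splitAsStream)
                // Assumption: strip punctuation left at the start or end of a token
                .map(w -> w.replaceAll("^\\W+|\\W+$", ""))
                // Drop empty tokens (blank lines, lines starting with a separator)
                .filter(w -> !w.isEmpty())
                // TreeMap only makes the printed order predictable
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Sample input standing in for the lines of a file
        List<String> lines = List.of("Hello, world!", "hello again; world.");
        System.out.println(countWords(lines.stream()));
        // prints {Hello=1, again=1, hello=1, world=2}
    }
}
```

With a real file, passing Files.lines(path) to countWords inside a try-with-resources block should behave the same way.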

To split and remove non-word characters in one step, we use the pattern (\W*\s+\W*)+, which matches optional non-word characters, then at least one whitespace character, then optional non-word characters again. Each match is treated as a separator, so punctuation that touches whitespace is discarded along with it.
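
To see what this pattern does and does not treat as a separator, it may help to run it on a few short strings. The following is a small sketch for experimentation only. The class name SplitterDemo, the helper tokens and the sample strings are made up for this illustration. It suggests that punctuation with no whitespace next to it (including a final full stop at the end of the input) stays attached to the word, and that input starting with a separator produces a leading empty string.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitterDemo {
    // Same separator pattern as in the answer
    static final Pattern SPLITTER = Pattern.compile("(\\W*\\s+\\W*)+");

    // Hypothetical helper: returns the tokens produced by the pattern
    static List<String> tokens(String text) {
        return SPLITTER.splitAsStream(text).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Punctuation next to whitespace is swallowed by the separator
        System.out.println(tokens("one, two -- three")); // [one, two, three]

        // Punctuation with no adjacent whitespace is kept as part of the token
        System.out.println(tokens("it's fine."));        // [it's, fine.]

        // A separator at the very start yields a leading empty string
        System.out.println(tokens("  leading"));         // [, leading]
    }
}
```

If those leftovers matter for the word counts, one option is to filter out empty strings and strip remaining punctuation before collecting.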
