Splitting a String in a Java Stream

Question

I have an assignment where we read text files and count the occurrences of each word (ignoring punctuation). We don't have to use streams, but I want to practice using them.

So far I am able to read a text file, put each line in a String, and collect all the strings in a list using this:

try (Stream<String> p = Files.lines(FOLDER_OF_TEXT_FILES)) {
    list = p.map(line -> line.replaceAll("[^A-Za-z0-9 ]", ""))
            .collect(Collectors.toList());
}

However, this turns each line into a single String, so each element of the list is a line rather than a word. Is there a way, using streams, to make each element a single word, with something like String's split method and a regex? Or will I have to handle this outside the stream itself?

Recommended answer

One could use Pattern.splitAsStream to split the text into a stream of words and, in the same step, drop the non-word characters next to the whitespace, before collecting the words into a map of occurrence counts:

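// Needs Java 11+ (Files.readString, Path.of) and these static imports:
// import static java.util.stream.Collectors.counting;
// import static java.util.stream.Collectors.groupingBy;
Pattern splitter = Pattern.compile("(\\W*\\s+\\W*)+");
String fileStr = Files.readString(Path.of(FOLDER_OF_TEXT_FILES));

// Group identical words together and count how often each one occurs
Map<String, Long> collect = splitter.splitAsStream(fileStr)
        .collect(groupingBy(Function.identity(), counting()));

System.out.println(collect);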

To split and remove non-word characters in one pass, we use the pattern (\W*\s+\W*)+, which matches optional non-word characters, then one or more whitespace characters, then optional non-word characters again. Note that punctuation is only removed when it is adjacent to whitespace: a period at the very end of the text, or a hyphen inside a word, stays part of the token.
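For reference, here is a minimal line-by-line sketch of the same idea, combining Stream.flatMap with Pattern.splitAsStream so that each stream element becomes a word rather than a line. The class name WordCountSketch, the helper countWords, and the sample input are hypothetical and only serve the illustration. The extra replaceAll and filter steps are an addition of this sketch, not part of the answer above; they are one possible way to deal with punctuation that is not next to whitespace and with empty leading tokens.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountSketch {
    // Same pattern as in the answer: whitespace plus any adjacent non-word characters.
    private static final Pattern SPLITTER = Pattern.compile("(\\W*\\s+\\W*)+");

    // Hypothetical helper: counts word occurrences across a stream of lines.
    static Map<String, Long> countWords(Stream<String> lines) {
        return lines
                // one stream element per token instead of one per line
                .flatMap(SPLITTER::splitAsStream)
                // assumption of this sketch: strip punctuation the split left behind, e.g. a trailing "."
                .map(token -> token.replaceAll("\\W", ""))
                // a line that starts with a delimiter yields an empty leading token
                .filter(word -> !word.isEmpty())
                // TreeMap only to get a stable, sorted printout
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> sample = List.of("Hello, world! Hello again.", "  (world)  ");
        System.out.println(countWords(sample.stream())); // prints {Hello=2, again=1, world=2}
    }
}
```

With a real file, the same helper could presumably be fed from Files.lines(path) inside a try-with-resources block, which avoids reading the whole file into one String. Counting here is case-sensitive; mapping each word through toLowerCase() first would be one option if "Hello" and "hello" should count as the same word.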

08-20 09:29