Problem description
My document looks like this:
data.txt
100, "some text"
101, "more text"
102, "even more text"
I processed it with a regex and returned a new, processed document as follows:
Stream<String> lines = Files.lines(Paths.get("data.txt"));
// Group 1: an id of one to three digits; group 2: the rest of the line.
Pattern regex = Pattern.compile("(\\d{1,3}),(.*)");
List<MyClass> result =
    lines.map(regex::matcher)
         .filter(Matcher::find)
         // MyClass(int id, String text)
         .map(m -> new MyClass(Integer.parseInt(m.group(1)), m.group(2)))
         .collect(Collectors.toList());
This returns a list of processed MyClass objects. It can run in parallel and everything works.
The problem is that I now have this:
data2.txt
101, "some text
the text continues in the next line
and maybe in the next"
102, "for a random
number
of lines"
103, "until the new pattern of new id comma appears"
So I somehow need to join the lines read from the stream until a new match appears (something like a buffer?).
I tried to collect the strings and then collect MyClass instances, but without success, because I cannot actually split streams.
Reduce comes to mind for concatenating lines, but it would only concatenate them; I cannot reduce and also generate a new stream of lines.
Any ideas on how to solve this with Java 8 streams?
Recommended answer
This is a job for java.util.Scanner. With Java 9 or later, which added Scanner.findAll, you can write:
List<MyClass> result;
try (Scanner s = new Scanner(Paths.get("data.txt"))) {
    result = s.findAll("(\\d{1,3}),\\s*\"([^\"]*)\"")
        // MyClass(int id, String text)
        .map(m -> new MyClass(Integer.parseInt(m.group(1)), m.group(2)))
        .collect(Collectors.toList());
}
result.forEach(System.out::println);
But since the Stream-producing findAll does not exist under Java 8, we need a helper method:
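import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

private static Stream<MatchResult> matches(Scanner s, String pattern) {
    Pattern compiled = Pattern.compile(pattern);
    return StreamSupport.stream(
        // 1000 is only a size estimate; the real number of matches is unknown up front.
        new Spliterators.AbstractSpliterator<MatchResult>(1000,
                Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super MatchResult> action) {
                // A horizon of 0 searches the remaining input without a bound.
                if (s.findWithinHorizon(compiled, 0) == null) return false;
                action.accept(s.match());
                return true;
            }
        }, false); // false: sequential stream
}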
Replacing findAll with this helper method, we get:
List<MyClass> result;
try (Scanner s = new Scanner(Paths.get("data.txt"))) {
    result = matches(s, "(\\d{1,3}),\\s*\"([^\"]*)\"")
        // MyClass(int id, String text)
        .map(m -> new MyClass(Integer.parseInt(m.group(1)), m.group(2)))
        .collect(Collectors.toList());
}
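To try the Java 8 variant without a file on disk, the following is a minimal, self-contained sketch that feeds the helper a String instead of data.txt. The class name MultiLineRecordsDemo, the Entry class (a hypothetical stand-in for the question's MyClass) and the parse method are illustrative assumptions, not part of the original answer.

```java
import java.util.List;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class MultiLineRecordsDemo {

    // Hypothetical stand-in for the question's MyClass(int id, String text).
    static final class Entry {
        final int id;
        final String text;

        Entry(int id, String text) {
            this.id = id;
            this.text = text;
        }

        @Override
        public String toString() {
            return id + " -> " + text;
        }
    }

    // Same idea as the helper above: each tryAdvance call pulls one more match.
    static Stream<MatchResult> matches(Scanner s, String pattern) {
        Pattern compiled = Pattern.compile(pattern);
        return StreamSupport.stream(
            new Spliterators.AbstractSpliterator<MatchResult>(Long.MAX_VALUE,
                    Spliterator.ORDERED | Spliterator.NONNULL) {
                @Override
                public boolean tryAdvance(Consumer<? super MatchResult> action) {
                    if (s.findWithinHorizon(compiled, 0) == null) return false;
                    action.accept(s.match());
                    return true;
                }
            }, false);
    }

    // Parses records of the form: id, "text that may span several lines"
    static List<Entry> parse(String input) {
        try (Scanner s = new Scanner(input)) {
            return matches(s, "(\\d{1,3}),\\s*\"([^\"]*)\"")
                .map(m -> new Entry(Integer.parseInt(m.group(1)), m.group(2)))
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        String input = "101, \"some text\nthe text continues in the next line\"\n"
                     + "102, \"for a random\nnumber\nof lines\"\n"
                     + "103, \"a single line\"";
        parse(input).forEach(System.out::println);
    }
}
```

Because [^\"]* also matches line breaks, each record's text keeps its embedded newlines. The pattern assumes the quoted text never contains a double quote itself; input with escaped quotes would need a different expression.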