I have created a parsing library that accepts a provided input and returns a stream of Records. A program then calls this library and processes the results. In my case, my program is using something like
recordStream.forEach(r -> insertIntoDB(r));
One of the types of input that can be provided to the parsing library is a flat file, which may have a header row. As such, the parsing library can be configured to skip a header row. If a header row is configured, it adds a skip(n) operation to the returned stream, e.g.
Files.lines(input).skip(1).parallel().map(r -> createRecord(r));
But it seems that skip, parallel and forEach do not play nicely together. The end programmer must instead invoke forEachOrdered, but it is poor design to put this requirement on the programmer: they should not have to know that forEachOrdered is required when the input is a file with a header row.
How can I enforce the ordered requirement myself when necessary, within the construction of the returned stream chain, to return a fully functional stream to the program writer, instead of a stream with hidden limitations? Is the answer to wrap the stream in another stream?
forEachOrdered is necessary not because of the skip(), but because your Stream is parallel. Even if the stream is parallel, skip() will still skip the first element, as the documentation for skip() indicates.
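To illustrate the point about skip(), here is a minimal sketch. The class name SkipParallelDemo and the helper withoutHeader are hypothetical, and an in-memory list stands in for the file. Because the source is ordered, skip(1) drops the first element in encounter order even on a parallel pipeline, and collect preserves that order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SkipParallelDemo {
    // Hypothetical helper: drops the header line, then maps the rest in parallel.
    static List<String> withoutHeader(List<String> lines) {
        return lines.stream()
                .skip(1)                        // skips the first element in encounter order
                .parallel()                     // the whole pipeline now runs in parallel
                .map(String::toUpperCase)       // stand-in for createRecord(r)
                .collect(Collectors.toList());  // collect keeps encounter order
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("header", "a", "b", "c");
        System.out.println(withoutHeader(lines)); // prints [A, B, C]
    }
}
```

The header is gone regardless of how the work is split across threads; what a parallel forEach does not guarantee is the order in which the remaining elements are handed to the action.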
It's clearly documented that forEach doesn't necessarily respect the order. Not using forEachOrdered when you care about the order is just a misuse of the Stream API.
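The difference can be seen in a small sketch like the following. The class and method names are hypothetical. On a parallel stream, forEachOrdered hands elements to the action in encounter order, while forEach may hand them over in any order, so only the contents (not the order) of its result can be relied on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

public class ForEachOrderDemo {
    // forEachOrdered processes one element at a time, in encounter order,
    // so a plain ArrayList is safe here.
    static List<Integer> viaForEachOrdered(int n) {
        List<Integer> out = new ArrayList<>();
        IntStream.range(0, n).parallel().boxed().forEachOrdered(out::add);
        return out;
    }

    // forEach may run the action concurrently and in any order,
    // so the target list must be thread-safe.
    static List<Integer> viaForEach(int n) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().boxed().forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(viaForEachOrdered(5)); // prints [0, 1, 2, 3, 4]
        System.out.println(viaForEach(5));        // same elements, order not guaranteed
    }
}
```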
I would not return a parallel stream from the library. I would return a sequential one (where forEach would respect the order), and let the caller call parallel() and assume the consequences if it wants to.
Using a parallel stream by default is a bad idea.
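As a rough sketch of that suggestion, the library could look something like the code below. RecordParser, ParsedRecord, parse and parseLines are hypothetical names, since the question does not show the real API; ParsedRecord stands in for the library's Record type. The returned stream is sequential, and going parallel is left to the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class RecordParser {
    // Hypothetical stand-in for the library's record type.
    static final class ParsedRecord {
        final String line;
        ParsedRecord(String line) { this.line = line; }
    }

    // Builds the record stream from raw lines; never calls parallel().
    static Stream<ParsedRecord> parseLines(Stream<String> lines, boolean hasHeader) {
        Stream<String> body = hasHeader ? lines.skip(1) : lines;
        return body.map(ParsedRecord::new);
    }

    // File-based entry point; the caller should close the returned stream.
    static Stream<ParsedRecord> parse(Path input, boolean hasHeader) {
        try {
            return parseLines(Files.lines(input), hasHeader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller that wants parallelism and still cares about order would then write something like parse(path, true).parallel().forEachOrdered(r -> insertIntoDB(r)), making the trade-off explicit at the call site rather than hidden inside the library.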