Problem description
When collecting the elements of a stream into a set, is there any advantage (or drawback) to also specifying .distinct() on the stream? For example:
return items.stream().map(...).distinct().collect(toSet());
Given that the set will already remove duplicates, this seems redundant, but does it offer any performance advantage or disadvantage? Does the answer depend on whether the stream is parallel or sequential, or ordered or unordered?
Recommended answer
According to the javadoc, distinct() is a stateful intermediate operation, meaning it has to keep track of the elements it has already seen.
If you literally have .distinct() followed immediately by .collect(toSet()), it doesn't really add any benefit. If the .distinct() implementation happened to be more performant than the Set's own duplicate check you might gain a little, but since you're collecting to a set you end up with the same result either way.
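As a rough illustration of that point, here is a minimal sketch comparing the two pipelines. The class name DistinctThenToSet, the helper methods, and the use of String::toLowerCase as the mapping are assumptions made for the example; the original question elides the actual mapping.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class DistinctThenToSet {
    // Hypothetical pipeline with the extra distinct() step.
    static Set<String> withDistinct(List<String> items) {
        return items.stream()
                .map(String::toLowerCase)   // stand-in for the elided mapping
                .distinct()                 // drops duplicates before collecting
                .collect(Collectors.toSet());
    }

    // Same pipeline, relying on the Set alone to drop duplicates.
    static Set<String> withoutDistinct(List<String> items) {
        return items.stream()
                .map(String::toLowerCase)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> items = List.of("A", "a", "B", "b", "C");
        // Both pipelines yield the same set of elements.
        System.out.println(withDistinct(items).equals(withoutDistinct(items))); // prints "true"
    }
}
```

The sketch only shows that the results match; it says nothing about which variant is faster, which would need measuring on real data.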
If, on the other hand, .distinct() occurs before your .map() operation, and that particular mapping is expensive, you may see some gains there because the mapping runs on fewer elements overall.
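To make the saving concrete, the following sketch counts how often a mapping function runs with and without a preceding distinct(). The names DistinctBeforeMap, expensiveMap, and countCalls are hypothetical, and String::toUpperCase merely stands in for a genuinely expensive operation; it assumes a sequential stream.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DistinctBeforeMap {
    // Counts invocations of the (hypothetical) expensive mapping.
    static final AtomicInteger calls = new AtomicInteger();

    static String expensiveMap(String s) {
        calls.incrementAndGet();
        return s.toUpperCase(); // placeholder for costly work
    }

    // Runs the pipeline and returns how many times expensiveMap was called.
    static int countCalls(List<String> items, boolean distinctFirst) {
        calls.set(0);
        Stream<String> stream = items.stream();
        if (distinctFirst) {
            stream = stream.distinct(); // deduplicate the inputs before mapping
        }
        stream.map(DistinctBeforeMap::expensiveMap).collect(Collectors.toSet());
        return calls.get();
    }

    public static void main(String[] args) {
        List<String> items = List.of("x", "y", "x", "x", "y");
        System.out.println(countCalls(items, false)); // prints 5
        System.out.println(countCalls(items, true));  // prints 2
    }
}
```

Whether this pays off depends on how many duplicate inputs there are and how the cost of the mapping compares with the bookkeeping distinct() does.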