stream().collect(Collectors.toSet()) vs stream().distinct().collect(Collectors.toList())

Question

I have a list (~200 elements) of objects, with only a few unique objects (~20 elements), and I want to keep only the unique values. Between list.stream().collect(Collectors.toSet()) and list.stream().distinct().collect(Collectors.toList()), which is more efficient with respect to latency and memory consumption?

Recommended answer

The obvious answer is not to worry about speed and memory consumption for so few elements, and to keep in mind that one version returns a Set and the other a List. Still, there are some interesting small details (interesting IMO).
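To make the comparison concrete, here is a minimal sketch of the two forms from the question. The class name, the helper methods, and the generated sample data (200 strings cycling through 20 values) are assumptions made for illustration; they are not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class UniqueValuesDemo {

    // Hypothetical sample data: 200 strings with only 20 distinct values.
    static List<String> sampleList() {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            list.add("item-" + (i % 20));
        }
        return list;
    }

    // First form from the question: the result is a Set, with no ordering guarantee.
    static Set<String> viaToSet(List<String> list) {
        return list.stream().collect(Collectors.toSet());
    }

    // Second form from the question: the result is a List in encounter order.
    static List<String> viaDistinct(List<String> list) {
        return list.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = sampleList();
        System.out.println(viaToSet(list).size());    // prints 20
        System.out.println(viaDistinct(list).size()); // prints 20
    }
}
```

Both calls end up with the same 20 values; what differs is the result type and whether the order of first appearance is kept.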

Suppose you are streaming from a source that is already known to be distinct (a Set, for example). In that case your .distinct() operation is a no-op, because there is nothing left to do.
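Whether a source is "known to be distinct" can be observed through its spliterator characteristics. The following is a small sketch, not a benchmark; the class and method names are hypothetical, and it only shows the flag a stream implementation can consult, not the skipping itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;

class DistinctFlagDemo {

    // Returns true when a stream over the collection advertises the DISTINCT characteristic.
    static boolean reportsDistinct(Collection<?> source) {
        return source.stream().spliterator().hasCharacteristics(Spliterator.DISTINCT);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "a"));
        Set<String> set = new HashSet<>(list);

        System.out.println(reportsDistinct(list)); // prints false
        System.out.println(reportsDistinct(set));  // prints true
    }
}
```

A stream created from the HashSet already carries the DISTINCT flag, so a following .distinct() has nothing to filter; the ArrayList source carries no such flag.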

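If you are streaming from a List (which is ordered by design) and there are no intermediate operations (unordered(), for example) that drop the ordering, .distinct() is forced to preserve the encounter order. In OpenJDK's implementation a parallel pipeline does this by reducing into LinkedHashSets, which is pretty expensive; a sequential pipeline just filters through a plain HashSet.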
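The ordering guarantee is easy to see from the outside. This sketch (hypothetical class and method names) shows that distinct() on a List-backed stream keeps the first occurrence of each element in encounter order, while calling unordered() first gives up that guarantee, so the second result is collected into a Set.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class DistinctOrderDemo {

    // Ordered source: distinct() must keep the first occurrence of each element, in order.
    static List<String> distinctInOrder(List<String> input) {
        return input.stream().distinct().collect(Collectors.toList());
    }

    // unordered() lifts the ordering constraint; only the set of elements is guaranteed.
    static Set<String> distinctUnordered(List<String> input) {
        return input.stream().unordered().distinct().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(distinctInOrder(input)); // prints [b, a, c]
        System.out.println(distinctUnordered(input).size()); // prints 3
    }
}
```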

If you are doing parallel processing, the list.stream().collect(Collectors.toSet()) version will merge multiple HashSets (Java 9 slightly improved this over Java 8). .distinct(), on the other hand, will, for an unordered stream, spin up a ConcurrentHashMap that keeps every element as a key with a dummy Boolean.TRUE value. It also does something interesting to preserve any null your stream might contain, and even that is handled differently internally in the two cases.
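The internals described above are not visible from user code, but the observable behaviour is. The sketch below (hypothetical class and method names, and a made-up input containing null) runs the variants in parallel and shows that null survives de-duplication in each of them. It assumes parallelStream() is what "parallel processing" refers to here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ParallelDistinctDemo {

    // Parallel toSet(): partial sets are built per chunk and then merged.
    static Set<String> parallelToSet(List<String> input) {
        return input.parallelStream().collect(Collectors.toSet());
    }

    // Parallel, ordered distinct(): encounter order of first occurrences is kept.
    static List<String> parallelDistinct(List<String> input) {
        return input.parallelStream().distinct().collect(Collectors.toList());
    }

    // Parallel, unordered distinct(): same elements, but no ordering guarantee.
    static List<String> parallelUnorderedDistinct(List<String> input) {
        return input.parallelStream().unordered().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arrays.asList is used because it permits null elements.
        List<String> input = Arrays.asList("a", null, "b", "a", null, "b");

        System.out.println(parallelToSet(input).contains(null));             // prints true
        System.out.println(parallelDistinct(input));                         // prints [a, null, b]
        System.out.println(parallelUnorderedDistinct(input).contains(null)); // prints true
    }
}
```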
