Why should I use the concurrent characteristic in a parallel stream with collect:
List<Integer> list =
    Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 4)));
// Variant 1: concurrent collector
Map<Integer, Integer> concurrentResult = list.stream().parallel()
    .collect(Collectors.toConcurrentMap(k -> k, v -> v, (c, c2) -> c + c2));
// Variant 2: non-concurrent collector
Map<Integer, Integer> plainResult = list.stream().parallel()
    .collect(Collectors.toMap(k -> k, v -> v, (c, c2) -> c + c2));
In other words, what are the side effects of not using this characteristic? Is it useful for the internal stream operations?
These two collectors operate in a fundamentally different way.
First, the Stream framework will split the workload into independent chunks that can be processed in parallel (that’s why you don’t need a special collection as the source; synchronizedList is unnecessary).
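As a small illustration of that point, the sketch below runs the same kind of parallel collect over a plain, unsynchronized list. The names `PlainSourceSketch` and `sumByKey` are made up for this example, and the extra duplicate element is only there so that the merge function actually gets called.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PlainSourceSketch {

    // Sums the values per key. The source is only read by the stream,
    // so a plain List is sufficient here.
    static Map<Integer, Integer> sumByKey(List<Integer> source) {
        return source.parallelStream()
                .collect(Collectors.toMap(k -> k, v -> v, Integer::sum));
    }

    public static void main(String[] args) {
        // No Collections.synchronizedList wrapper around the source.
        List<Integer> plain = Arrays.asList(1, 2, 4, 2);
        System.out.println(sumByKey(plain));
    }
}
```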
With a non-concurrent collector, each chunk will be processed by creating a local container (here, a Map) using the Collector’s supplier and accumulating the chunk’s elements into that local container (putting entries). These partial results then have to be merged, i.e. one map is put into the other, to get a final result.
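The three steps just described map onto the three-argument `Stream.collect(supplier, accumulator, combiner)` overload. The following is only a rough sketch of what a non-concurrent collector does conceptually, not the actual `Collectors.toMap` implementation; `LocalContainerSketch` and `sumByKey` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

class LocalContainerSketch {

    static Map<Integer, Integer> sumByKey(List<Integer> source) {
        // Supplier: called once per chunk to create that chunk's local map.
        Supplier<Map<Integer, Integer>> supplier = HashMap::new;

        // Accumulator: puts one element into the local map,
        // summing values when the key is already present.
        BiConsumer<Map<Integer, Integer>, Integer> accumulator =
                (map, x) -> map.merge(x, x, Integer::sum);

        // Combiner: merges one partial map into another.
        BiConsumer<Map<Integer, Integer>, Map<Integer, Integer>> combiner =
                (left, right) -> right.forEach((k, v) -> left.merge(k, v, Integer::sum));

        return source.parallelStream().collect(supplier, accumulator, combiner);
    }

    public static void main(String[] args) {
        System.out.println(sumByKey(List.of(1, 2, 4, 2, 1)));
    }
}
```

Each local `HashMap` is only ever touched by one thread at a time, which is why no synchronization is needed even though the stream is parallel.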
A concurrent collector supports accumulating concurrently, so only one ConcurrentMap will be created and all threads accumulate into that map at the same time. So after completion, no merging step is required, as there is only one map.
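To make the contrast concrete, here is a minimal sketch of the "one shared map" idea, written with `forEach` and `ConcurrentHashMap.merge` rather than a collector. It is an approximation of the behavior, not the library's implementation; `SharedContainerSketch`, `sumByKey`, and `isConcurrent` are names invented for this example. The helper also shows how to check whether a given collector reports the `CONCURRENT` characteristic.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SharedContainerSketch {

    // All worker threads accumulate into the same map; merge() is atomic
    // per key on ConcurrentHashMap, so no separate combining step is needed.
    static ConcurrentMap<Integer, Integer> sumByKey(List<Integer> source) {
        ConcurrentMap<Integer, Integer> shared = new ConcurrentHashMap<>();
        source.parallelStream().forEach(x -> shared.merge(x, x, Integer::sum));
        return shared;
    }

    // Reports whether a collector declares the CONCURRENT characteristic.
    static boolean isConcurrent(Collector<?, ?, ?> collector) {
        return collector.characteristics().contains(Collector.Characteristics.CONCURRENT);
    }

    public static void main(String[] args) {
        Collector<Integer, ?, ConcurrentMap<Integer, Integer>> concurrent =
                Collectors.toConcurrentMap(k -> k, v -> v, Integer::sum);
        Collector<Integer, ?, Map<Integer, Integer>> plain =
                Collectors.toMap(k -> k, v -> v, Integer::sum);

        System.out.println("toConcurrentMap concurrent? " + isConcurrent(concurrent));
        System.out.println("toMap concurrent?           " + isConcurrent(plain));
        System.out.println(sumByKey(List.of(1, 2, 4, 2, 1)));
    }
}
```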
So both collectors are thread-safe, but might exhibit entirely different performance characteristics, depending on the task. If the Stream’s workload before collecting the result is heavy, the differences might be negligible. If, like in your example, there is no relevant work before the collect operation, the outcome heavily depends on how often mappings have to be merged, i.e. how often the same key occurs, and on how the actual target ConcurrentMap deals with contention in the concurrent case.
If you mostly have distinct keys, the merging step of a non-concurrent collector can be as expensive as the preceding put operations, destroying any benefit of the parallel processing. But if you have lots of duplicate keys, requiring merging of the values, the contention on the same key may degrade the concurrent collector’s performance.
So there’s no simple "which is better" answer (well, if there were such an answer, why bother adding the other variant?). It depends on your actual operation. You can use the expected scenario as a starting point for selecting one, but should then measure with real-life data. Since both produce equivalent results, you can change your choice at any time.
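If you want a first impression before setting up a proper benchmark, something like the following sketch can be used. It is deliberately crude: it assumes randomly generated keys instead of your real data, does no warm-up, and times a single run with `System.nanoTime`, so the numbers are only indicative; a benchmark harness such as JMH would be the more reliable tool. All names (`CollectorComparisonSketch`, `randomKeys`, `withToMap`, `withToConcurrentMap`, `millis`) are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;

class CollectorComparisonSketch {

    // A small keyRange yields many duplicate keys, a large one mostly distinct keys.
    static List<Integer> randomKeys(int size, int keyRange, long seed) {
        return new Random(seed).ints(size, 0, keyRange)
                .boxed()
                .collect(Collectors.toList());
    }

    // Counts occurrences per key with the non-concurrent collector.
    static Map<Integer, Integer> withToMap(List<Integer> keys) {
        return keys.parallelStream()
                .collect(Collectors.toMap(k -> k, v -> 1, Integer::sum));
    }

    // Counts occurrences per key with the concurrent collector.
    static Map<Integer, Integer> withToConcurrentMap(List<Integer> keys) {
        return keys.parallelStream()
                .collect(Collectors.toConcurrentMap(k -> k, v -> 1, Integer::sum));
    }

    // Crude single-shot timing in milliseconds; not a substitute for a real benchmark.
    static long millis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int keyRange : new int[] {10, 1_000_000}) {
            List<Integer> keys = randomKeys(1_000_000, keyRange, 42L);
            long plain = millis(() -> withToMap(keys));
            long concurrent = millis(() -> withToConcurrentMap(keys));
            System.out.println("keyRange=" + keyRange
                    + " toMap=" + plain + "ms"
                    + " toConcurrentMap=" + concurrent + "ms"
                    + " sameResult=" + withToMap(keys).equals(withToConcurrentMap(keys)));
        }
    }
}
```

Swapping `randomKeys` for a sample of your actual input is the important part; the key distribution is what decides which variant comes out ahead.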