java - 深入了解CollectionsRemoveAll方法

我有一个大小约为200k的列表。我在过滤列表时遇到了一些问题。

这是实现：

public List<> filterList(List<> listToBeFiltered){
List<> removeElementsFromList = listToBeFiltered.parallelStream()
                                    .filter(//some filtering logic)
                                    .collect(Collectors.toList());
listToBeFiltered.removeAll(removeElementsFromList);
return listToBeFiltered;
}

我在代码中遇到的问题是，当removeElementsFromList接近listToBeFiltered的大小时，程序将停留在removeAll语句中。任何见解/替代解决方案都倍受赞赏。

最佳答案

问题在于x.removeAll(y)运算是O（n×m），其中n是集合x的大小，m是集合y的大小（即O（| x |×| y |）。

removeAll方法基本上只是为y中的每个元素遍历整个列表，检查x中的每个元素是否恰好相等，如果相等，则将其删除。如果您可以一次完成，那将会更有效率。

假设您正在使用Java 8，则有一种更有效的方法：

List<Integer> xs = new ArrayList<>();
// TODO: initialize xs with a bunch of values
List<Integer> ys = new ArrayList<>();
// TODO: initialize ys with a bunch of values
Set<Integer> ysSet = new HashSet<>(ys);
List<Integer> xsPrime = xs.stream()
    .filter(x -> !ysSet.contains(x))
    .collect(Collectors.toList());

对于大小为100k的xs和大小为ys的66k，使用removeAll大约需要5500ms，而使用上述方法仅需要大约8ms。我希望由于removeAll的二次复杂度，当您扩展到200k时，差异会更加明显。

相反，上面使用的过滤器版本的复杂度将为O（n + m），因为构建HashSet中所有值的ys为O（m），然后将O（n）设置为遍历xs的所有值以确保新的ysSet中不包含任何值。（这当然是假设HashSet查找为O（1）。）

再次回顾您的问题，我意识到您已经在使用filter ...在这种情况下，我建议您只是反转过滤器逻辑，然后将传入列表的值重置为过滤后的值：

public List<> filterList(List<> listToBeFiltered){
    List<> filteredList = listToBeFiltered.parallelStream()
        .filter(/* some inverted filtering logic */)
        .collect(Collectors.toList());
    listToBeFiltered.clear();
    listToBeFiltered.addAll(filteredList);
    return listToBeFiltered;
}

如果您不需要更改原始列表，则可以直接返回filteredList。（无论如何，这将是我的首选解决方案。）

我只是再次运行测试，这次我添加了另一个使用循环而不是流的版本：

Set<Integer> ysSet = new HashSet<>(ys);
List<Integer> xsPrime = new ArrayList<>();
for (Integer x : xs) {
    if (!ysSet.contains(x)) {
        xsPrime.add(x);
    }
}
return xsPrime;

此版本完成大约需要7ms，而不是8ms。由于这仅比流版本快一点（特别是考虑到使用removeAll的原始版本要慢3个数量级），所以我坚持使用流版本-尤其是因为您可以在那里使用并行性（因为您可以已经与parallelStream一起使用）。

关于java - 深入了解CollectionsRemoveAll方法，我们在Stack Overflow上找到一个类似的问题：https://stackoverflow.com/questions/33227592/