How to compare two huge List<String> in Java?


Problem description

My application generates two big lists (up to 3.5 million string records). I need the best and fastest way to compare them. Currently I am doing it like this:

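// Records present in the source DB results but missing from the Hive results
List<String> list1 = ListUtils.subtract(sourceDbResults, hiveResults);
// Records present in the Hive results but missing from the source DB results
List<String> list2 = ListUtils.subtract(hiveResults, sourceDbResults);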

But as I can see from jconsole, this method is really expensive on memory, and sometimes the process even gets stuck on it. Any good solutions or ideas?

Element positions/order in the lists are always the same, so I don't need to deal with that. After comparing, I need to know whether the lists are the same, and to get the differences between them if they are not. Subtract works perfectly for small lists.

Recommended answer

Given that you've said your two lists are already sorted, they can be compared in O(N) time, which is much faster than your current solution using ListUtils. The following method does this with an algorithm similar to the one for merging two sorted lists that can be found in most textbooks.

import java.util.*;

public class CompareSortedLists {
    public static void main(String[] args) {
        List<Integer> sourceDbResults = Arrays.asList(1, 2, 3, 4, 5, 8);
        List<Integer> hiveResults = Arrays.asList(2, 3, 6, 7);
        List<Integer> inSourceDb_notInHive = new ArrayList<>();
        List<Integer> inHive_notInSourceDb = new ArrayList<>();

        compareSortedLists(
                sourceDbResults, hiveResults,
                inSourceDb_notInHive, inHive_notInSourceDb);

        assert inSourceDb_notInHive.equals(Arrays.asList(1, 4, 5, 8));
        assert inHive_notInSourceDb.equals(Arrays.asList(6, 7));
    }

    /**
     * Compares two sorted lists (or other iterable collections in ascending order).
     * Adds to onlyInList1 any and all elements in list1 that are not in list2; and
     * conversely to onlyInList2. The caller must ensure the two input lists are
     * already sorted and should initialize onlyInList1 and onlyInList2 to empty,
     * writable collections.
     */
    public static <T extends Comparable<? super T>> void compareSortedLists(
            Iterable<T> list1, Iterable<T> list2,
            Collection<T> onlyInList1, Collection<T> onlyInList2) {
        Iterator<T> it1 = list1.iterator();
        Iterator<T> it2 = list2.iterator();
        T e1 = it1.hasNext() ? it1.next() : null;
        T e2 = it2.hasNext() ? it2.next() : null;
        while (e1 != null || e2 != null) {
            if (e2 == null) {  // No more elements in list2, some remaining in list1
                onlyInList1.add(e1);
                e1 = it1.hasNext() ? it1.next() : null;
            }
            else if (e1 == null) {  // No more elements in list1, some remaining in list2
                onlyInList2.add(e2);
                e2 = it2.hasNext() ? it2.next() : null;
            }
            else {
                int comp = e1.compareTo(e2);
                if (comp < 0) {
                    onlyInList1.add(e1);
                    e1 = it1.hasNext() ? it1.next() : null;
                }
                else if (comp > 0) {
                    onlyInList2.add(e2);
                    e2 = it2.hasNext() ? it2.next() : null;
                }
                else /* comp == 0 */ {
                    e1 = it1.hasNext() ? it1.next() : null;
                    e2 = it2.hasNext() ? it2.next() : null;
                }
            }
        }
    }
}
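
Since the method relies on both inputs being in ascending order according to compareTo, it can be worth verifying that before trusting the result. For example, a database ORDER BY may use a case-insensitive collation that disagrees with String.compareTo. Below is a minimal sketch of such a check; the class name SortedCheck and the method isSorted are hypothetical helpers, not part of the answer above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper (illustrative only): checks ascending natural order. */
final class SortedCheck {
    private SortedCheck() {}

    /**
     * Returns true if the elements are in non-descending order according to
     * compareTo. Makes a single O(N) pass and copies nothing.
     */
    static <T extends Comparable<? super T>> boolean isSorted(Iterable<T> items) {
        Iterator<T> it = items.iterator();
        if (!it.hasNext()) {
            return true; // an empty sequence is trivially sorted
        }
        T previous = it.next();
        while (it.hasNext()) {
            T current = it.next();
            if (previous.compareTo(current) > 0) {
                return false; // found an adjacent pair that is out of order
            }
            previous = current;
        }
        return true;
    }

    public static void main(String[] args) {
        // String.compareTo orders uppercase letters before lowercase ones
        List<String> sorted = Arrays.asList("Apple", "apple", "banana");
        List<String> unsorted = Arrays.asList("apple", "Apple", "banana");
        System.out.println(isSorted(sorted));   // prints "true"
        System.out.println(isSorted(unsorted)); // prints "false"
    }
}
```

If either list fails this check, sorting it first (or passing the same ordering to both the query and the comparison) would be needed before calling compareSortedLists.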

The above method uses no external libraries and can be used with any version of Java from 6 upwards (the demo main method uses the diamond operator <>, which needs Java 7 or later). If you use a PeekingIterator, such as the one from Apache Commons Collections or Guava, or write your own, then you can make the method simpler, especially if you also use Java 8:

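import java.util.Collection;
import org.apache.commons.collections4.iterators.PeekingIterator;

// Uses the Commons Collections 4 PeekingIterator constructor; with Guava,
// obtain the iterators via Iterators.peekingIterator(...) instead.
public static <T extends Comparable<? super T>> void compareSortedLists(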
        Iterable<T> list1, Iterable<T> list2,
        Collection<T> onlyInList1, Collection<T> onlyInList2) {
    PeekingIterator<T> it1 = new PeekingIterator<>(list1.iterator());
    PeekingIterator<T> it2 = new PeekingIterator<>(list2.iterator());
    while (it1.hasNext() && it2.hasNext()) {
        int comp = it1.peek().compareTo(it2.peek());
        if (comp < 0)
            onlyInList1.add(it1.next());
        else if (comp > 0)
            onlyInList2.add(it2.next());
        else /* comp == 0 */ {
            it1.next();
            it2.next();
        }
    }
    it1.forEachRemaining(onlyInList1::add);
    it2.forEachRemaining(onlyInList2::add);
}
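
For the "write your own" option, a peeking iterator only needs to buffer one element ahead. The following is a minimal sketch, assuming Java 8 (so that remove() and forEachRemaining come from the default methods on Iterator); the class name SimplePeekingIterator is hypothetical and is not the Apache Commons or Guava class.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical minimal peeking iterator (illustrative name, Java 8+). */
final class SimplePeekingIterator<E> implements Iterator<E> {
    private final Iterator<? extends E> delegate;
    private E peeked;          // element fetched ahead of time, if any
    private boolean hasPeeked; // true while 'peeked' holds an unconsumed element

    SimplePeekingIterator(Iterator<? extends E> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return hasPeeked || delegate.hasNext();
    }

    /** Returns the next element without consuming it. */
    public E peek() {
        if (!hasPeeked) {
            if (!delegate.hasNext()) {
                throw new NoSuchElementException();
            }
            peeked = delegate.next();
            hasPeeked = true;
        }
        return peeked;
    }

    @Override
    public E next() {
        if (hasPeeked) {
            E result = peeked;
            peeked = null;     // release the reference to the buffered element
            hasPeeked = false;
            return result;
        }
        return delegate.next();
    }

    public static void main(String[] args) {
        SimplePeekingIterator<String> it =
                new SimplePeekingIterator<>(Arrays.asList("a", "b").iterator());
        System.out.println(it.peek());    // prints "a" (not consumed)
        System.out.println(it.next());    // prints "a"
        System.out.println(it.next());    // prints "b"
        System.out.println(it.hasNext()); // prints "false"
    }
}
```

With a class like this, the PeekingIterator declarations in the method above could be swapped for SimplePeekingIterator without changing the loop, since it offers the same hasNext, peek and next calls.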


08-24 10:03