问题描述
每个集合包含一堆校验和。例如:
集A:
{
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
07736dde2f8484a4a3af463e05f039e3
5b1e374ff2ba949ab49870ca24d3163a
}
Each set contains bunch of checksums. For example:
Set A:
{
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
07736dde2f8484a4a3af463e05f039e3
5b1e374ff2ba949ab49870ca24d3163a
}
B组:
{
6639e1da308fd7b04b7635a17450df7c
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
}
Set B:
{
6639e1da308fd7b04b7635a17450df7c
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
}
A和B的最大公共子集:
{
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
}
The maximum common subset of A and B is:
{
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
}
很多这些操作都将被执行,所以我在寻找一个有效的算法来做到这一点。感谢您的帮助。
A lot of these operations will be performed, so I'm looking for an efficient algorithm to do so.Thanks for your help.
推荐答案
将组之一在哈希表和遍历其他,丢弃不在散列元素。另外,无论排序并通过他们迭代同时,作为归并排序。
Put one of the sets in a hash table and iterate through the other, discarding elements that aren't in the hash. Alternatively, sort both and iterate through them simultaneously, as in merge sort.
编辑:后一种方法创建一个排序结果。我要补充一点,如果集广泛不同大小的,他们是presorted(说是因为你做了一堆交点),那么你就可以实现大的性能提升用无界二分查找跳过极目大名单。
The latter method creates a sorted result. I should add that if the sets are of widely disparate sizes and they're presorted (say because you're doing a bunch of intersections), then you can realize a large performance improvement by using "unbounded" binary search to skip ahead in the large list.
这篇关于高效的算法找出两套最大公共子集?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!