问题描述
这就是我想做的:我有一个包含几个等价关系的列表:
So here's what I want to do: I have a list that contains several equivalence relations:
l = [[1, 2], [2, 3], [4, 5], [6, 7], [1, 7]]
我想合并共享一个元素的集合.这是一个示例实现:
And I want to union the sets that share one element. Here is a sample implementation:
def union(lis):
lis = [set(e) for e in lis]
res = []
while True:
for i in range(len(lis)):
a = lis[i]
if res == []:
res.append(a)
else:
pointer = 0
while pointer < len(res):
if a & res[pointer] != set([]) :
res[pointer] = res[pointer].union(a)
break
pointer +=1
if pointer == len(res):
res.append(a)
if res == lis:
break
lis,res = res,[]
return res
它会打印
[set([1, 2, 3, 6, 7]), set([4, 5])]
这做对了,但是当等价关系太大时,它太慢了.我查看了有关联合查找算法的描述: http://en.wikipedia.org/wiki/不相交的set_data_structure 但是我仍然在编码Python实现时遇到问题.
This does the right thing but is way too slow when the equivalence relations is too large. I looked up the descriptions on union-find algorithm: http://en.wikipedia.org/wiki/Disjoint-set_data_structurebut I still having problem coding a Python implementation.
推荐答案
在O(n)
时间运行的解决方案
Solution that runs in O(n)
time
def indices_dict(lis):
d = defaultdict(list)
for i,(a,b) in enumerate(lis):
d[a].append(i)
d[b].append(i)
return d
def disjoint_indices(lis):
d = indices_dict(lis)
sets = []
while len(d):
que = set(d.popitem()[1])
ind = set()
while len(que):
ind |= que
que = set([y for i in que
for x in lis[i]
for y in d.pop(x, [])]) - ind
sets += [ind]
return sets
def disjoint_sets(lis):
return [set([x for i in s for x in lis[i]]) for s in disjoint_indices(lis)]
工作方式:
>>> lis = [(1,2),(2,3),(4,5),(6,7),(1,7)]
>>> indices_dict(lis)
>>> {1: [0, 4], 2: [0, 1], 3: [1], 4: [2], 5: [2], 6: [3], 7: [3, 4]})
indices_dict
给出从等价#到lis
中的索引的映射.例如. 1
映射到lis
中的索引0
和4
.
indices_dict
gives a map from an equivalence # to an index in lis
. E.g. 1
is mapped to index 0
and 4
in lis
.
>>> disjoint_indices(lis)
>>> [set([0,1,3,4], set([2])]
disjoint_indices
给出了不相交的索引集的列表.每个集合对应于等价的索引.例如. lis[0]
和lis[3]
是相同的,但不是lis[2]
.
disjoint_indices
gives a list of disjoint sets of indices. Each set corresponds to indices in an equivalence. E.g. lis[0]
and lis[3]
are in the same equivalence but not lis[2]
.
>>> disjoint_set(lis)
>>> [set([1, 2, 3, 6, 7]), set([4, 5])]
disjoint_set
将不相交的索引转换为它们的适当等值.
disjoint_set
converts disjoint indices into into their proper equivalences.
很难看到 O(n)
的时间复杂度,但我会尽力解释.在这里,我将使用n = len(lis)
.
-
indices_dict
当然可以在O(n)
时间内运行,因为只有1个循环
indices_dict
certainly runs inO(n)
time because only 1 for-loop
disjoint_indices
是最难看到的.它肯定在O(len(d))
时间运行,因为当d
为空时外循环停止,并且内循环在每次迭代中都删除d
的元素.现在,len(d) <= 2n
因为d
是从等价#到索引的映射,并且lis
中最多有2n
个等价编号.因此,该函数在O(n)
中运行.
disjoint_indices
is the hardest to see. It certainly runs in O(len(d))
time since the outer loop stops when d
is empty and the inner loop removes an element of d
each iteration. now, the len(d) <= 2n
since d
is a map from equivalence #'s to indices and there are at most 2n
different equivalence #'s in lis
. Therefore, the function runs in O(n)
.
disjoint_sets
.但是,您会注意到,i
最多可以在lis
中的所有n
索引上运行,而x
可以在2元组上运行,因此总复杂度为2n = O(n)
disjoint_sets
is difficult to see because of the 3 combined for-loops. However, you'll notice that at most i
can run over all n
indices in lis
and x
runs over the 2-tuple, so the total complexity is 2n = O(n)
这篇关于使用Python联合查找实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!