Problem description
I have two large 2-d arrays and I'd like to find their set difference, taking their rows as elements. In Matlab, the code for this would be setdiff(A,B,'rows'). The arrays are large enough that the obvious looping methods I could think of take too long.
Recommended answer
This should work, but it is currently broken in NumPy 1.6.1 because mergesort is not available for the view being created. It works in the pre-release 1.7.0 version. It should be the fastest approach, since the views don't have to copy any memory:
>>> import numpy as np
>>> a1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a2 = np.array([[4,5,6],[7,8,9],[1,1,1]])
>>> a1_rows = a1.view([('', a1.dtype)] * a1.shape[1])
>>> a2_rows = a2.view([('', a2.dtype)] * a2.shape[1])
>>> np.setdiff1d(a1_rows, a2_rows).view(a1.dtype).reshape(-1, a1.shape[1])
array([[1, 2, 3]])
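The same idea can be wrapped in a small reusable function. This is only a sketch: the name `setdiff_rows` is hypothetical, and it assumes both inputs are 2-d with the same number of columns and a NumPy version where sorting the structured view works. The calls to `np.ascontiguousarray` are an addition here, since the row view needs C-contiguous memory; they copy only when the input is not already contiguous or has a different dtype.

```python
import numpy as np

def setdiff_rows(a, b):
    """Rows of `a` that do not occur in `b` (hypothetical helper name)."""
    # Assumption: a and b are 2-d with the same number of columns.
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b, dtype=a.dtype)  # make the dtypes match
    ncols = a.shape[1]
    # One structured element per row, so setdiff1d compares whole rows.
    row_dtype = [('', a.dtype)] * ncols
    diff = np.setdiff1d(a.view(row_dtype), b.view(row_dtype))
    # Back to an ordinary 2-d array.
    return diff.view(a.dtype).reshape(-1, ncols)

a1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a2 = np.array([[4, 5, 6], [7, 8, 9], [1, 1, 1]])
result = setdiff_rows(a1, a2)
```

Because `np.setdiff1d` returns sorted unique values, the resulting rows come back sorted and without duplicates, not in their original order.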
You can also do this in pure Python, but it might be slow:
>>> import numpy as np
>>> a1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a2 = np.array([[4,5,6],[7,8,9],[1,1,1]])
>>> a1_rows = set(map(tuple, a1))
>>> a2_rows = set(map(tuple, a2))
>>> a1_rows.difference(a2_rows)
set([(1, 2, 3)])
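If you need the result as a 2-d array again instead of a set of tuples, something along these lines should work. It is a sketch: `setdiff_rows_py` is a hypothetical name, and sorting the rows is a choice made here to get a deterministic order, since sets are unordered.

```python
import numpy as np

def setdiff_rows_py(a, b):
    """Set difference of rows using Python sets (hypothetical helper name)."""
    a = np.asarray(a)
    b = np.asarray(b)
    # tolist() yields plain Python scalars, so the tuples hash and compare cheaply.
    diff = set(map(tuple, a.tolist())) - set(map(tuple, b.tolist()))
    # The reshape keeps the column count even when the difference is empty.
    return np.array(sorted(diff), dtype=a.dtype).reshape(-1, a.shape[1])

a1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a2 = np.array([[4, 5, 6], [7, 8, 9], [1, 1, 1]])
result = setdiff_rows_py(a1, a2)
```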