Problem description
I have two very large 3D numpy arrays. I need an efficient way to check whether they overlap, because converting both of them to sets first takes too long. I tried a solution I found here for the same problem on 2D arrays, but I could not make it work for 3D. Here is the 2D solution:
import numpy as np

nrows, ncols = A.shape
# One structured field per column, so each row becomes a single comparable record.
dtype={'names':['f{}'.format(i) for i in range(ncols)],
       'formats':ncols * [A.dtype]}
C = np.intersect1d(A.view(dtype), B.view(dtype))
# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)
(where A and B are the 2D arrays.) I need the number of overlapping 2D subarrays, not the specific ones.
Recommended answer
We could leverage views using a helper function that I have used across a few Q&As. To test for the presence of subarrays, we could use np.isin on the views, or a more laborious approach with np.searchsorted.
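To make the idea of a view concrete, here is a minimal sketch of what collapsing each 2D subarray into a single void element looks like. The toy array and the names rows and flat are made up for illustration and are not part of the answer's helper.

```python
import numpy as np

# Hypothetical toy data: three 2x2 "subarrays" stacked along axis 0.
a = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]],
              [[1, 2], [3, 4]]])

# Flatten each 2D subarray into one row and make the memory contiguous.
rows = np.ascontiguousarray(a.reshape(a.shape[0], -1))

# A void dtype wide enough to cover all the bytes of one row.
void_dt = np.dtype((np.void, rows.dtype.itemsize * rows.shape[1]))
flat = rows.view(void_dt).ravel()

print(flat.shape)                              # (3,)
print(flat[0].tobytes() == flat[2].tobytes())  # True: identical subarrays
print(flat[0].tobytes() == flat[1].tobytes())  # False
```

Each element of flat now stands for a whole subarray, which is why 1D tools such as np.isin can be applied to it. No data is copied by the view itself; only np.ascontiguousarray may copy, and only when the input is not already contiguous.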
Approach #1: Using np.isin -
import numpy as np

# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

def isin_nd(a,b):
    # a,b are the 3D input arrays to give us "isin-like" functionality across them
    A,B = view1D(a.reshape(a.shape[0],-1),b.reshape(b.shape[0],-1))
    return np.isin(A,B)
Approach #2: We could also leverage np.searchsorted on the views -
def isin_nd_searchsorted(a,b):
    # a,b are the 3D input arrays
    A,B = view1D(a.reshape(a.shape[0],-1),b.reshape(b.shape[0],-1))
    sidx = A.argsort()
    sorted_index = np.searchsorted(A,B,sorter=sidx)
    sorted_index[sorted_index==len(A)] = len(A)-1
    idx = sidx[sorted_index]
    return A[idx] == B
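The lookup inside isin_nd_searchsorted can be easier to follow on plain 1D integers. The sketch below uses the same argsort / searchsorted / clamp sequence; the function name isin_searchsorted_1d and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def isin_searchsorted_1d(haystack, needles):
    # For each value in `needles`, report whether it occurs in `haystack`.
    sidx = haystack.argsort()                        # order that sorts haystack
    pos = np.searchsorted(haystack, needles, sorter=sidx)
    # A needle larger than every haystack value gets position len(haystack),
    # which is out of bounds, so clamp it to the last valid position.
    pos[pos == len(haystack)] = len(haystack) - 1
    # Map sorted positions back to original indices and compare values.
    return haystack[sidx[pos]] == needles

haystack = np.array([40, 10, 30, 20])
needles = np.array([30, 5, 99, 10])
mask = isin_searchsorted_1d(haystack, needles)
print(mask)  # [ True False False  True]
```

Note that the returned mask has one entry per needle, that is, per element of the second argument. The clamp assumes the haystack is non-empty.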
So, these two solutions give us a mask of the presence of each subarray of one input in the other: isin_nd(a,b) is indexed along a, while isin_nd_searchsorted(a,b) looks up each subarray of b in a and so is indexed along b. The two counts agree as long as the common subarrays are not repeated within either array. Hence, to get our desired count, it would be isin_nd(a,b).sum() or isin_nd_searchsorted(a,b).sum().
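When trying these functions on small inputs, a slow but simple reference count can be a handy sanity check. The sketch below hashes the raw bytes of each subarray in a Python set; count_common_bruteforce and the test arrays are hypothetical, both inputs are assumed to share the same dtype, and this approach is not meant for the very large arrays in the question.

```python
import numpy as np

def count_common_bruteforce(a, b):
    # Reference count: how many 2D subarrays of `a` also appear in `b`.
    # Assumes a and b have the same dtype, so equal values give equal bytes.
    b_keys = {np.ascontiguousarray(sub).tobytes() for sub in b}
    return sum(np.ascontiguousarray(sub).tobytes() in b_keys for sub in a)

# Deterministic toy data: the value ranges of a and b do not overlap,
# so the only common subarrays are the three copied over explicitly.
a = np.arange(10 * 4 * 5).reshape(10, 4, 5)
b = a[:7] + 1000
b[1] = a[4]
b[3] = a[2]
b[6] = a[0]

print(count_common_bruteforce(a, b))  # 3
```

The result can then be compared against isin_nd(a,b).sum() on the same inputs.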
Sample run -
In [71]: # Setup with 3 common "subarrays"
...: np.random.seed(0)
...: a = np.random.randint(0,9,(10,4,5))
...: b = np.random.randint(0,9,(7,4,5))
...:
...: b[1] = a[4]
...: b[3] = a[2]
...: b[6] = a[0]
In [72]: isin_nd(a,b).sum()
Out[72]: 3
In [73]: isin_nd_searchsorted(a,b).sum()
Out[73]: 3
Timings on large arrays -
In [74]: # Setup
...: np.random.seed(0)
...: a = np.random.randint(0,9,(100,100,100))
...: b = np.random.randint(0,9,(100,100,100))
...: idxa = np.random.choice(range(len(a)), len(a)//2, replace=False)
...: idxb = np.random.choice(range(len(b)), len(b)//2, replace=False)
...: a[idxa] = b[idxb]
# Verify output
In [82]: np.allclose(isin_nd(a,b),isin_nd_searchsorted(a,b))
Out[82]: True
In [75]: %timeit isin_nd(a,b).sum()
10 loops, best of 3: 31.2 ms per loop
In [76]: %timeit isin_nd_searchsorted(a,b).sum()
100 loops, best of 3: 1.98 ms per loop