我有两个巨大的2d numpy整数数组x和u,其中u假设只有unqiue行。对于X中的每一行,我想得到U中匹配行的对应行索引(如果有,则为-1)例如,如果以下数组作为输入传递:

U = array([[1, 4],
       [2, 5],
       [3, 6]])

X = array([[1, 4],
       [3, 6],
       [7, 8],
       [1, 4]])

输出应为:
array([0,2,-1,0])

有没有一个有效的方法来做这个(或类似的事情)与纽比?
@分水岭:
你的方法对我来说失败了
print(type(rows), rows.dtype, rows.shape)
print(rows[:10])
print(search2D_indices(rows[:10], rows[:10]))

<class 'numpy.ndarray'> int32 (47398019, 5)
[[65536     1     1     1    17]
 [65536     1     1     1   153]
 [65536     1     1     2   137]
 [65536     1     1     3   153]
 [65536     1     1     9   124]
 [65536     1     1    13   377]
 [65536     1     1    13   134]
 [65536     1     1    13   137]
 [65536     1     1    13   153]
 [65536     1     1    13   439]]
[ 0  1  2  3  4 -1 -1 -1 -1  9]

最佳答案

方法1
this solutionFind the row indexes of several values in a numpy array的启发,这里有一个使用searchsorted的矢量化解决方案-

def search2D_indices(X, searched_values, fillval=-1):
    dims = np.maximum(X.max(0), searched_values.max(0))+1
    X1D = np.ravel_multi_index(X.T,dims)
    searched_valuesID = np.ravel_multi_index(searched_values.T,dims)
    sidx = X1D.argsort()
    idx = np.searchsorted(X1D,searched_valuesID,sorter=sidx)
    idx[idx==len(sidx)] = 0
    idx_out = sidx[idx]
    return np.where(X1D[idx_out] == searched_valuesID, idx_out, fillval)

样本运行-
In [121]: U
Out[121]:
array([[1, 4],
       [2, 5],
       [3, 6]])

In [122]: X
Out[122]:
array([[1, 4],
       [3, 6],
       [7, 8],
       [1, 4]])

In [123]: search2D_indices(U, X, fillval=-1)
Out[123]: array([ 0,  2, -1,  0])

方法2
扩展到整数为负的情况,我们需要相应地偏移dims并转换为1D,如下所示-
def search2D_indices_v2(X, searched_values, fillval=-1):
    X_lim = X.max()-X.min(0)
    searched_values_lim = searched_values.max()-searched_values.min(0)

    dims = np.maximum(X_lim, searched_values_lim)+1
    s = dims.cumprod()

    X1D = X.dot(s)
    searched_valuesID = searched_values.dot(s)
    sidx = X1D.argsort()
    idx = np.searchsorted(X1D,searched_valuesID,sorter=sidx)
    idx[idx==len(sidx)] = 0
    idx_out = sidx[idx]

    return np.where(X1D[idx_out] == searched_valuesID, idx_out, fillval)

样本运行-
In [142]: U
Out[142]:
array([[-1, -4],
       [ 2,  5],
       [ 3,  6]])

In [143]: X
Out[143]:
array([[-1, -4],
       [ 3,  6],
       [ 7,  8],
       [-1, -4]])

In [144]: search2D_indices_v2(U, X, fillval=-1)
Out[144]: array([ 0,  2, -1,  0])

方法3
另一个基于-
# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

def search2D_indices_views(X, searched_values, fillval=-1):
    X1D,searched_valuesID = view1D(X, searched_values)
    sidx = X1D.argsort()
    idx = np.searchsorted(X1D,searched_valuesID,sorter=sidx)
    idx[idx==len(sidx)] = 0
    idx_out = sidx[idx]
    return np.where(X1D[idx_out] == searched_valuesID, idx_out, fillval)

10-06 05:20
查看更多