python - Looping and searching in a NumPy array

I need to iterate over a numpy array and perform the search below. For arrays of roughly 300K values (npArray1 and npArray2 in the example below), it takes about 60 s.

In other words, for each value of npArray1 I am looking for the index of its first occurrence in npArray2.

import numpy as np

for id in np.nditer(npArray1):
    newId = np.where(npArray2 == id)[0][0]  # scans all of npArray2 for every element of npArray1

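Is there any way to speed this up with numpy? I need to run the script on much larger arrays (50M values). Note that the two numpy arrays npArray1 and npArray2 are not necessarily the same size, but both are 1-D.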

Thanks a lot for your help.

Best answer

The function np.unique will do much of the work for you:

import numpy as np

npArray2 = np.random.randint(100, size=1000)  # 1000-long vector of ints in [0, 100), so lots of repeats
vals, idxs = np.unique(npArray2, return_index=True)  # each unique value AND the index of its first appearance
for val in npArray1:
    newId = idxs[vals == val][0]

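vals is a sorted array of the unique values in npArray2, and idxs gives the index in npArray2 of each value's first occurrence. Searching in vals should be faster than searching in npArray2 because vals is smaller, at least when npArray2 contains many repeated values.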
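To make the two return values concrete, here is a small sketch on a made-up array `a` (hypothetical data, not from the question):

```python
import numpy as np

# Hypothetical small input with repeated values
a = np.array([3, 1, 3, 2, 1])

# vals: sorted unique values; idxs: index of each value's first appearance in a
vals, idxs = np.unique(a, return_index=True)
print(vals.tolist())  # [1, 2, 3]
print(idxs.tolist())  # [1, 3, 0]
```

Note that a[idxs] reproduces vals, which is a quick way to check that idxs points at the right positions.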

You can take advantage of the fact that vals is sorted to speed up the search further:

import bisect  # binary search works because np.unique returns vals sorted
for val in npArray1:
    pos = bisect.bisect_left(vals, val)
    # assumes val occurs in npArray2; otherwise pos is only an insertion point
    newId = idxs[pos]
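The bisect loop still runs one Python-level iteration per element of npArray1, which is likely to be slow at 50M elements. A possible fully vectorized variant, sketched under the assumption that both inputs are 1-D numeric arrays and npArray2 is non-empty, replaces bisect with np.searchsorted, NumPy's vectorized binary search over a sorted array. The function name first_indices and the -1 marker for values missing from npArray2 are hypothetical choices, not part of the original answer:

```python
import numpy as np

def first_indices(npArray1, npArray2):
    """For each value in npArray1, return the index of its first occurrence
    in npArray2, or -1 (hypothetical marker) if it does not occur there.
    Assumes npArray2 is a non-empty 1-D array."""
    # Sorted unique values of npArray2 and the index of each one's first appearance
    vals, idxs = np.unique(npArray2, return_index=True)
    # Vectorized binary search: one call instead of a Python-level loop
    pos = np.searchsorted(vals, npArray1)
    # Values larger than every entry of vals get pos == len(vals); clip so indexing is safe
    pos = np.clip(pos, 0, len(vals) - 1)
    # A hit only if the value at the insertion point actually equals the query
    found = vals[pos] == npArray1
    return np.where(found, idxs[pos], -1)

# Usage on hypothetical data
a2 = np.array([5, 3, 5, 7, 3])
a1 = np.array([3, 7, 5, 4])
print(first_indices(a1, a2).tolist())  # [1, 3, 0, -1]
```

The cost is one sort of npArray2 inside np.unique plus one binary search per element of npArray1, roughly O((n + m) log m) instead of the O(n·m) of the original loop, and all of it runs inside NumPy rather than in the Python interpreter.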