我有两个列表,其中一个列表非常庞大(数以百万计的元素),另外几千个。我想做以下
bigArray=[0,1,0,2,3,2,,.....]
smallArray=[0,1,2,3,4]
for i in len(smallArray):
pts=np.where(bigArray==smallArray[i])
#Do stuff with pts...
上面的作品,但很慢。有什么方法可以更有效地做到这一点,而无需诉诸C语言?
最佳答案
在您的情况下,您可能会受益于对大型阵列进行预排序。这是演示如何将时间从〜45秒减少到2秒(在我的笔记本电脑上)的示例(对于阵列5e6与1e3的一组特定长度)。显然,如果阵列大小完全不同,则解决方案将不是最佳解决方案。例如。使用默认解决方案时,复杂度为O(bigN * smallN),但对于我建议的解决方案,复杂度为O((bigN + smallN)* log(bigN))
import numpy as np, numpy.random as nprand, time, bisect
bigN = 5e6
smallN = 1000
maxn = 1e7
nprand.seed(1)
bigArr = nprand.randint(0, maxn, size=bigN)
smallArr = nprand.randint(0, maxn, size=smallN)
# brute force
t1 = time.time()
for i in range(len(smallArr)):
inds = np.where(bigArr == smallArr[i])[0]
t2 = time.time()
print "Brute", t2-t1
# not brute force (like nested loop with index scan)
t1 = time.time()
sortedind = np.argsort(bigArr)
sortedbigArr = bigArr[sortedind]
for i in range(len(smallArr)):
i1 = bisect.bisect_left(sortedbigArr, smallArr[i])
i2 = bisect.bisect_right(sortedbigArr, smallArr[i])
inds = sortedind[i1:i2]
t2=time.time()
print "Non-brute", t2-t1
输出:
蛮力42.5278530121
非粗麻布1.57193303108
关于python - numpy数组: Efficiently find matching indices,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/10320751/