Efficient threshold filter of an array with numpy



I need to filter an array to remove the elements that are lower than a certain threshold. My current code is like this:

import numpy

threshold = 5
a = numpy.array(range(10))  # testing data
b = numpy.array(list(filter(lambda x: x >= threshold, a)))

The problem is that this creates a temporary list, using a filter with a lambda function (slow).

As this is a quite simple operation, maybe there is a numpy function that does it in an efficient way, but I've been unable to find it.


I thought that another way to achieve this could be sorting the array, finding the index of the threshold and returning a slice from that index onwards. But even if this were faster for small inputs (where the difference wouldn't be noticeable anyway), it's definitely asymptotically less efficient as the input size grows, since sorting is O(n log n) while a single filtering pass is O(n).


Update: I took some measurements too, and sorting+slicing was still twice as fast as the pure Python filter when the input had 100,000,000 entries.

In [321]: r = numpy.random.uniform(0, 1, 100000000)

In [322]: %timeit test1(r) # filter
1 loops, best of 3: 21.3 s per loop

In [323]: %timeit test2(r) # sort and slice
1 loops, best of 3: 11.1 s per loop

In [324]: %timeit test3(r) # boolean indexing
1 loops, best of 3: 1.26 s per loop


b = a[a >= threshold] should do it. (Use >= rather than > so that elements equal to the threshold are kept, matching the code in the question.)

A quick timing test:


import datetime
import numpy as np

# 1,000,000 zeros followed by 1,000,000 ones
lrg = np.arange(2).reshape((2, -1)).repeat(1000000, -1).flatten()

# boolean indexing
t0 = datetime.datetime.now()
flt = lrg[lrg == 0]
print(datetime.datetime.now() - t0)

# pure-Python filter
t0 = datetime.datetime.now()
flt = np.array(list(filter(lambda x: x == 0, lrg)))
print(datetime.datetime.now() - t0)


$ python test.py
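
For readers unfamiliar with boolean indexing, here is a minimal sketch of what happens in two steps, reusing the small test array from the question. The variable names mask and c are introduced here for illustration only.

```python
import numpy as np

threshold = 5
a = np.arange(10)  # same testing data as in the question

# Step 1: the comparison is applied elementwise and yields a boolean array.
mask = a >= threshold
print(mask)

# Step 2: indexing with the mask keeps the elements where it is True,
# in their original order, and returns a new array (a copy, not a view).
b = a[mask]
print(b)  # [5 6 7 8 9]

# A strict comparison also drops the elements equal to the threshold.
c = a[a > threshold]
print(c)  # [6 7 8 9]
```

The comparison and the selection both run in compiled numpy code, so no per-element Python function call is made, which is presumably where most of the speedup over filter with a lambda comes from.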

