Question
I am trying to use argpartition from NumPy, but something seems to be going wrong and I cannot figure it out. Here is what's happening:
These are the first 5 elements of the sorted array norms:
np.sort(norms)[:5]
array([ 53.64759445, 54.91434479, 60.11617279, 64.09630585, 64.75318909], dtype=float32)
But when I use indices_sorted = np.argpartition(norms, 5)[:5]
norms[indices_sorted]
array([ 60.11617279, 64.09630585, 53.64759445, 54.91434479, 64.75318909], dtype=float32)
I expected to get the same result as the sorted array.
When I use 3 as the parameter, the result does come out in sorted order: indices_sorted = np.argpartition(norms, 3)[:3]
norms[indices_sorted]
array([ 53.64759445, 54.91434479, 60.11617279], dtype=float32)
This isn't making much sense to me. I'm hoping someone can offer some insight.
Rephrasing this question as "does argpartition preserve the order of the k partitioned elements?" makes more sense.
Recommended answer
We need to pass a list of the indices that are to be kept in sorted order, instead of feeding the kth param as a scalar. With a scalar kth, only the element at position kth is guaranteed to be in its sorted position; the elements before it are merely no larger, in no particular order. Thus, to maintain the sorted nature across the first 5 elements, instead of np.argpartition(a,5)[:5], simply do -
np.argpartition(a,range(5))[:5]
Here's a sample run to make things clear -
In [84]: a = np.random.rand(10)
In [85]: a
Out[85]:
array([ 0.85017222, 0.19406266, 0.7879974 , 0.40444978, 0.46057793,
0.51428578, 0.03419694, 0.47708 , 0.73924536, 0.14437159])
In [86]: a[np.argpartition(a,5)[:5]]
Out[86]: array([ 0.19406266, 0.14437159, 0.03419694, 0.40444978, 0.46057793])
In [87]: a[np.argpartition(a,range(5))[:5]]
Out[87]: array([ 0.03419694, 0.14437159, 0.19406266, 0.40444978, 0.46057793])
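To make the difference between the two calls concrete, here is a minimal sketch that checks what each form guarantees. The seed, the array size and the variable names are arbitrary choices for illustration, not part of the sample run above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
a = rng.random(10)
k = 5

# Scalar kth: only position k is guaranteed to hold its sorted value.
# Everything before it is <= that value, everything after is >=,
# but neither side is ordered internally.
idx_scalar = np.argpartition(a, k)
pivot = a[idx_scalar[k]]
scalar_ok = bool(
    pivot == np.sort(a)[k]
    and np.all(a[idx_scalar[:k]] <= pivot)
    and np.all(a[idx_scalar[k + 1:]] >= pivot)
)

# Sequence kth: every listed position holds its sorted value,
# so the first k entries come out in ascending order.
idx_seq = np.argpartition(a, range(k))
seq_ok = bool(np.array_equal(a[idx_seq[:k]], np.sort(a)[:k]))

print(scalar_ok, seq_ok)
```

Both checks should hold for any input. That the scalar call sometimes returns the first few elements already in order (as with 3 in the question) is a coincidence of the data, not something to rely on.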
Please note that argpartition makes sense from a performance standpoint when we want sorted indices for only a small subset of the elements, say k elements, where k is a small fraction of the total number of elements.
Let's use a bigger dataset and try to get sorted indices for all elems to make the above-mentioned point clear -
In [51]: a = np.random.rand(10000)*100
In [52]: %timeit np.argpartition(a,range(a.size-1))[:5]
10 loops, best of 3: 105 ms per loop
In [53]: %timeit a.argsort()
1000 loops, best of 3: 893 µs per loop
Thus, to sort all elems, np.argpartition isn't the way to go.
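For anyone who wants to reproduce this outside IPython, the following is a rough sketch using the standard timeit module. The seed and the repeat count are arbitrary assumptions, and the absolute timings will differ by machine and NumPy version; only the relative gap is of interest.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
a = rng.random(10000) * 100

# Partitioning on every position effectively sorts the whole array,
# so both calls should order the values identically.
full_partition = np.argpartition(a, range(a.size - 1))
full_sort = a.argsort()
same_order = bool(np.array_equal(a[full_partition], a[full_sort]))

# Time a few calls of each; number=3 is just to keep the run short.
t_partition = timeit.timeit(lambda: np.argpartition(a, range(a.size - 1)), number=3)
t_sort = timeit.timeit(lambda: a.argsort(), number=3)

print("same ordering:", same_order)
print("argpartition (all kth): %.4f s, argsort: %.4f s" % (t_partition, t_sort))
```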
Now, let's say I want sorted indices for only the first 5 elems in sorted order (i.e. the 5 smallest) of that big dataset, and also want to keep the order for those -
In [68]: a = np.random.rand(10000)*100
In [69]: np.argpartition(a,range(5))[:5]
Out[69]: array([1647, 942, 2167, 1371, 2571])
In [70]: a.argsort()[:5]
Out[70]: array([1647, 942, 2167, 1371, 2571])
In [71]: %timeit np.argpartition(a,range(5))[:5]
10000 loops, best of 3: 112 µs per loop
In [72]: %timeit a.argsort()[:5]
1000 loops, best of 3: 888 µs per loop
This is where argpartition really pays off: roughly 112 µs versus 888 µs in the run above.
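As a side note, a different way to get the same ordered result (not the approach used in this answer) is to partition with a scalar kth and then sort only the k selected candidates. The helper name smallest_k_sorted below is hypothetical, and the sketch assumes k is smaller than the array size.

```python
import numpy as np

def smallest_k_sorted(a, k):
    """Hypothetical helper: indices of the k smallest values, ascending by value.

    Assumes 0 < k < a.size.
    """
    # Step 1: pick the k smallest elements, in arbitrary order.
    candidates = np.argpartition(a, k)[:k]
    # Step 2: sort just those k candidates by their values.
    return candidates[np.argsort(a[candidates])]

rng = np.random.default_rng(0)  # arbitrary seed
a = rng.random(10000) * 100
top5 = smallest_k_sorted(a, 5)
print(a[top5])
```

Whether this is faster than passing range(k) directly depends on k and the array size, so it is worth timing both on your own data.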