I found np.count and np.histogram, but they are not what I am looking for.
Something like:
From:

array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])

To:
array_ = np.array([8,8,8,2,8,8,2,8,8,2,2,8])

Thanks in advance!

Best answer

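For an array of small non-negative integers you can use np.bincount: using the original array as an index into the result of np.bincount gives the desired output.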

>>> array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
>>> np.bincount(array_)
array([8, 2, 2])
>>> np.bincount(array_)[array_]
array([8, 8, 8, 2, 8, 8, 2, 8, 8, 2, 2, 8])
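
One caveat worth illustrating: np.bincount only accepts non-negative integers. A minimal sketch, assuming a non-empty integer array, shifts the values by their minimum first. The helper name freq_shifted is made up for this example and is not part of NumPy. The intermediate count array has max - min + 1 entries, so this only suits inputs whose value range is small.

```python
import numpy as np

def freq_shifted(array):
    """Hypothetical helper: per-element frequencies for integers that may be negative."""
    array = np.asarray(array)
    # Shift so the smallest value maps to 0; np.bincount rejects negative input.
    shifted = array - array.min()
    return np.bincount(shifted)[shifted]

print(freq_shifted(np.array([-3, 5, -3, 0, 5, -3])))
# [3 2 3 1 2 3]
```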


An alternative approach that should be efficient even with large or negative inputs is to use np.unique with the return_inverse and return_counts arguments, as follows:
>>> array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
>>> _, inv, counts = np.unique(array_, return_inverse=True, return_counts=True)
>>> counts[inv]
array([8, 8, 8, 2, 8, 8, 2, 8, 8, 2, 2, 8])

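Note that the return_counts argument of np.unique is new in NumPy 1.9.0, so you will need a reasonably recent version of NumPy. With an older version, all is not lost! You can still use the return_inverse argument of np.unique, which gives you back an array of small integers arranged in the same pattern as the original ones. That new array is now in perfect shape for np.bincount to operate on efficiently: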
>>> array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
>>> _, inverse = np.unique(array_, return_inverse=True)
>>> np.bincount(inverse)[inverse]
array([8, 8, 8, 2, 8, 8, 2, 8, 8, 2, 2, 8])

Another example, with larger array_ contents:
>>> array_ = np.array([0, 71, 598, 71, 0, 0, 243])
>>> _, inverse = np.unique(array_, return_inverse=True)
>>> inverse
array([0, 1, 3, 1, 0, 0, 2])
>>> np.bincount(inverse)[inverse]
array([3, 2, 1, 2, 3, 3, 1])
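
To see why this works, it may help to look at what return_inverse produces: for each original element, an index into the sorted unique values. Indexing the unique values with it rebuilds the input. A small sketch using the same data (the variable name uniques is chosen here for illustration):

```python
import numpy as np

array_ = np.array([0, 71, 598, 71, 0, 0, 243])
uniques, inverse = np.unique(array_, return_inverse=True)

# uniques holds the sorted distinct values; inverse maps each original
# element to its position in uniques.
print(uniques)   # [  0  71 243 598]
print(inverse)   # [0 1 3 1 0 0 2]

# Indexing uniques with inverse reconstructs the original array.
assert np.array_equal(uniques[inverse], array_)
```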

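As always, though, if efficiency is a concern then you should profile to find out what is most suitable. Note in particular that np.unique sorts under the hood, so its theoretical complexity is higher than that of the pure bincount solution.
So let's do some timings using IPython's %timeit (this is on Python 3.4). First we'll define functions for the operations we need: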
In [1]: import numpy as np; from collections import Counter

In [2]: def freq_bincount(array):
   ...:     return np.bincount(array)[array]
   ...:

In [3]: def freq_unique(array):
   ...:     _, inverse, counts = np.unique(array, return_inverse=True, return_counts=True)
   ...:     return counts[inverse]
   ...:

In [4]: def freq_counter(array):
   ...:     c = Counter(array)
   ...:     return np.array(list(map(c.get, array)))
   ...:
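
Before timing, it is worth checking that the three functions agree. A quick sketch, restating the definitions above so that it runs on its own; the small random input and the seed are arbitrary choices for this check:

```python
import numpy as np
from collections import Counter

def freq_bincount(array):
    return np.bincount(array)[array]

def freq_unique(array):
    _, inverse, counts = np.unique(array, return_inverse=True, return_counts=True)
    return counts[inverse]

def freq_counter(array):
    c = Counter(array)
    return np.array(list(map(c.get, array)))

# Small reproducible sample of non-negative integers.
rng = np.random.RandomState(0)
sample = rng.randint(100, size=1000)

# All three approaches should produce identical per-element frequencies.
expected = freq_bincount(sample)
assert np.array_equal(freq_unique(sample), expected)
assert np.array_equal(freq_counter(sample), expected)
```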

Now we create a test array:
In [5]: test_array = np.random.randint(100, size=10**6)

Here are the results on my machine:
In [6]: %timeit freq_bincount(test_array)
100 loops, best of 3: 2.69 ms per loop

In [7]: %timeit freq_unique(test_array)
10 loops, best of 3: 166 ms per loop

In [8]: %timeit freq_counter(test_array)
1 loops, best of 3: 317 ms per loop

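So bincount is the clear winner here. The Counter approach from @Kasramvd's solution is somewhat slower than the np.unique approach, but that could change on a different machine or with different versions of Python and NumPy: you should test with data that is appropriate for your use case.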
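
Outside IPython, similar measurements can be made with the standard-library timeit module. A rough sketch for the two NumPy approaches; the repeat counts are arbitrary, and the numbers will differ from those above:

```python
import timeit
import numpy as np

def freq_bincount(array):
    return np.bincount(array)[array]

def freq_unique(array):
    _, inverse, counts = np.unique(array, return_inverse=True, return_counts=True)
    return counts[inverse]

test_array = np.random.randint(100, size=10**6)

timings = {}
for func in (freq_bincount, freq_unique):
    # Best of 3 runs of 5 calls each, reported as seconds per call.
    best = min(timeit.repeat(lambda: func(test_array), number=5, repeat=3))
    timings[func.__name__] = best / 5
    print(f"{func.__name__}: {timings[func.__name__] * 1e3:.2f} ms per call")
```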

Regarding python - the fastest way in NumPy to convert array elements to their frequencies, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32591327/
