Problem description
How can I best find the number of occurrences of a given array within a set of arrays (a two-dimensional array) in Python (with NumPy)? This is a simplified version of what I need to express in Python code:
patterns = numpy.array([[1, -1, 1, -1],
                        [1, 1, -1, 1],
                        [1, -1, 1, -1],
                        ...])
findInPatterns = numpy.array([1, -1, 1, -1])
numberOfOccurrences = findNumberOfOccurrences(needle=findInPatterns, haystack=patterns)
print(numberOfOccurrences) # should print e.g. 2
In reality, I need to find out how often each array can be found within the set. But the functionality described in the code above would already help me a lot on my way.
Now, I know I could use loops to do that, but I was wondering whether there is a more efficient way. Googling only led me to numpy.bincount, which does exactly what I need, but only for integers and not for two-dimensional arrays.
Recommended answer
With an array of 1s and -1s, performance-wise nothing is going to beat using np.dot: if (and only if) all items match, the dot product adds up to the number of items in the row. So you can do:
>>> import numpy as np
>>> haystack = np.array([[1, -1, 1, -1],
...                      [1, 1, -1, 1],
...                      [1, -1, 1, -1]])
>>> needle = np.array([1, -1, 1, -1])
>>> haystack.dot(needle)
array([ 4, -2,  4])
>>> np.sum(haystack.dot(needle) == len(needle))
2
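The same check can be wrapped in small functions, which also covers the "how often does each array occur" part of the question. This is only a sketch under stated assumptions: the helper names `count_rows_dot`, `count_rows_equal` and `count_each_row` are made up for illustration, the dot-product version is valid only when every entry is 1 or -1, and the `np.all` comparison is a more general (and likely slower) fallback for arbitrary values. Counting every distinct row relies on `np.unique` with `axis=0` and `return_counts=True`.

```python
import numpy as np

def count_rows_dot(haystack, needle):
    """Count rows equal to needle; valid only when all entries are +1 or -1."""
    return int(np.sum(haystack.dot(needle) == len(needle)))

def count_rows_equal(haystack, needle):
    """Count rows equal to needle by direct comparison; works for any values."""
    return int(np.sum(np.all(haystack == needle, axis=1)))

def count_each_row(haystack):
    """Return the distinct rows (sorted) and how many times each one occurs."""
    rows, counts = np.unique(haystack, axis=0, return_counts=True)
    return rows, counts

haystack = np.array([[1, -1, 1, -1],
                     [1, 1, -1, 1],
                     [1, -1, 1, -1]])
needle = np.array([1, -1, 1, -1])

print(count_rows_dot(haystack, needle))    # 2
print(count_rows_equal(haystack, needle))  # 2

rows, counts = count_each_row(haystack)
print(rows)    # distinct rows, in sorted order
print(counts)  # occurrences of each distinct row
```

If the values are not restricted to 1 and -1, the dot product can reach `len(needle)` without the rows matching, so the direct comparison is the safer choice in that case.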
This is something of a toy special case of convolution-based image matching. You could easily rewrite it to look for patterns shorter than a full row, and even speed it up using FFTs.
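As a rough illustration of the "shorter than a full row" idea, the sketch below slides the needle along each row with `np.correlate` and counts the offsets where the score equals the needle length. It is one possible reading of the remark, not the answerer's own code: the function name `count_short_pattern` is hypothetical, the entries are still assumed to be 1 or -1, the needle is assumed to be no longer than a row, and overlapping matches are each counted. No FFT speed-up is attempted here.

```python
import numpy as np

def count_short_pattern(haystack, needle):
    """Count positions where needle (entries +1/-1) occurs inside any row."""
    total = 0
    for row in haystack:
        # Sliding dot product of the row with the needle at every valid offset.
        scores = np.correlate(row, needle, mode="valid")
        # A score equal to len(needle) means every item matched at that offset.
        total += int(np.sum(scores == len(needle)))
    return total

haystack = np.array([[1, -1, 1, -1],
                     [1, 1, -1, 1],
                     [1, -1, 1, -1]])
short_needle = np.array([1, -1])

print(count_short_pattern(haystack, short_needle))  # 5
```

With a needle as long as a row, this reduces to the full-row dot-product count shown earlier.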