Problem description
I have a numpy array:
import numpy as np
arr = np.arange(20).reshape(2,10)
arr[1,:] = 0
arr[1,2] = arr[1,5] = arr[1,7] = 1
print(arr)
[[0 1 2 3 4 5 6 7 8 9]
 [0 0 1 0 0 1 0 1 0 0]]
I want to extract overlapping arrays, each starting at a 1 and ending one column after the next 1 (the first array starts at column 0 and the last one runs to the end of the array). Expected output:
[[0 1 2 3]
 [0 0 1 0]]

[[2 3 4 5 6]
 [1 0 0 1 0]]

[[5 6 7 8]
 [1 0 1 0]]

[[7 8 9]
 [1 0 0]]
At the moment, I have an index-based for loop that feels awkward in a numpy context and also has to treat the first and last segments as special cases:
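arr[1,0] = 1  # mark column 0 as a segment start (this modifies arr)
ind = np.where(arr[1, :])[0]  # column indices of the 1s in the second row
print(ind)
for i, j in enumerate(ind):
    if not i:
        continue  # the first index only serves as a start
    # from the previous 1 up to and including the column after the current 1
    curr = np.copy(arr[:, ind[i-1]:j+2])
    print(curr)
# last segment: from the last 1 to the end of the array
curr = np.copy(arr[:, j:])
print(curr)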
This approach gives me the desired output but I cannot believe there is not a numpier way to achieve this (although the tumbleweed reaction here may indicate this). If there is an easier pandas solution, that would also be fine. The output is ideally a list of these arrays or a similar data structure; the output arrays don't have to be returned individually.
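For comparison, here is a minimal sketch of the same slicing without special cases and without mutating arr. It assumes the original arr (before arr[1,0] = 1 is set) and that column 0 does not itself hold a 1; the function name overlapping_segments and the names starts and stops are illustrative only. The idea is to prepend 0 to the list of start columns and append the array width to the list of stop columns, so every segment is sliced the same way.

```python
import numpy as np

def overlapping_segments(arr):
    """Hypothetical helper: split a 2-row array into overlapping column blocks.

    Each block starts at a 1 in the second row (the first block starts at
    column 0) and ends one column after the next 1 (the last block runs to
    the end). Assumes column 0 does not already hold a 1.
    Returns a list of views into arr; call .copy() on them if needed.
    """
    n = arr.shape[1]
    ones = np.flatnonzero(arr[1])   # columns holding a 1
    starts = np.r_[0, ones]         # first block always starts at column 0
    stops = np.r_[ones + 2, n]      # exclusive stop after each 1; last block ends at n
    return [arr[:, a:b] for a, b in zip(starts, stops)]

arr = np.arange(20).reshape(2, 10)
arr[1, :] = 0
arr[1, 2] = arr[1, 5] = arr[1, 7] = 1

for seg in overlapping_segments(arr):
    print(seg)
```

The boundaries for the example are starts = [0, 2, 5, 7] and stops = [4, 7, 9, 10], which reproduce the four expected arrays.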
Recommended answer
Here is part of a solution, my favorite one and not complicated:
>>> split_idx = np.flatnonzero(arr[1]) + 2
>>> np.split(arr, split_idx, axis=1)
[array([[0, 1, 2, 3],
        [0, 0, 1, 0]]),
 array([[4, 5, 6],
        [0, 1, 0]]),
 array([[7, 8],
        [1, 0]]),
 array([[9],
        [0]])]
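np.split with a list of indices cuts the array at exactly those positions, so the call above is equivalent to slicing between consecutive cut points. The pieces touch but never overlap, which is why this only gets part of the way. A small check, rebuilding the question's arr (the names bounds and manual are illustrative):

```python
import numpy as np

arr = np.arange(20).reshape(2, 10)
arr[1, :] = 0
arr[1, 2] = arr[1, 5] = arr[1, 7] = 1

split_idx = np.flatnonzero(arr[1]) + 2        # cut one column after each 1
pieces = np.split(arr, split_idx, axis=1)

# The same cuts written as explicit slices between consecutive boundaries.
bounds = np.r_[0, split_idx, arr.shape[1]]
manual = [arr[:, a:b] for a, b in zip(bounds[:-1], bounds[1:])]

assert all(np.array_equal(p, m) for p, m in zip(pieces, manual))
# The widths add up to the original width: the pieces partition the columns.
print([p.shape[1] for p in pieces])           # [4, 3, 2, 1]
```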
But two things suggest that any numpy-style approach is a poor fit for this problem:
- You're forced to work with a list of arrays of different shapes, which is not what numpy is designed for, so np.split is quite slow.
- You can't cut the array into these segments in one go: extra columns have to be inserted at the beginning of every interior item.
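To make the second point concrete, here is a sketch of those extra insertions: it takes the non-overlapping np.split pieces and prepends, to every piece after the first, the two columns just before its cut (the 1 and the column after it), read from the original array. Reading them from arr rather than from the previous piece keeps it correct when two 1s are adjacent and a piece is only one column wide. The helper name to_overlapping is hypothetical, not part of the answer.

```python
import numpy as np

def to_overlapping(arr, split_idx):
    """Hypothetical helper: turn np.split's pieces into overlapping segments.

    Every piece after the first gets the two columns just before its cut
    (the 1 and the column after it) prepended from the original array.
    """
    pieces = np.split(arr, split_idx, axis=1)
    segments = [pieces[0]]                     # the first piece needs no insertion
    for cut, piece in zip(split_idx, pieces[1:]):
        segments.append(np.hstack([arr[:, cut - 2:cut], piece]))
    return segments

arr = np.arange(20).reshape(2, 10)
arr[1, :] = 0
arr[1, 2] = arr[1, 5] = arr[1, 7] = 1

split_idx = np.flatnonzero(arr[1]) + 2
for seg in to_overlapping(arr, split_idx):
    print(seg)
```

Note that np.hstack copies data for every interior segment, which is part of why this route is less attractive than slicing directly.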