Fast, Pythonic way to rank chunks of 1's in a numpy array?

Question

I have a numpy array consisting of 0's and 1's. Each sequence of 1's within the array stands for one occurrence of an event. I want to label the elements corresponding to each event with an event-specific ID number (and the remaining array elements with np.nan). I can certainly do that in a loop, but is there a more "Pythonic" (fast, vectorized) way of doing it?

Example of a numpy array with 3 events that I want to label:

import numpy as np 
arr = np.array([0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,1,1,1])
some_func(arr)

# Expected output of some_func that I am looking for:
# [np.nan,np.nan,np.nan,0,0,0,np.nan,np.nan,np.nan,1,1,np.nan,np.nan,np.nan,2,2,2,2]

Recommended answer

You want to label the runs, and luckily SciPy has a function for exactly that, scipy.ndimage.label -

In [43]: from scipy.ndimage import label

In [47]: out = label(arr)[0]

In [48]: np.where(arr==0,np.nan,out-1)
Out[48]: 
array([nan, nan, nan,  0.,  0.,  0., nan, nan, nan,  1.,  1., nan, nan,
       nan,  2.,  2.,  2.,  2.])
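
For context, `label` returns a tuple of the labeled array and the number of features found. Labels start at 1 and 0 is reserved for background, which is why the session above takes `[0]` and subtracts 1. A minimal standalone sketch, assuming SciPy is installed and relying on the default connectivity for 1-D input (adjacent elements); the names `labeled`, `n_events` and `ranked` are only illustrative:

```python
import numpy as np
from scipy.ndimage import label

arr = np.array([0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,1,1,1])

# label() returns (labeled_array, num_features)
labeled, n_events = label(arr)

# Labels are 1-based and 0 marks the background (the original zeros),
# so shift by one and replace the background with NaN
ranked = np.where(arr == 0, np.nan, labeled - 1)
```

Because NaN cannot be stored in an integer array, the result is a float array, as in the output above.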

Another approach, using only NumPy -

def rank_chunks(arr):
    m = np.r_[False,arr.astype(bool)]
    idx = np.flatnonzero(m[:-1] < m[1:])
    id_ar = np.zeros(len(arr),dtype=float)
    id_ar[idx[1:]] = 1
    out = id_ar.cumsum()
    out[arr==0] = np.nan
    return out
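
To see why this works, the intermediate arrays can be inspected on the example input. The following is a sketch of the same steps written out at top level; the names `starts`, `steps` and `ids` are chosen here for readability and are not from the original answer:

```python
import numpy as np

arr = np.array([0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,1,1,1])

# Prepend False so a run starting at index 0 still produces a rising edge
m = np.r_[False, arr.astype(bool)]

# False < True holds only where a 0 is followed by a 1: the start of each run
starts = np.flatnonzero(m[:-1] < m[1:])      # array([ 3,  9, 14])

# Put a 1 at the start of every run except the first, then accumulate
steps = np.zeros(len(arr), dtype=float)
steps[starts[1:]] = 1
ids = steps.cumsum()                         # 0 up to index 8, 1 up to 13, then 2

# Blank out the positions that are not part of any run
ids[arr == 0] = np.nan
```

Skipping the first start (`starts[1:]`) is what makes the IDs begin at 0 rather than 1.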

Another with masking + np.repeat -

def rank_chunks_v2(arr):
    m = np.r_[False,arr.astype(bool),False]
    idx = np.flatnonzero(m[:-1] != m[1:])
    l = idx[1::2]-idx[::2]
    out = np.full(len(arr),np.nan,dtype=float)
    out[arr!=0] = np.repeat(np.arange(len(l)),l)
    return out
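
Here the idea is to find both ends of every run, turn them into run lengths, and repeat each ID that many times. A sketch of the intermediate values on the example input; `edges` and `lengths` are illustrative names for what the function calls `idx` and `l`:

```python
import numpy as np

arr = np.array([0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,1,1,1])

# Pad both ends with False so every run has a start edge and a stop edge
m = np.r_[False, arr.astype(bool), False]

# Edges alternate: start, stop, start, stop, ...
edges = np.flatnonzero(m[:-1] != m[1:])      # array([ 3,  6,  9, 11, 14, 18])
lengths = edges[1::2] - edges[::2]           # array([3, 2, 4])

# Repeat each event ID once per element of its run, then write the
# result into the positions that hold a 1
out = np.full(len(arr), np.nan)
out[arr != 0] = np.repeat(np.arange(len(lengths)), lengths)
```

The stop edges are exclusive (one past the last 1 of each run), so subtracting start from stop gives the run length directly.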

Timings (tiling the given input 1,000,000 times) -

In [153]: arr_big = np.tile(arr,1000000)

In [154]: %timeit np.where(arr_big==0,np.nan,label(arr_big)[0]-1)
     ...: %timeit rank_chunks(arr_big)
     ...: %timeit rank_chunks_v2(arr_big)
1 loop, best of 3: 312 ms per loop
1 loop, best of 3: 263 ms per loop
1 loop, best of 3: 229 ms per loop
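
These numbers depend on hardware and library versions, so they are best read as relative. Before comparing speeds, it can also help to confirm that a vectorized version agrees with a plain loop on a few edge cases. A minimal sketch: `rank_chunks_loop` is a hypothetical reference written only for this check (it is not part of the original answer), and `rank_chunks_v2` is copied from above so the block runs on its own.

```python
import numpy as np

def rank_chunks_loop(arr):
    # Hypothetical plain-loop reference, not from the original answer
    out = np.full(len(arr), np.nan)
    event_id = -1
    prev = 0
    for i, v in enumerate(arr):
        if v and not prev:      # a new run of 1s starts here
            event_id += 1
        if v:
            out[i] = event_id
        prev = v
    return out

def rank_chunks_v2(arr):
    # Copied from the answer above
    m = np.r_[False, arr.astype(bool), False]
    idx = np.flatnonzero(m[:-1] != m[1:])
    l = idx[1::2] - idx[::2]
    out = np.full(len(arr), np.nan, dtype=float)
    out[arr != 0] = np.repeat(np.arange(len(l)), l)
    return out

cases = [
    np.array([0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,1,1,1]),
    np.array([1,1,0,1]),        # run starting at index 0
    np.array([0,0,0]),          # no events at all
    np.array([1]),              # single-element run
]
for a in cases:
    # assert_array_equal treats NaNs in matching positions as equal
    np.testing.assert_array_equal(rank_chunks_v2(a), rank_chunks_loop(a))
```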
