
Problem description

What is the best way to find the maximum number of consecutive repeated NaN values in a numpy array?

Example:

from numpy import nan

Input 1: [nan, nan, nan, 0.16, 1, 0.16, 0.9999, 0.0001, 0.16, 0.101, nan, 0.16]

Output 1: 3

Input 2: [nan, nan, 2, 1, 1, nan, nan, nan, nan, 0.101, nan, 0.16]

Output 2: 4

Recommended answer

Here is one approach:

import numpy as np

def max_repeatedNaNs(a):
    # Boolean mask of NaNs, padded with False so every run has both edges
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        # No NaNs at all
        return 0
    # Length of each NaN run = falling-edge index minus rising-edge index.
    # The maximum run length is the output.
    c = np.flatnonzero(mask[1:] < mask[:-1]) - \
        np.flatnonzero(mask[1:] > mask[:-1])
    return c.max()
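
To make the edge-detection step concrete, the following is a minimal sketch that traces the intermediate arrays for Input 1. The variable names `starts`, `stops` and `run_lengths` are illustrative and do not appear in the original answer.

```python
import numpy as np

a = np.array([np.nan, np.nan, np.nan, 0.16, 1, 0.16,
              0.9999, 0.0001, 0.16, 0.101, np.nan, 0.16])

# Pad with False on both sides so every NaN run has a start and an end edge,
# even when the run touches the beginning or end of the array.
mask = np.concatenate(([False], np.isnan(a), [False]))

# Rising edges (False -> True): index in `a` where each NaN run starts.
starts = np.flatnonzero(mask[1:] > mask[:-1])

# Falling edges (True -> False): index in `a` just past the end of each run.
stops = np.flatnonzero(mask[1:] < mask[:-1])

# One length per NaN run; the answer is the largest of them.
run_lengths = stops - starts

print(starts, stops, run_lengths, run_lengths.max())
```

For this input there are two NaN runs: one of length 3 starting at index 0 and one of length 1 starting at index 10, so the maximum is 3.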

Here is an improved version:

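def max_repeatedNaNs_v2(a):
    # Boolean mask of NaNs, padded with False so every run has both edges
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        # No NaNs at all
        return 0
    # Edge positions alternate: start, stop, start, stop, ...
    idx = np.nonzero(mask[1:] != mask[:-1])[0]
    # Stop minus start gives each run length; return the longest
    return (idx[1::2] - idx[::2]).max()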
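
As a sanity check on the alternating-edge idea, here is a hedged sketch that compares it against a plain Python loop. Both function names (`max_nan_run_loop`, `max_nan_run_edges`) are hypothetical and introduced only for this illustration; the edge-based one restates the logic of the improved version so the block runs on its own.

```python
import numpy as np

def max_nan_run_loop(a):
    """Hypothetical reference: one pass, tracking current and best NaN run."""
    best = current = 0
    for is_nan in np.isnan(a):
        current = current + 1 if is_nan else 0
        best = max(best, current)
    return best

def max_nan_run_edges(a):
    """Edge-based version: differences of alternating start/stop indices."""
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        return 0
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return int((idx[1::2] - idx[::2]).max())

nan = np.nan
input1 = np.array([nan, nan, nan, 0.16, 1, 0.16, 0.9999, 0.0001, 0.16, 0.101, nan, 0.16])
input2 = np.array([nan, nan, 2, 1, 1, nan, nan, nan, nan, 0.101, nan, 0.16])

for arr in (input1, input2, np.array([1.0, 2.0])):
    print(max_nan_run_loop(arr), max_nan_run_edges(arr))
```

Because the padded mask always starts and ends with False, the edge indices come in start/stop pairs, which is why slicing with `[::2]` and `[1::2]` separates them cleanly.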

Benchmarking in response to @pltrdy's comment:

In [77]: a = np.random.rand(10000)

In [78]: a[np.random.choice(range(len(a)),size=1000,replace=0)] = np.nan

In [79]: %timeit contiguous_NaN(a) #@pltrdy's solution
100 loops, best of 3: 15.8 ms per loop

In [80]: %timeit max_repeatedNaNs(a)
10000 loops, best of 3: 103 µs per loop

In [81]: %timeit max_repeatedNaNs_v2(a)
10000 loops, best of 3: 86.4 µs per loop
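
The timings above come from an IPython session. A rough way to repeat a similar comparison in a plain script is sketched below with the standard `timeit` module. Note the assumptions: `contiguous_NaN` is not shown on this page, so a hypothetical pure-Python loop (`loop_baseline`) stands in for it, and `edge_based` restates the improved version. Absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

def loop_baseline(a):
    """Hypothetical stand-in for a loop-based solution (not @pltrdy's code)."""
    best = current = 0
    for is_nan in np.isnan(a):
        current = current + 1 if is_nan else 0
        best = max(best, current)
    return best

def edge_based(a):
    """Vectorized edge-detection approach, as in the improved version."""
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        return 0
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return int((idx[1::2] - idx[::2]).max())

# Same setup as the session: 10000 random values, 1000 of them set to NaN.
rng = np.random.default_rng(0)
a = rng.random(10000)
a[rng.choice(a.size, size=1000, replace=False)] = np.nan

n = 20  # repetitions per measurement
t_loop = timeit.timeit(lambda: loop_baseline(a), number=n) / n
t_edge = timeit.timeit(lambda: edge_based(a), number=n) / n

print(f"loop baseline: {t_loop * 1e6:.1f} µs per call")
print(f"edge based:    {t_edge * 1e6:.1f} µs per call")
```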
