Interpolating missing values in a 2D array with Python

Problem description

I have a 2d array (or matrix, if you prefer) with some missing values represented as NaN. The missing values are typically in a strip along one axis, e.g.:

1   2   3 NaN   5
2   3   4 NaN   6
3   4 NaN NaN   7
4   5 NaN NaN   8
5   6   7   8   9

where I would like to replace the NaNs with somewhat sensible numbers.

I looked into delaunay triangulation, but found very little documentation.

I tried using astropy's convolve, as it supports 2d arrays and is quite straightforward. The problem is that convolution is not interpolation: it moves all values towards the average (which could be mitigated by using a narrow kernel).

This question should be the natural 2-dimensional extension to this post. Is there a way to interpolate over NaN/missing values in a 2d-array?

Solution

Yes, you can use scipy.interpolate.griddata with a masked array, and you can choose the type of interpolation you prefer with the method argument. Usually 'cubic' does an excellent job:

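import numpy as np
from scipy import interpolate


# Create some random integer data in [0, 10], stored as floats
# (np.random.random_integers is deprecated; randint excludes the upper bound)
array = np.random.randint(0, 11, (10, 10)).astype(float)
# Values greater than 7 become np.nan
array[array > 7] = np.nan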

You can view the array with plt.imshow(array, interpolation='nearest'). Next, mask the invalid values and interpolate from the valid points onto the full grid:

x = np.arange(0, array.shape[1])
y = np.arange(0, array.shape[0])
# Mask invalid values
array = np.ma.masked_invalid(array)
xx, yy = np.meshgrid(x, y)
# Keep only the valid values and their coordinates
x1 = xx[~array.mask]
y1 = yy[~array.mask]
newarr = array[~array.mask]

GD1 = interpolate.griddata((x1, y1), newarr.ravel(),
                           (xx, yy),
                           method='cubic')

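GD1 is the final result, with the gaps filled in.

Note that NaN values at the edges that are surrounded by other NaN values can't be interpolated and stay NaN, because they fall outside the convex hull of the valid points. You can change the value used for such points with the fill_value argument (it has no effect for method='nearest').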
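As a rough, self-contained sketch of the approach above, here it is applied to the 5x5 matrix from the question. The helper name fill_nan_2d is hypothetical (it is not part of SciPy), and the nearest-neighbour fallback for points outside the convex hull is an assumption of mine rather than something the answer prescribes; 'linear' is used here only because the example data happens to be planar.

```python
import numpy as np
from scipy.interpolate import griddata


def fill_nan_2d(arr, method="linear"):
    """Return a copy of `arr` with non-finite cells filled via griddata.

    `fill_nan_2d` is a hypothetical helper name, not a SciPy function.
    """
    arr = np.asarray(arr, dtype=float)
    valid = np.isfinite(arr)
    yy, xx = np.indices(arr.shape)  # row and column index grids

    # Interpolate from the valid cells onto the full grid
    filled = griddata((xx[valid], yy[valid]), arr[valid], (xx, yy), method=method)

    # Cells outside the convex hull of the valid points come back as NaN;
    # as an assumed fallback, take those from the nearest valid cell.
    leftover = np.isnan(filled)
    if leftover.any():
        nearest = griddata((xx[valid], yy[valid]), arr[valid], (xx, yy),
                           method="nearest")
        filled[leftover] = nearest[leftover]

    # Keep the original values wherever they were known
    filled[valid] = arr[valid]
    return filled


nan = np.nan
example = np.array([
    [1, 2, 3, nan, 5],
    [2, 3, 4, nan, 6],
    [3, 4, nan, nan, 7],
    [4, 5, nan, nan, 8],
    [5, 6, 7, 8, 9],
])
result = fill_nan_2d(example)
print(result)
```

Because every known value is written back at the end, only the missing cells change, which is the main difference from the convolution approach mentioned in the question.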

How would this work if there is a 3x3 region of NaN values? Would you get sensible data for the middle point?

It depends on your data, so you have to run some tests. For instance, you could deliberately mask some good data, try different kinds of interpolation (cubic, linear, etc.) on the array with the masked values, and calculate the difference between the interpolated values and the original values that you masked. Then see which method gives you the smallest difference.

You can use something like this:

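# Hold out a 3x3 block of values, then mask it as if it were missing
reference = array[3:6, 3:6].copy()
array[3:6, 3:6] = np.ma.masked
# Recompute the valid points so the held-out block is excluded
x1 = xx[~array.mask]
y1 = yy[~array.mask]
newarr = array[~array.mask]
method = ['linear', 'nearest', 'cubic']

for i in method:
    GD1 = interpolate.griddata((x1, y1), newarr.ravel(),
                               (xx, yy),
                               method=i)
    # Entries of reference that were already masked are skipped by the mean
    meandifference = np.mean(np.abs(reference - GD1[3:6, 3:6]))
    print(' %s interpolation difference: %s' % (i, meandifference))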

That gives something like this:

   linear interpolation difference: 4.88888888889
   nearest interpolation difference: 4.11111111111
   cubic interpolation difference: 5.99400137377

Of course, this is for random numbers, so it's normal for the results to vary a lot. The best thing to do is to test on a deliberately masked piece of your own dataset and see what happens.
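To make that test repeatable, here is a minimal sketch of the same hold-out idea wrapped in a function. The name holdout_error, the smooth test surface, and the choice of block are all assumptions for illustration; substitute your own array and a region whose true values you know.

```python
import numpy as np
from scipy.interpolate import griddata


def holdout_error(data, block, method):
    """Mean absolute error when `block` (a pair of slices) is hidden
    and then re-interpolated from the remaining known cells.

    `holdout_error` is a hypothetical helper, not a SciPy function.
    """
    data = np.asarray(data, dtype=float)
    hidden = np.zeros(data.shape, dtype=bool)
    hidden[block] = True
    # Use only cells that are known and not deliberately hidden
    known = np.isfinite(data) & ~hidden
    yy, xx = np.indices(data.shape)
    estimate = griddata((xx[known], yy[known]), data[known], (xx, yy),
                        method=method)
    # Score only the hidden cells whose true value is actually known;
    # nanmean skips any cell griddata could not fill
    score = hidden & np.isfinite(data)
    return np.nanmean(np.abs(estimate[score] - data[score]))


# Assumed test surface: a smooth function rather than random numbers
yy, xx = np.indices((20, 20))
surface = np.sin(xx / 4.0) + np.cos(yy / 5.0)
block = (slice(8, 11), slice(8, 11))

errors = {m: holdout_error(surface, block, m)
          for m in ("nearest", "linear", "cubic")}
for name, err in errors.items():
    print("%s interpolation difference: %.4f" % (name, err))
```

Which method scores best depends on how smooth the data is, so it is worth repeating this with several blocks taken from your own dataset.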

