pandas reindex with floats: interpolation

Question

Can you explain this bizarre behaviour?

df = pd.DataFrame({'year': [1986, 1987, 1988], 'bomb': np.arange(3)}).set_index('year')

In [9]: df.reindex(np.arange(1986, 1988.125, .125))
Out[9]:
          bomb
1986.000     0
1986.125   NaN
1986.250   NaN
1986.375   NaN
1986.500   NaN
1986.625   NaN
1986.750   NaN
1986.875   NaN
1987.000     1
1987.125   NaN
1987.250   NaN
1987.375   NaN
1987.500   NaN
1987.625   NaN
1987.750   NaN
1987.875   NaN
1988.000     2

In [10]: df.reindex(np.arange(1986, 1988.1, .1))
Out[10]:
        bomb
1986.0     0
1986.1   NaN
1986.2   NaN
1986.3   NaN
1986.4   NaN
1986.5   NaN
1986.6   NaN
1986.7   NaN
1986.8   NaN
1986.9   NaN
1987.0   NaN
1987.1   NaN
1987.2   NaN
1987.3   NaN
1987.4   NaN
1987.5   NaN
1987.6   NaN
1987.7   NaN
1987.8   NaN
1987.9   NaN
1988.0   NaN

When the increment is anything other than .125, the new index values do not "find" the old rows that have matching values, i.e. there is a precision problem that is not being overcome. This is true even if I force the index to be a float before I try to interpolate. What is going on, and/or what is the right way to do this? I've been able to get it to work with an increment of 0.1 by using

reindex(np.round(np.arange(1985, 2010 + dt, dt) * 10) / 10.0)
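For readers who want to see the precision problem directly, here is a minimal sketch. It assumes a float-valued year index (the question's index is built from integers) and compares the two grids from the question with a grid built from integers and divided once. The names `eighths`, `tenths` and `exact_tenths` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question, with the index stored as floats (assumption).
df = pd.DataFrame({"bomb": np.arange(3)},
                  index=pd.Index([1986.0, 1987.0, 1988.0], name="year"))

# 0.125 is a power of two, so every grid point is exactly representable
# and the whole years come out as exactly 1987.0 and 1988.0.
eighths = np.arange(1986, 1988.125, 0.125)
print(eighths[8] == 1987.0)

# 0.1 has no exact binary representation, so the grid point that prints
# as 1987.0 is not bit-for-bit equal to 1987.0 and the lookup misses it.
tenths = np.arange(1986, 1988.1, 0.1)
print(repr(tenths[10]), tenths[10] == 1987.0)

# Building the grid from integers and dividing once avoids the drift.
exact_tenths = 1986 + np.arange(21) / 10
matched = df.reindex(exact_tenths)
print(matched["bomb"].notna().sum())  # number of rows that found a match
```

Rounding the grid, as in the workaround above, has the same effect as the integer-based construction: both produce the float that is exactly equal to the whole year.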

By the way, I'm doing this as the first step in linearly interpolating a number of columns (e.g. "bomb" is one of them). If there's a nicer way to do that, I'd happily be set straight.
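If the end goal is only linear interpolation of several columns onto a finer grid, one option that avoids exact index matching altogether is `np.interp`, applied column by column. This is a sketch under assumptions, not the method from the answer: the second column `test` and its values are invented for illustration, and the target grid uses steps of 1/8.

```python
import numpy as np
import pandas as pd

# "bomb" comes from the question; "test" is a made-up second column.
df = pd.DataFrame({"bomb": [0.0, 1.0, 2.0], "test": [10.0, 30.0, 20.0]},
                  index=pd.Index([1986, 1987, 1988], name="year"))

# Target grid, built from integers so the whole years are exact.
grid = pd.Index(1986 + np.arange(17) / 8, name="year")

# np.interp evaluates the piecewise-linear function through the known
# (year, value) points, so no new index value has to match an old one.
interpolated = pd.DataFrame(
    {col: np.interp(grid.to_numpy(), df.index.to_numpy(), df[col].to_numpy())
     for col in df.columns},
    index=grid,
)
print(interpolated.head())
```

Because the values are computed from the known points rather than looked up, a grid with a small floating-point error in its whole-year entries would still give essentially the same result.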

Recommended answer

I think you are better off using a PeriodIndex for this, along these lines:

In [39]: df=pd.DataFrame({'bomb':np.arange(3)})

In [40]: df
Out[40]:
   bomb
0     0
1     1
2     2

In [41]: df.index = pd.period_range('1986','1988',freq='Y').asfreq('M')

In [42]: df
Out[42]:
         bomb
1986-12     0
1987-12     1
1988-12     2

In [43]: df = df.reindex(pd.period_range('1986','1988',freq='M'))

In [44]: df
Out[44]:
         bomb
1986-01   NaN
1986-02   NaN
1986-03   NaN
1986-04   NaN
1986-05   NaN
1986-06   NaN
1986-07   NaN
1986-08   NaN
1986-09   NaN
1986-10   NaN
1986-11   NaN
1986-12     0
1987-01   NaN
1987-02   NaN
1987-03   NaN
1987-04   NaN
1987-05   NaN
1987-06   NaN
1987-07   NaN
1987-08   NaN
1987-09   NaN
1987-10   NaN
1987-11   NaN
1987-12     1
1988-01   NaN

In [45]: df.iloc[0,0] = -1

In [46]: df['interp'] = df['bomb'].interpolate()

In [47]: df
Out[47]:
         bomb    interp
1986-01    -1 -1.000000
1986-02   NaN -0.909091
1986-03   NaN -0.818182
1986-04   NaN -0.727273
1986-05   NaN -0.636364
1986-06   NaN -0.545455
1986-07   NaN -0.454545
1986-08   NaN -0.363636
1986-09   NaN -0.272727
1986-10   NaN -0.181818
1986-11   NaN -0.090909
1986-12     0  0.000000
1987-01   NaN  0.083333
1987-02   NaN  0.166667
1987-03   NaN  0.250000
1987-04   NaN  0.333333
1987-05   NaN  0.416667
1987-06   NaN  0.500000
1987-07   NaN  0.583333
1987-08   NaN  0.666667
1987-09   NaN  0.750000
1987-10   NaN  0.833333
1987-11   NaN  0.916667
1987-12     1  1.000000
1988-01   NaN  1.000000
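The session above attaches each yearly value to December, because `asfreq('M')` converts to the end of the period by default. The following self-contained variant is a sketch that assumes the yearly values should sit at the start of each year instead, which it requests with `how='S'`. The name `monthly` is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"bomb": np.arange(3, dtype=float)})

# Yearly periods converted to monthly ones, anchored at January (how="S").
df.index = pd.period_range("1986", "1988", freq="Y").asfreq("M", how="S")

# Reindex onto every month from 1986-01 to 1988-01; periods compare
# exactly, so there is no floating-point matching problem.
monthly = df.reindex(pd.period_range("1986-01", "1988-01", freq="M"))

# The default linear method treats the rows as equally spaced.
monthly["interp"] = monthly["bomb"].interpolate()
print(monthly)
```

With this alignment there are no leading NaN rows, so the `df.iloc[0,0] = -1` step from the session is not needed.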

09-05 11:10