I have a Pandas dataframe that I interpolate to get a daily dataframe. The original dataframe looks like this:

               col_1      vals
2017-10-01  0.000000  0.112869
2017-10-02  0.017143  0.112869
2017-10-12  0.003750  0.117274
2017-10-14  0.000000  0.161556
2017-10-17  0.000000  0.116264

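In the interpolated dataframe, I want to change the data values to NaN wherever the gap between dates is more than 5 days. For example, in the dataframe above the gap between 2017-10-02 and 2017-10-12 is more than 5 days, so in the interpolated dataframe all values between those two dates should be removed. I'm not sure how to do this; maybe with combine_first?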

--Edit: the interpolated dataframe looks like this:
               col_1      vals
2017-10-01  0.000000  0.112869
2017-10-02  0.017143  0.112869
2017-10-03  0.015804  0.113309
2017-10-04  0.014464  0.113750
2017-10-05  0.013125  0.114190
2017-10-06  0.011786  0.114631
2017-10-07  0.010446  0.115071
2017-10-08  0.009107  0.115512
2017-10-09  0.007768  0.115953
2017-10-10  0.006429  0.116393
2017-10-11  0.005089  0.116834
2017-10-12  0.003750  0.117274
2017-10-13  0.001875  0.139415
2017-10-14  0.000000  0.161556
2017-10-15  0.000000  0.146459
2017-10-16  0.000000  0.131361
2017-10-17  0.000000  0.116264
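The question does not show how the interpolated frame was produced. Its numbers are consistent with linear interpolation on a daily grid, so here is a minimal sketch under that assumption, rebuilding the sample data by hand (the name `interpolated` is illustrative):

```python
import pandas as pd

# Original (sparse) data from the question
df = pd.DataFrame(
    {
        "col_1": [0.000000, 0.017143, 0.003750, 0.000000, 0.000000],
        "vals": [0.112869, 0.112869, 0.117274, 0.161556, 0.116264],
    },
    index=pd.to_datetime(
        ["2017-10-01", "2017-10-02", "2017-10-12", "2017-10-14", "2017-10-17"]
    ),
)

# Put the frame on a daily grid (the new days start as NaN),
# then fill them by linear interpolation between neighbouring observations
interpolated = df.asfreq("D").interpolate()
print(interpolated.round(6))
```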

Expected output:
               col_1      vals
2017-10-01  0.000000  0.112869
2017-10-02  0.017143  0.112869
2017-10-12  0.003750  0.117274
2017-10-13  0.001875  0.139415
2017-10-14  0.000000  0.161556
2017-10-15  0.000000  0.146459
2017-10-16  0.000000  0.131361
2017-10-17  0.000000  0.116264
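The question asks for NaN rather than dropped rows. As an illustrative alternative to the answer below (not the answerer's method), one could interpolate everything first and then blank out the filled-in days that fall inside a gap longer than 5 days; calling `dropna` afterwards gives the expected output. A sketch, with `gap_to_next`, `gap_for_day`, `inside_long_gap` and `masked` as hypothetical names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col_1": [0.000000, 0.017143, 0.003750, 0.000000, 0.000000],
        "vals": [0.112869, 0.112869, 0.117274, 0.161556, 0.116264],
    },
    index=pd.to_datetime(
        ["2017-10-01", "2017-10-02", "2017-10-12", "2017-10-14", "2017-10-17"]
    ),
)

daily = df.asfreq("D").interpolate()

# Gap from each original date to the next original date (NaT for the last one)
gap_to_next = df.index.to_series().diff().shift(-1)

# For every daily row, take the gap that starts at the most recent original date
gap_for_day = gap_to_next.reindex(daily.index, method="ffill")

# Filled-in days inside a gap longer than 5 days; original observations are kept
inside_long_gap = (gap_for_day > pd.Timedelta(days=5)).to_numpy() & ~daily.index.isin(df.index)

masked = daily.copy()
masked.loc[inside_long_gap] = np.nan  # literal "set to NaN" version

result = masked.dropna(how="all")  # matches the expected output
print(result)
```

Keeping `masked` gives the NaN-filled daily frame; `result` drops those rows.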

Best answer

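I first identify where the gaps are longer than 5 days. From there, I build an array that labels the groups of rows between those gaps. Finally, I use groupby to convert each group to daily frequency and interpolate within it, so no values are filled in across a long gap.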

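import numpy as np
import pandas as pd

# df is the original (non-interpolated) frame from the question, with a sorted DatetimeIndex

# convenience: assign string to variable for easier access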
daytype = 'timedelta64[D]'

# define five days for use when evaluating size of gaps
five = np.array(5, dtype=daytype)

# get the size of gaps
deltas = np.diff(df.index.values).astype(daytype)

# identify groups between gaps
groups = np.append(False, deltas > five).cumsum()

# handy function to turn to daily frequency and interpolate
to_daily = lambda x: x.asfreq('D').interpolate()

# and finally...
df.groupby(groups, group_keys=False).apply(to_daily)

               col_1      vals
2017-10-01  0.000000  0.112869
2017-10-02  0.017143  0.112869
2017-10-12  0.003750  0.117274
2017-10-13  0.001875  0.139415
2017-10-14  0.000000  0.161556
2017-10-15  0.000000  0.146459
2017-10-16  0.000000  0.131361
2017-10-17  0.000000  0.116264
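The group labels can also be computed without going through NumPy, by diffing the index as a pandas Series. A sketch of that variant, assuming the same sample data and the same 5-day threshold (`new_block` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col_1": [0.000000, 0.017143, 0.003750, 0.000000, 0.000000],
        "vals": [0.112869, 0.112869, 0.117274, 0.161556, 0.116264],
    },
    index=pd.to_datetime(
        ["2017-10-01", "2017-10-02", "2017-10-12", "2017-10-14", "2017-10-17"]
    ),
)

# True where the step from the previous row exceeds 5 days (the first row's NaT gives False)
new_block = df.index.to_series().diff() > pd.Timedelta(days=5)

# Running count of gap boundaries = block label for each row
groups = new_block.cumsum()

result = df.groupby(groups, group_keys=False).apply(lambda g: g.asfreq("D").interpolate())
print(groups.tolist())
print(result)
```

One small difference: `.astype('timedelta64[D]')` in the answer truncates each gap to whole days, while comparing against `pd.Timedelta(days=5)` uses the exact duration. For a date-only index like this one, the two agree.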

If you want to supply your own interpolation method, you can modify the code above like this:
daytype = 'timedelta64[D]'
five = np.array(5, dtype=daytype)
deltas = np.diff(df.index.values).astype(daytype)
groups = np.append(False, deltas > five).cumsum()

# custom interpolation function that takes a dataframe
def my_interpolate(df):
    """This can be whatever you want.
    I just provided what will result
    in the same thing as before."""
    return df.interpolate()

to_daily = lambda x: x.asfreq('D').pipe(my_interpolate)

df.groupby(groups, group_keys=False).apply(to_daily)

               col_1      vals
2017-10-01  0.000000  0.112869
2017-10-02  0.017143  0.112869
2017-10-12  0.003750  0.117274
2017-10-13  0.001875  0.139415
2017-10-14  0.000000  0.161556
2017-10-15  0.000000  0.146459
2017-10-16  0.000000  0.131361
2017-10-17  0.000000  0.116264
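As one concrete example of a custom function, the hypothetical `my_interpolate` below forward-fills within each block instead of interpolating linearly; everything else follows the answer's recipe. A sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col_1": [0.000000, 0.017143, 0.003750, 0.000000, 0.000000],
        "vals": [0.112869, 0.112869, 0.117274, 0.161556, 0.116264],
    },
    index=pd.to_datetime(
        ["2017-10-01", "2017-10-02", "2017-10-12", "2017-10-14", "2017-10-17"]
    ),
)

# Same grouping as the answer: a new group starts after any gap longer than 5 days
deltas = np.diff(df.index.values).astype("timedelta64[D]")
groups = np.append(False, deltas > np.timedelta64(5, "D")).cumsum()


def my_interpolate(frame):
    """Hypothetical replacement: carry the last observation forward
    instead of drawing a straight line between observations."""
    return frame.ffill()


def to_daily(block):
    # Daily grid within one block, then fill with the custom function
    return block.asfreq("D").pipe(my_interpolate)


result = df.groupby(groups, group_keys=False).apply(to_daily)
print(result)
```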

About "python - Fill a dataframe with NaN when data is missing for multiple days": a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46306736/

10-12 16:48