我有一个 Pandas 数据框,我将其插入以获得每日数据框。原始数据框如下所示:
col_1 vals
2017-10-01 0.000000 0.112869
2017-10-02 0.017143 0.112869
2017-10-12 0.003750 0.117274
2017-10-14 0.000000 0.161556
2017-10-17 0.000000 0.116264
在插值数据框中,我想将数据值更改为 NaN,其中日期间隔超过 5 天。例如。在上面的数据帧中,
2017-10-02
和 2017-10-12
之间的差距超过 5 天,因此在插入的数据帧中,这两个日期之间的所有值都应该被删除。我不知道该怎么做,也许是 combine_first
?--编辑:插值数据框如下所示:
col_1 vals
2017-10-01 0.000000 0.112869
2017-10-02 0.017143 0.112869
2017-10-03 0.015804 0.113309
2017-10-04 0.014464 0.113750
2017-10-05 0.013125 0.114190
2017-10-06 0.011786 0.114631
2017-10-07 0.010446 0.115071
2017-10-08 0.009107 0.115512
2017-10-09 0.007768 0.115953
2017-10-10 0.006429 0.116393
2017-10-11 0.005089 0.116834
2017-10-12 0.003750 0.117274
2017-10-13 0.001875 0.139415
2017-10-14 0.000000 0.161556
2017-10-15 0.000000 0.146459
2017-10-16 0.000000 0.131361
2017-10-17 0.000000 0.116264
预期输出:
col_1 vals
2017-10-01 0.000000 0.112869
2017-10-02 0.017143 0.112869
2017-10-12 0.003750 0.117274
2017-10-13 0.001875 0.139415
2017-10-14 0.000000 0.161556
2017-10-15 0.000000 0.146459
2017-10-16 0.000000 0.131361
2017-10-17 0.000000 0.116264
最佳答案
我首先确定差距超过 5 天的地方。从那里,我生成了一个数组,用于标识这些差距之间的组。最后,我将使用 groupby
转向每日频率并进行插值。
# convenience: assign string to variable for easier access
daytype = 'timedelta64[D]'
# define five days for use when evaluating size of gaps
five = np.array(5, dtype=daytype)
# get the size of gaps
deltas = np.diff(df.index.values).astype(daytype)
# identify groups between gaps
groups = np.append(False, deltas > five).cumsum()
# handy function to turn to daily frequency and interpolate
to_daily = lambda x: x.asfreq('D').interpolate()
# and finally...
df.groupby(groups, group_keys=False).apply(to_daily)
col_1 vals
2017-10-01 0.000000 0.112869
2017-10-02 0.017143 0.112869
2017-10-12 0.003750 0.117274
2017-10-13 0.001875 0.139415
2017-10-14 0.000000 0.161556
2017-10-15 0.000000 0.146459
2017-10-16 0.000000 0.131361
2017-10-17 0.000000 0.116264
如果您想提供自己的插值方法。您可以像这样修改上面的内容:
daytype = 'timedelta64[D]'
five = np.array(5, dtype=daytype)
deltas = np.diff(df.index.values).astype(daytype)
groups = np.append(False, deltas > five).cumsum()
# custom interpolation function that takes a dataframe
def my_interpolate(df):
"""This can be whatever you want.
I just provided what will result
in the same thing as before."""
return df.interpolate()
to_daily = lambda x: x.asfreq('D').pipe(my_interpolate)
df.groupby(groups, group_keys=False).apply(to_daily)
col_1 vals
2017-10-01 0.000000 0.112869
2017-10-02 0.017143 0.112869
2017-10-12 0.003750 0.117274
2017-10-13 0.001875 0.139415
2017-10-14 0.000000 0.161556
2017-10-15 0.000000 0.146459
2017-10-16 0.000000 0.131361
2017-10-17 0.000000 0.116264
关于python - 当多天数据丢失时,用 NaN 填充数据框,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/46306736/