I have the following data:

                        data
timestamp
2012-06-01 17:00:00     9
2012-06-01 20:00:00     8
2012-06-01 13:00:00     9
2012-06-01 10:00:00     9


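I would like to sort it by time (in ascending order, as in the output below) and add a start and end date at the top and bottom of the data, so that it looks like this: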

                        data
timestamp
2012-06-01 00:00:00     NaN
2012-06-01 10:00:00     9
2012-06-01 13:00:00     9
2012-06-01 17:00:00     9
2012-06-01 20:00:00     8
2012-06-02 00:00:00     NaN


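Finally, I would like to extend the dataset to cover every hour from start to end in one-hour steps, filling the dataframe with the missing timestamps and None/NaN as their data.
So far I have the following code: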

df2 = pd.DataFrame({'data':temperature, 'timestamp': pd.DatetimeIndex(timestamp)}, dtype=float)
df2.set_index('timestamp',inplace=True)
df3 = pd.DataFrame({ 'timestamp': pd.Series([ts1, ts2]), 'data': [None, None]})
df3.set_index('timestamp',inplace=True)
print(df3)
merged = df3.append(df2)  # DataFrame.append was removed in pandas 2.0; pd.concat([df3, df2]) is the equivalent
print(merged)


which prints the following:

df3:
                     data
timestamp
2012-06-01 00:00:00     None
2012-06-02 00:00:00     None


merged:
                     data
timestamp
2012-06-01 00:00:00     NaN
2012-06-02 00:00:00     NaN
2012-06-01 17:00:00     9
2012-06-01 20:00:00     8
2012-06-01 13:00:00     9
2012-06-01 10:00:00     9


I tried:

merged = merged.asfreq('H')


but this returns an unsatisfactory result:

                     data
2012-06-01 00:00:00   NaN
2012-06-01 01:00:00   NaN
2012-06-01 02:00:00   NaN
2012-06-01 03:00:00   NaN
2012-06-01 04:00:00   NaN
2012-06-01 05:00:00   NaN
2012-06-01 06:00:00   NaN
2012-06-01 07:00:00   NaN
2012-06-01 08:00:00   NaN
2012-06-01 09:00:00   NaN
2012-06-01 10:00:00     9


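Where is the rest of the dataframe? Why does it only contain data up to the first valid value?

Any help is much appreciated. Thanks a lot in advance.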

Best answer

First create an empty dataframe with the timestamp index you want, then do a left merge with your original dataset:

df2 = pd.DataFrame(index = pd.date_range('2012-06-01','2012-06-02', freq='H'))
df3 = pd.merge(df2, df, left_index = True, right_index = True, how = 'left')
df3
Out[103]:
                               timestamp  value
2012-06-01 00:00:00                  NaN    NaN
2012-06-01 01:00:00                  NaN    NaN
2012-06-01 02:00:00                  NaN    NaN
2012-06-01 03:00:00                  NaN    NaN
2012-06-01 04:00:00                  NaN    NaN
2012-06-01 05:00:00                  NaN    NaN
2012-06-01 06:00:00                  NaN    NaN
2012-06-01 07:00:00                  NaN    NaN
2012-06-01 08:00:00                  NaN    NaN
2012-06-01 09:00:00                  NaN    NaN
2012-06-01 10:00:00  2012-06-01 10:00:00      9
2012-06-01 11:00:00                  NaN    NaN
2012-06-01 12:00:00                  NaN    NaN
2012-06-01 13:00:00  2012-06-01 13:00:00      9
2012-06-01 14:00:00                  NaN    NaN
2012-06-01 15:00:00                  NaN    NaN
2012-06-01 16:00:00                  NaN    NaN
2012-06-01 17:00:00  2012-06-01 17:00:00      9
2012-06-01 18:00:00                  NaN    NaN
2012-06-01 19:00:00                  NaN    NaN
2012-06-01 20:00:00  2012-06-01 20:00:00      8
2012-06-01 21:00:00                  NaN    NaN
2012-06-01 22:00:00                  NaN    NaN
2012-06-01 23:00:00                  NaN    NaN
2012-06-02 00:00:00                  NaN    NaN
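
For reference, here is a minimal, self-contained sketch of the same idea using the sample data from the question. The names `df`, `hourly`, `empty`, `merged` and `expanded`, and the use of `reindex` as an alternative to the merge, are illustrative assumptions and not part of the original answer. The hourly step is passed as `pd.Timedelta(hours=1)`, which should behave like the 'H' alias used above.

```python
import pandas as pd

# Sample data from the question, deliberately left unsorted.
df = pd.DataFrame(
    {"data": [9.0, 8.0, 9.0, 9.0]},
    index=pd.DatetimeIndex(
        [
            "2012-06-01 17:00:00",
            "2012-06-01 20:00:00",
            "2012-06-01 13:00:00",
            "2012-06-01 10:00:00",
        ],
        name="timestamp",
    ),
)

# Full hourly range from start to end, both endpoints included.
hourly = pd.date_range(
    "2012-06-01", "2012-06-02", freq=pd.Timedelta(hours=1), name="timestamp"
)

# Option 1: left merge onto an empty frame that carries the target index.
empty = pd.DataFrame(index=hourly)
merged = pd.merge(empty, df, left_index=True, right_index=True, how="left")

# Option 2 (assumed alternative): align the data directly to the target index.
expanded = df.reindex(hourly)

print(expanded)
```

Both variants take their row order from the target index, so the unsorted input does not matter. If `asfreq` is preferred, calling `sort_index()` first is probably the safer route: the truncated result in the question looks consistent with the range having been built from the first and last rows of the unsorted frame (the last row being 10:00), though that depends on the pandas version in use.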

Regarding python - extending a dataframe by adding start and end dates and filling it with timestamps and NaN, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30712831/
