I have the following pandas DataFrame:
Date half_hourly_bucket Value
2018-01-01 00:00:01 - 00:30:00 123
2018-01-01 00:30:01 - 01:00:00 12
2018-01-01 01:00:01 - 01:30:00 122
2018-01-01 02:00:01 - 02:30:00 111
2018-01-01 03:00:01 - 03:30:00 122
2018-01-01 04:00:01 - 04:30:00 111
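To reproduce the example, the input frame can be built as below. This is a sketch that assumes Date is stored as datetime64 and Value as integers. The original dtypes are not shown.

```python
import pandas as pd

# Reconstruction of the input table above (dtypes are an assumption)
df = pd.DataFrame({
    'Date': pd.to_datetime(['2018-01-01'] * 6),
    'half_hourly_bucket': ['00:00:01 - 00:30:00', '00:30:01 - 01:00:00',
                           '01:00:01 - 01:30:00', '02:00:01 - 02:30:00',
                           '03:00:01 - 03:30:00', '04:00:01 - 04:30:00'],
    'Value': [123, 12, 122, 111, 122, 111],
})
print(df)
```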
The DataFrame I want is:
Date half_hourly_bucket Value
2018-01-01 00:00:01 - 00:30:00 123
2018-01-01 00:30:01 - 01:00:00 12
2018-01-01 01:00:01 - 01:30:00 122
2018-01-01 01:30:01 - 02:00:00 0
2018-01-01 02:00:01 - 02:30:00 111
2018-01-01 02:30:01 - 03:00:00 0
2018-01-01 03:00:01 - 03:30:00 122
2018-01-01 03:30:01 - 04:00:00 0
2018-01-01 04:00:01 - 04:30:00 111
2018-01-01 04:30:01 - 05:00:00 0
2018-01-01 05:00:01 - 05:30:00 0
2018-01-01 05:30:01 - 06:00:00 0
2018-01-01 06:00:01 - 06:30:00 0
2018-01-01 06:30:01 - 07:00:00 0
2018-01-01 07:00:01 - 07:30:00 0
2018-01-01 07:30:01 - 08:00:00 0
2018-01-01 08:00:01 - 08:30:00 0
2018-01-01 08:30:01 - 09:00:00 0
2018-01-01 09:00:01 - 09:30:00 0
2018-01-01 09:30:01 - 10:00:00 0
2018-01-01 10:00:01 - 10:30:00 0
2018-01-01 10:30:01 - 11:00:00 0
2018-01-01 11:00:01 - 11:30:00 0
2018-01-01 11:30:01 - 12:00:00 0
2018-01-01 12:00:01 - 12:30:00 0
2018-01-01 12:30:01 - 13:00:00 0
2018-01-01 13:00:01 - 13:30:00 0
2018-01-01 13:30:01 - 14:00:00 0
2018-01-01 14:00:01 - 14:30:00 0
2018-01-01 14:30:01 - 15:00:00 0
2018-01-01 15:00:01 - 15:30:00 0
2018-01-01 15:30:01 - 16:00:00 0
2018-01-01 16:00:01 - 16:30:00 0
2018-01-01 16:30:01 - 17:00:00 0
2018-01-01 17:00:01 - 17:30:00 0
2018-01-01 17:30:01 - 18:00:00 0
2018-01-01 18:00:01 - 18:30:00 0
2018-01-01 18:30:01 - 19:00:00 0
2018-01-01 19:00:01 - 19:30:00 0
2018-01-01 19:30:01 - 20:00:00 0
2018-01-01 20:00:01 - 20:30:00 0
2018-01-01 20:30:01 - 21:00:00 0
2018-01-01 21:00:01 - 21:30:00 0
2018-01-01 21:30:01 - 22:00:00 0
2018-01-01 22:00:01 - 22:30:00 0
2018-01-01 22:30:01 - 23:00:00 0
2018-01-01 23:00:01 - 23:30:00 0
2018-01-01 23:30:01 - 00:00:00 0
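For each date in the Date column, I want to check whether data is missing for any half-hour bucket (48 buckets per day in total). If a bucket is missing, it must be inserted in the correct order with its Value set to 0. How can I do this in pandas?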
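The checking half of the task can be sketched as follows, assuming Date holds datetime64 values and every label follows the "HH:MM:SS - HH:MM:SS" pattern shown above. The helper names all_bucket_labels and missing_buckets are hypothetical. The sketch builds the 48 expected labels with pd.date_range and compares them with the labels present for each date.

```python
import pandas as pd


def all_bucket_labels():
    """Hypothetical helper: the 48 expected labels for one day."""
    # Bucket starts 00:00:01, 00:30:01, ..., 23:30:01 (the dummy date is ignored)
    starts = pd.date_range('2000-01-01 00:00:01', periods=48, freq='30min')
    # Each bucket ends 29 min 59 s after it starts, so the last one ends at 00:00:00
    ends = starts + pd.Timedelta(seconds=30 * 60 - 1)
    return list(starts.strftime('%H:%M:%S') + ' - ' + ends.strftime('%H:%M:%S'))


def missing_buckets(df):
    """Hypothetical helper: map each date to its sorted list of missing labels."""
    expected = set(all_bucket_labels())
    # Zero-padded labels sort chronologically as plain strings
    return {date: sorted(expected - set(group['half_hourly_bucket']))
            for date, group in df.groupby('Date')}


# Small sample with assumed dtypes
df = pd.DataFrame({
    'Date': pd.to_datetime(['2018-01-01'] * 3),
    'half_hourly_bucket': ['00:00:01 - 00:30:00', '00:30:01 - 01:00:00',
                           '02:00:01 - 02:30:00'],
    'Value': [123, 12, 111],
})
missing = missing_buckets(df)
print(len(missing[pd.Timestamp('2018-01-01')]))  # 45
```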
Best answer
The solution splits half_hourly_bucket into two new columns, processes them, and joins them back together:
import pandas as pd

#make sure Date is datetime (a no-op if it already is)
df['Date'] = pd.to_datetime(df['Date'])
#create DatetimeIndex
df = df.set_index('Date')
#split half_hourly_bucket into start and end time strings
df[['one','two']] = df['half_hourly_bucket'].str.split(' - ', expand=True)
#add the bucket start time to the DatetimeIndex, e.g. 2018-01-01 00:00:01
df.index += pd.to_timedelta(df['one'])
#add missing bucket starts to the DatetimeIndex (new rows get NaN),
#from 00:00:01 of the first day to 23:30:01 of the last day
one_sec = pd.Timedelta(1, unit='s')
one_day = pd.Timedelta(1, unit='d')
df = df.reindex(pd.date_range(df.index.min().floor('D') + one_sec,
                              df.index.max().floor('D') + one_day - one_sec,
                              freq='30min'))
#recreate column two: bucket end = start + 29 min 59 s
df['two'] = df.index + pd.Timedelta(30*60 - 1, unit='s')
#join together
df['half_hourly_bucket'] = (df.index.strftime('%H:%M:%S') + ' - ' +
                            df['two'].dt.strftime('%H:%M:%S'))
#replace missing values
df['Value'] = df['Value'].fillna(0)
df = df.rename_axis('Date').reset_index()
#keep only the necessary columns
df = df[['Date','half_hourly_bucket','Value']]
print(df)
Date half_hourly_bucket Value
0 2018-01-01 00:00:01 00:00:01 - 00:30:00 123.0
1 2018-01-01 00:30:01 00:30:01 - 01:00:00 12.0
2 2018-01-01 01:00:01 01:00:01 - 01:30:00 122.0
3 2018-01-01 01:30:01 01:30:01 - 02:00:00 0.0
4 2018-01-01 02:00:01 02:00:01 - 02:30:00 111.0
5 2018-01-01 02:30:01 02:30:01 - 03:00:00 0.0
6 2018-01-01 03:00:01 03:00:01 - 03:30:00 122.0
7 2018-01-01 03:30:01 03:30:01 - 04:00:00 0.0
8 2018-01-01 04:00:01 04:00:01 - 04:30:00 111.0
9 2018-01-01 04:30:01 04:30:01 - 05:00:00 0.0
10 2018-01-01 05:00:01 05:00:01 - 05:30:00 0.0
11 2018-01-01 05:30:01 05:30:01 - 06:00:00 0.0
12 2018-01-01 06:00:01 06:00:01 - 06:30:00 0.0
13 2018-01-01 06:30:01 06:30:01 - 07:00:00 0.0
14 2018-01-01 07:00:01 07:00:01 - 07:30:00 0.0
15 2018-01-01 07:30:01 07:30:01 - 08:00:00 0.0
16 2018-01-01 08:00:01 08:00:01 - 08:30:00 0.0
17 2018-01-01 08:30:01 08:30:01 - 09:00:00 0.0
18 2018-01-01 09:00:01 09:00:01 - 09:30:00 0.0
19 2018-01-01 09:30:01 09:30:01 - 10:00:00 0.0
20 2018-01-01 10:00:01 10:00:01 - 10:30:00 0.0
21 2018-01-01 10:30:01 10:30:01 - 11:00:00 0.0
22 2018-01-01 11:00:01 11:00:01 - 11:30:00 0.0
23 2018-01-01 11:30:01 11:30:01 - 12:00:00 0.0
24 2018-01-01 12:00:01 12:00:01 - 12:30:00 0.0
25 2018-01-01 12:30:01 12:30:01 - 13:00:00 0.0
26 2018-01-01 13:00:01 13:00:01 - 13:30:00 0.0
27 2018-01-01 13:30:01 13:30:01 - 14:00:00 0.0
28 2018-01-01 14:00:01 14:00:01 - 14:30:00 0.0
29 2018-01-01 14:30:01 14:30:01 - 15:00:00 0.0
30 2018-01-01 15:00:01 15:00:01 - 15:30:00 0.0
31 2018-01-01 15:30:01 15:30:01 - 16:00:00 0.0
32 2018-01-01 16:00:01 16:00:01 - 16:30:00 0.0
33 2018-01-01 16:30:01 16:30:01 - 17:00:00 0.0
34 2018-01-01 17:00:01 17:00:01 - 17:30:00 0.0
35 2018-01-01 17:30:01 17:30:01 - 18:00:00 0.0
36 2018-01-01 18:00:01 18:00:01 - 18:30:00 0.0
37 2018-01-01 18:30:01 18:30:01 - 19:00:00 0.0
38 2018-01-01 19:00:01 19:00:01 - 19:30:00 0.0
39 2018-01-01 19:30:01 19:30:01 - 20:00:00 0.0
40 2018-01-01 20:00:01 20:00:01 - 20:30:00 0.0
41 2018-01-01 20:30:01 20:30:01 - 21:00:00 0.0
42 2018-01-01 21:00:01 21:00:01 - 21:30:00 0.0
43 2018-01-01 21:30:01 21:30:01 - 22:00:00 0.0
44 2018-01-01 22:00:01 22:00:01 - 22:30:00 0.0
45 2018-01-01 22:30:01 22:30:01 - 23:00:00 0.0
46 2018-01-01 23:00:01 23:00:01 - 23:30:00 0.0
47 2018-01-01 23:30:01 23:30:01 - 00:00:00 0.0
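An alternative that never splits the label string is to reindex on a (Date, half_hourly_bucket) MultiIndex built from every date crossed with all 48 labels. This is a sketch, not the answer's method. It assumes Date is datetime64 and that each (Date, bucket) pair occurs at most once, because reindex raises on a duplicated index. The function name fill_missing_buckets is hypothetical. Passing fill_value=0 to reindex keeps Value as integers instead of the floats shown above.

```python
import pandas as pd


def fill_missing_buckets(df):
    """Hypothetical helper: add every missing half-hour bucket with Value 0."""
    # The 48 labels "00:00:01 - 00:30:00" ... "23:30:01 - 00:00:00"
    starts = pd.date_range('2000-01-01 00:00:01', periods=48, freq='30min')
    ends = starts + pd.Timedelta(seconds=30 * 60 - 1)
    labels = starts.strftime('%H:%M:%S') + ' - ' + ends.strftime('%H:%M:%S')
    # Every (date, bucket) pair that should exist, in chronological order
    dates = df['Date'].drop_duplicates().sort_values()
    full = pd.MultiIndex.from_product([dates, labels],
                                      names=['Date', 'half_hourly_bucket'])
    # Missing pairs get Value 0; existing rows keep their values
    return (df.set_index(['Date', 'half_hourly_bucket'])
              .reindex(full, fill_value=0)
              .reset_index())


df = pd.DataFrame({
    'Date': pd.to_datetime(['2018-01-01'] * 6),
    'half_hourly_bucket': ['00:00:01 - 00:30:00', '00:30:01 - 01:00:00',
                           '01:00:01 - 01:30:00', '02:00:01 - 02:30:00',
                           '03:00:01 - 03:30:00', '04:00:01 - 04:30:00'],
    'Value': [123, 12, 122, 111, 122, 111],
})
out = fill_missing_buckets(df)
print(out.head(6))
```

Because the full index is a cross product of dates and labels, the same call also fills buckets across several days at once.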
Regarding "python - How to check if any string is missing in Pandas", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52873656/