我有一个数据框架,我想计算所有列每12小时的平均值。
dataframe有超过200k行。

          DateTime Speed   TRQ         ...    PtoP3  RMS3   Crest3
0       2016-07-01 00:00   994  35.4   ...       NA    NA       NA
1       2016-07-01 00:01   995  34.6   ...       NA    NA       NA
2       2016-07-01 00:02   995    34   ...       NA    NA       NA

我写了这个
Present_data.to_datetime(Present_data['DateTime'])
Total_12hravg_all = Present_data.groupby(pd.Grouper(freq='12H', key='DateTime')).mean()
print(Total_12hravg_all)

得到这个错误
类型错误:仅对DatetimeIndex、TimedeltaIndex或
PeriodIndex,但有一个“Index”实例

最佳答案

如果Datetime是列:
你的解决方案应该很有效:

Present_data['DateTime'] = pd.to_datetime(Present_data['DateTime'])
Total_12hravg_all = Present_data.groupby(pd.Grouper(freq='12H', key='DateTime')).mean()

另一种解决方案是使用参数resampleon
Present_data['DateTime'] = pd.to_datetime(Present_data['DateTime'])
Total_12hravg_all = Present_data.resample('12H', on='DateTime').mean()

或创建DatetimeIndex
Present_data['DateTime'] = pd.to_datetime(Present_data['DateTime'])
Present_data = Present_data.set_index('DateTime')
Total_12hravg_all = Present_data.groupby(pd.Grouper(freq='12H')).mean()
#resample
#Total_12hravg_all = Present_data.resample('12H').mean()

如果Datetime是索引:
Present_data.index = pd.to_datetime(Present_data.index)

Total_12hravg_all = Present_data.groupby(pd.Grouper(freq='12H')).mean()
#resample
#Total_12hravg_all = Present_data.resample('12H').mean()

最终解决方案:
Present_data['DateTime'] = pd.to_datetime(Present_data['DateTime'])
Present_data = Present_data.set_index('DateTime')

#convert non numeri values to NaNs
Present_data = Present_data.apply(lambda x: pd.to_numeric(x, errors='coerce'))

Total_12hravg_all = Present_data.groupby(pd.Grouper(freq='12H')).mean()
#resample
#Total_12hravg_all = Present_data.resample('12H').mean()

关于python - 每12小时计算所有列的平均值,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/52400236/

10-09 08:49