Problem description
I have some time series data (made up here): one variable is Value and the other is Temperature.
import numpy as np
import pandas as pd
np.random.seed(11)
rows, cols = 50000, 2
data = np.random.rand(rows, cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='min')  # 'min' = minutely; the older 'T' alias is deprecated in recent pandas
df = pd.DataFrame(data, columns=['Temperature', 'Value'], index=tidx)
Question: how do I resample the data per day into a separate pandas DataFrame named daily_summary, with 3 columns containing:
- the daily maximum Value
- the hour at which the maximum occurred
- the Temperature recorded when the maximum occurred
I know I can use the code below to find the daily maximum value and the hour it occurred:
daily_summary = df.groupby(df.index.normalize())['Value'].agg(['idxmax', 'max'])
daily_summary['hour'] = daily_summary['idxmax'].dt.hour
daily_summary = daily_summary.drop(['idxmax'], axis=1)
daily_summary.rename(columns = {'max':'DailyMaxValue'}, inplace = True)
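Because the idxmax column in the code above already holds the timestamp of each day's maximum, one possible extension is to look up Temperature at those timestamps before the column is dropped. This is a sketch, not the asker's code; it rebuilds the question's synthetic data with the 'min' minute alias so it runs on its own.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the question ('min' is the minute alias)
np.random.seed(11)
rows, cols = 50000, 2
data = np.random.rand(rows, cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='min')
df = pd.DataFrame(data, columns=['Temperature', 'Value'], index=tidx)

daily_summary = df.groupby(df.index.normalize())['Value'].agg(['idxmax', 'max'])
daily_summary['hour'] = daily_summary['idxmax'].dt.hour
# idxmax holds the timestamp (a row label of df) of each daily maximum,
# so .loc can fetch Temperature there; .to_numpy() assigns positionally
# instead of aligning on the per-minute index
daily_summary['Temperature'] = df.loc[daily_summary['idxmax'], 'Temperature'].to_numpy()
daily_summary = daily_summary.drop(columns=['idxmax'])
daily_summary = daily_summary.rename(columns={'max': 'DailyMaxValue'})
print(daily_summary.head())
```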
But I am lost trying to add the Temperature that was recorded at each of these daily maxima...
Would using .loc be a better method, where a loop filters through each day? Something like this:
for idx, days in df.groupby(df.index.date):
    print(days)
    daily_summary = df.loc[days['Value'].max().astype('int')]
If I run this, I can print days for each day, but the daily_summary line throws TypeError: cannot do index indexing on <class 'pandas.core.indexes.datetimes.DatetimeIndex'> with these indexers [0] of <class 'numpy.int32'>
Any tips are greatly appreciated.
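The TypeError comes from the last line of the loop: days['Value'].max() is a float in [0, 1), so .astype('int') turns it into 0, and .loc on a DatetimeIndex is label-based, so the integer 0 is not a valid row label. Even without the error, daily_summary would be overwritten on every iteration. A minimal sketch of the same loop, assuming the question's df and using idxmax (which returns the timestamp label) instead, might look like this; the records list and the 'date' column name are illustrative, not from the original.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the question ('min' is the minute alias)
np.random.seed(11)
rows, cols = 50000, 2
data = np.random.rand(rows, cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='min')
df = pd.DataFrame(data, columns=['Temperature', 'Value'], index=tidx)

records = []  # hypothetical accumulator: one dict per day
for day, days in df.groupby(df.index.date):
    # idxmax returns the timestamp of the maximum, i.e. a label .loc accepts;
    # max().astype('int') would give 0 here, which is not a label
    ts = days['Value'].idxmax()
    row = days.loc[ts]
    records.append({'date': pd.Timestamp(day),
                    'DailyMaxValue': row['Value'],
                    'hour': ts.hour,
                    'Temperature': row['Temperature']})

daily_summary = pd.DataFrame(records).set_index('date')
print(daily_summary.head())
```

This works, but a Python-level loop over groups is usually slower than a single vectorised groupby call.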
Recommended answer
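You can combine idxmax (to get the timestamp of each day's maximum) with loc (to pull the full rows at those timestamps):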
idx = df.groupby(df.index.normalize())['Value'].idxmax()
ret_df = df.loc[idx].copy()
# get the hour
ret_df['hour'] = ret_df.index.hour
# set date as index
ret_df.index = ret_df.index.normalize()
Output (first five rows):
            Temperature     Value  hour
2019-01-01     0.423320  0.998377    19
2019-01-02     0.117154  0.999976    10
2019-01-03     0.712291  0.999497    16
2019-01-04     0.404229  0.999996    21
2019-01-05     0.457618  0.999371    17
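To end up with exactly the three columns the question asks for, the answer's result can be renamed and reordered. The sketch below is self-contained (it rebuilds the question's data with the 'min' alias); the final rename and column selection are an addition, not part of the answer.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the question ('min' is the minute alias)
np.random.seed(11)
rows, cols = 50000, 2
data = np.random.rand(rows, cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='min')
df = pd.DataFrame(data, columns=['Temperature', 'Value'], index=tidx)

# The answer's approach: one timestamp per day where Value peaks
idx = df.groupby(df.index.normalize())['Value'].idxmax()
daily_summary = df.loc[idx].copy()
daily_summary['hour'] = daily_summary.index.hour
daily_summary.index = daily_summary.index.normalize()

# Rename and reorder to the three columns requested in the question
daily_summary = daily_summary.rename(columns={'Value': 'DailyMaxValue'})
daily_summary = daily_summary[['DailyMaxValue', 'hour', 'Temperature']]
print(daily_summary.head())
```

Because df.loc[idx] pulls whole rows, any other column measured at the same minute comes along without extra lookups. This relies on the index timestamps being unique, as they are for a date_range.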