Conversion of a daily pandas DataFrame to minute frequency does not work for a 2-row DataFrame

Problem description

I am trying to convert a daily-frequency dataframe to minute data. A previous post (Conversion of Daily pandas dataframe to minute frequency) suggested the ffill approach below, but it does not seem to work with dataframes that consist of only 2 rows.

The dataframe below is the one to be converted.

import pandas as pd

# One record per (ticker, date) pair; named `records` to avoid shadowing the built-in `dict`
records = [
    {'ticker': 'jpm', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'ge', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'fb', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'aapl', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'msft', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'amzn', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'jpm', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'ge', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'fb', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'aapl', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'msft', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'amzn', 'date': '2016-11-29', 'returns': 0.2},
]
df = pd.DataFrame(records)
df['date'] = pd.to_datetime(df['date'])
# Index by (date, ticker) so that unstack() can later pivot the tickers into columns
df = df.set_index(['date', 'ticker'], drop=True)

This works for the whole dataframe:

df_min = df.unstack().asfreq('Min', method='ffill').between_time('8:30','16:00').stack()

But when I work with a smaller dataframe, it returns an empty dataframe for some reason:

df2=df.iloc[0:2,:]

df2_min = df2.unstack().asfreq('Min', method='ffill').between_time('8:30','16:00').stack()

Does anyone have an explanation for this odd behaviour?

Edit: I noticed the code only works if the dataframe has at least 7 rows.

Recommended answer

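If the input DataFrame has only 2 rows, reshaping it with unstack produces a one-row DataFrame. pandas cannot build a continuous minute-frequency DataFrame from it, because the DatetimeIndex contains only one value: asfreq only generates timestamps between the first and last index values, and the single midnight row is then dropped by between_time. This also explains the 7-row observation: the first six rows all share the date 2016-11-28, so a second date only appears with the seventh row.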
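To make the explanation concrete, here is a minimal sketch that compares a one-date frame with a two-date frame. The frames `one_day` and `two_days` are illustrative stand-ins for the unstacked data, not objects from the question, and the lowercase "min" alias is used in place of 'Min'.

```python
import pandas as pd

# One-row frame with a single midnight timestamp, mimicking the unstacked 2-row input.
one_day = pd.DataFrame(
    {"jpm": [0.2], "ge": [0.2]},
    index=pd.DatetimeIndex(["2016-11-28"], name="date"),
)

# asfreq only spans index.min() .. index.max(), so no new rows are created here.
one_day_min = one_day.asfreq("min", method="ffill")
n_rows_one_day = len(one_day_min)

# The single 00:00 row lies outside the 8:30-16:00 window, so the result is empty.
n_rows_in_window = len(one_day_min.between_time("8:30", "16:00"))

# With a second date the range spans a full day: 1440 minutes plus the closing midnight.
two_days = pd.DataFrame(
    {"jpm": [0.2, 0.2], "ge": [0.2, 0.2]},
    index=pd.DatetimeIndex(["2016-11-28", "2016-11-29"], name="date"),
)
n_rows_two_days = len(two_days.asfreq("min", method="ffill"))

print(n_rows_one_day, n_rows_in_window, n_rows_two_days)  # prints: 1 0 1441
```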

A possible solution is to add the next day as a helper row after reshaping, fill it with the data from the last row, apply the original solution, and finally remove the helper row by position with iloc:

df2=df.iloc[0:2]
print (df2)
                   returns
date       ticker
2016-11-28 jpm         0.2
           ge          0.2

df3 = df2.unstack()
print (df3)
           returns
ticker         jpm   ge
date
2016-11-28     0.2  0.2

# Append a helper row for the next day, copying the values of the last row
df3.loc[df3.index.max() + pd.Timedelta(1, unit='d')] = df3.iloc[-1]
print (df3)
           returns
ticker         jpm   ge
date
2016-11-28     0.2  0.2
2016-11-29     0.2  0.2 <- helper row

df_min = df3.asfreq('Min', method='ffill')
print (df_min.tail())
                    returns
ticker                  jpm   ge
date
2016-11-28 23:56:00     0.2  0.2
2016-11-28 23:57:00     0.2  0.2
2016-11-28 23:58:00     0.2  0.2
2016-11-28 23:59:00     0.2  0.2
2016-11-29 00:00:00     0.2  0.2 <- helper row

df_min = df_min.iloc[:-1].between_time('8:30','16:00').stack()
#print (df_min)
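As an alternative to the helper row, the minute index can be built explicitly and passed to reindex. This is a different technique from the answer above, shown only as a sketch: `daily_to_minute` is a hypothetical helper name, and it assumes a wide (already unstacked) frame with a sorted DatetimeIndex of midnight timestamps. Unlike the answer's approach, it also covers the last day of a multi-day input.

```python
import pandas as pd


def daily_to_minute(wide, start="8:30", end="16:00"):
    """Forward-fill a date-indexed frame to minute frequency (hypothetical helper)."""
    first_day = wide.index.min().normalize()
    # Extend to the last minute of the final day so that day is covered as well.
    last_minute = (
        wide.index.max().normalize()
        + pd.Timedelta(days=1)
        - pd.Timedelta(minutes=1)
    )
    minutes = pd.date_range(first_day, last_minute, freq="min", name=wide.index.name)
    # reindex with ffill copies each day's values forward onto its minutes.
    return wide.reindex(minutes, method="ffill").between_time(start, end)


# Illustrative one-date input, standing in for the unstacked 2-row frame.
wide = pd.DataFrame(
    {"jpm": [0.2], "ge": [0.2]},
    index=pd.DatetimeIndex(["2016-11-28"], name="date"),
)
result = daily_to_minute(wide)
print(result.shape)  # (451, 2): one row per minute from 08:30 to 16:00 inclusive
```

Calling `.stack()` on the result would return it to the long (date, ticker) layout used in the question.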

