Problem description
I have the following DataFrame with a datetime index:
A
date
2020-01 1
2020-01 2
2020-02 3
2020-02 4
2020-03 5
2020-03 6
2020-04 7
2020-04 8
I want to write a for loop that returns new DataFrames (until the end of the data) with this outcome:
dataframe1
A
date
2020-01 1
2020-01 2
2020-02 3
2020-02 4
dataframe2
2020-02 3
2020-02 4
2020-03 5
2020-03 6
dataframe3
2020-03 5
2020-03 6
2020-04 7
2020-04 8
The idea is an 'add and drop' rolling window. The logic is:
- Return the first two months as dataframe1
- Drop the first month and add a new one to return dataframe2
- Continue until the end of the data
I found this useful, but I do not know how to implement it properly. When I tried rolling, the values of each month were summed up. I want to keep the original values!
Also, if I use a simple for loop like:
for i in range(len(df)):
    print(df[i : i+n])
I can get windows based on the number of rows in my DataFrame. But how do I do it based on months?
Any suggestion would be much appreciated, thank you!
Recommended answer
OK, I see what you want. Try this:
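import pandas as pd

d = {
    'date': ['2020-01', '2020-01', '2020-02', '2020-02',
             '2020-03', '2020-03', '2020-04', '2020-04'],
    'A': [1, 2, 3, 4, 5, 6, 7, 8],
}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m')

result = []
for i, date in enumerate(df['date'].unique()):
    if i == 0:
        # First window: from the first date up to two months later (inclusive).
        end = pd.to_datetime(date) + pd.DateOffset(months=2)
        result.append(df[(df['date'] >= date) & (df['date'] <= end)])
    else:
        try:
            # Later windows: strictly after the last date of the previous
            # window, up to two months past it.
            prev_last = result[i - 1].iloc[-1]['date']
            result.append(df[(df['date'] > prev_last)
                             & (df['date'] <= prev_last + pd.DateOffset(months=2))])
        except IndexError:
            # The previous window was empty, so there is no data left.
            pass

# Drop the empty windows produced near the end of the data.
result = [r for r in result if not r.empty]
for res in result:
    print(res)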
Here I look at the previous DataFrame to build the date condition: each new window keeps only rows whose date is later than the last date of the previous window, so consecutive windows do not overlap.
df
Out[248]:
date A
0 2020-01-01 1
1 2020-01-01 2
2 2020-02-01 3
3 2020-02-01 4
4 2020-03-01 5
5 2020-03-01 6
6 2020-04-01 7
7 2020-04-01 8
for res in result:
    print(res)
date A
0 2020-01-01 1
1 2020-01-01 2
2 2020-02-01 3
3 2020-02-01 4
4 2020-03-01 5
5 2020-03-01 6
date A
6 2020-04-01 7
7 2020-04-01 8
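For the overlapping 'add and drop' windows shown in the question, one possible sketch is to label each row with its month and slide over the distinct months. It assumes the dates live in a `date` column, as in the setup above; `n_months`, `periods`, `months`, and `windows` are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-01", "2020-01", "2020-02", "2020-02",
         "2020-03", "2020-03", "2020-04", "2020-04"],
        format="%Y-%m",
    ),
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
})

n_months = 2  # assumed window length, taken from the question's example

# Month label for every row, plus the sorted list of distinct months present.
periods = df["date"].dt.to_period("M")
months = periods.drop_duplicates().sort_values().tolist()

windows = []
for start in range(len(months) - n_months + 1):
    first = months[start]
    last = months[start + n_months - 1]
    # Keep the original rows whose month falls inside this window.
    windows.append(df[(periods >= first) & (periods <= last)])

for w in windows:
    print(w)
```

Each window advances by one month, so the rows are kept as they are and nothing is aggregated. Because the loop walks over the months that actually occur in the data, a month with no rows is skipped rather than counted as part of a window.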
# Variant with a longer window: choose your own time period here.
result = []
for i, date in enumerate(df['date'].unique()):
    if i == 0:
        # First window: 14 months starting at the first date, as in your example.
        end = pd.to_datetime(date) + pd.DateOffset(months=14)
        result.append(df[(df['date'] >= date) & (df['date'] <= end)])
    else:
        try:
            # Later windows: all rows between the first date of the previous
            # window + 3 months and that same first date + 14 + 3 months.
            prev_first = result[i - 1].iloc[0]['date']
            result.append(df[(df['date'] > prev_first + pd.DateOffset(months=3))
                             & (df['date'] <= prev_first + pd.DateOffset(months=17))])
        except IndexError:
            pass

result = [r for r in result if not r.empty]
for res in result:
    print(res)
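If the window length and the shift both need to be adjustable, the same idea can be wrapped in a small generator. This is only a sketch under assumptions: `month_windows` and its `window` and `step` parameters are hypothetical names, the dates are assumed to be in a datetime column, and windows are counted in calendar months (a trailing window that would run past the last month is dropped).

```python
import pandas as pd


def month_windows(frame, date_col="date", window=2, step=1):
    """Yield slices of `frame` spanning `window` calendar months,
    moving the start forward by `step` months each time."""
    month = frame[date_col].dt.to_period("M")
    start, last = month.min(), month.max()
    # Stop once a full window would extend past the last month of data.
    while start + (window - 1) <= last:
        chunk = frame[(month >= start) & (month < start + window)]
        if not chunk.empty:
            yield chunk
        start = start + step


df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-01", "2020-01", "2020-02", "2020-02",
         "2020-03", "2020-03", "2020-04", "2020-04"],
        format="%Y-%m",
    ),
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
})

overlapping = list(month_windows(df, window=2, step=1))
disjoint = list(month_windows(df, window=2, step=2))
```

With `window=14, step=3` this would resemble the 14-month variant above, except that each window here includes its first month and excludes the month right after its end.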