This article covers how to handle a sliding index over a dataframe, and may be a useful reference for anyone facing the same problem.

Problem description

I have the following dataframe with a datetime index:

        A
date
2020-01  1
2020-01  2
2020-02  3
2020-02  4
2020-03  5
2020-03  6
2020-04  7
2020-04  8

I want to create a for loop that returns new dataframes (until the end of the data) with this outcome:

dataframe1

         A
date
2020-01  1
2020-01  2
2020-02  3
2020-02  4

dataframe2

2020-02  3
2020-02  4
2020-03  5
2020-03  6

dataframe3

2020-03  5
2020-03  6
2020-04  7
2020-04  8

The idea is an 'add and drop' rolling window. The logic is:

  • The first two months give dataframe1
  • Drop the first month and add a new one to give dataframe2
  • Continue until the end of the data

I have found this to be useful, but I do not know how to implement it properly. When I tried rolling, the values of each month were summed up. I want to keep the original values!

Also, if I use a simple for loop like:

for i in range(len(df)):
    print(df[i : i+n])

I can get my outcome based on the number of rows in the dataframe. But how do I actually do it by month?

Any suggestion would be much appreciated, thank you!

Recommended answer

OK, I get what you want! Try this:

import pandas as pd

d = {'date': ['2020-01', '2020-01', '2020-02', '2020-02', '2020-03', '2020-03', '2020-04', '2020-04'],
     'A': [1, 2, 3, 4, 5, 6, 7, 8]}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m')

result = []
for i, date in enumerate(df['date'].unique()):
    if i == 0:
        # First dataframe: from the first date up to and including first date + 2 months.
        start = pd.to_datetime(date)
        result.append(df[(df['date'] >= start) & (df['date'] <= start + pd.DateOffset(months=2))])
    else:
        try:
            # Next dataframes: strictly after the last date of the previous one, up to 2 months later.
            last = result[i - 1].iloc[-1]['date']
            result.append(df[(df['date'] > last) & (df['date'] <= last + pd.DateOffset(months=2))])
        except IndexError:
            # The previous dataframe is empty or missing: the data is exhausted.
            pass

# Drop the empty dataframes.
result = [r for r in result if not r.empty]
for res in result:
    print(res)

Here I look at the previous dataframe to build the date condition: the dates must be later than the last date of the previous dataframe.

df
Out[248]:
        date  A
0 2020-01-01  1
1 2020-01-01  2
2 2020-02-01  3
3 2020-02-01  4
4 2020-03-01  5
5 2020-03-01  6
6 2020-04-01  7
7 2020-04-01  8

for res in result:
    print(res)

        date  A
0 2020-01-01  1
1 2020-01-01  2
2 2020-02-01  3
3 2020-02-01  4
4 2020-03-01  5
5 2020-03-01  6
        date  A
6 2020-04-01  7
7 2020-04-01  8
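
The output above has non-overlapping windows (January to March, then April), while the question asks for overlapping windows of two months that advance one month at a time. A minimal sketch of one way to get those, assuming the same sample data: label each row with its calendar month via `dt.to_period("M")`, then slice over the sorted list of distinct months instead of over rows. The names `month`, `months`, `n` and `windows` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-01", "2020-01", "2020-02", "2020-02",
             "2020-03", "2020-03", "2020-04", "2020-04"],
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m")

# Label every row with its calendar month (a pandas Period).
month = df["date"].dt.to_period("M")
# Distinct months present in the data, in chronological order.
months = month.drop_duplicates().sort_values().tolist()

n = 2  # months per window (assumed from the question)
windows = []
for i in range(len(months) - n + 1):
    first, last = months[i], months[i + n - 1]
    # Keep the original rows whose month falls inside the window; nothing is aggregated.
    windows.append(df[(month >= first) & (month <= last)])

for w in windows:
    print(w)
```

Because the loop steps over months rather than rows, it does not matter how many rows each month contains. This sketch slides over the months that actually occur in the data, so a month with no rows is skipped rather than producing a shorter window.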
result = []
for i, date in enumerate(df['date'].unique()):
    if i == 0:
        # Choose your time period here (14 months for the first dataframe, as in your example).
        start = pd.to_datetime(date)
        result.append(df[(df['date'] >= start) & (df['date'] <= start + pd.DateOffset(months=14))])
    else:
        try:
            # For the other dataframes, take all rows between the first date of the previous
            # dataframe + 3 months and the first date of the previous dataframe + 14 + 3 months.
            first = result[i - 1].iloc[0]['date']
            result.append(df[(df['date'] > first + pd.DateOffset(months=3)) & (df['date'] <= first + pd.DateOffset(months=17))])
        except IndexError:
            pass
result = [r for r in result if not r.empty]
for res in result:
    print(res)
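
The loop above hard-codes its window length and offset. As a hedged sketch, the same idea can be wrapped in a small generator that takes the window length and the step as parameters, both counted in calendar months. `month_windows` and its parameters `window`, `step` and `date_col` are hypothetical names introduced here, not part of pandas or of the original answer.

```python
import pandas as pd


def month_windows(df, window, step, date_col="date"):
    """Yield sub-dataframes spanning `window` calendar months, moving `step` months each time."""
    if window < 1 or step < 1:
        raise ValueError("window and step must be at least 1 month")
    month = df[date_col].dt.to_period("M")
    start, last = month.min(), month.max()
    while start <= last:
        end = start + (window - 1)  # last month included in this window
        chunk = df[(month >= start) & (month <= end)]
        if not chunk.empty:
            yield chunk
        if end >= last:
            break  # this window already reaches the end of the data
        start = start + step


df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01", "2020-01", "2020-02", "2020-02",
                            "2020-03", "2020-03", "2020-04", "2020-04"],
                           format="%Y-%m"),
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Overlapping two-month windows, as described in the question.
overlapping = list(month_windows(df, window=2, step=1))
# Back-to-back three-month windows.
disjoint = list(month_windows(df, window=3, step=3))

for w in overlapping:
    print(w)
```

With `window=2, step=1` the sample data gives the three dataframes asked for in the question. Unlike the loop above, this sketch advances by a fixed number of calendar months from the first month in the data, so its windows can differ from the loop's when some months have no rows.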


07-29 15:55