Sliding window over a pandas dataframe

Problem description

I have a large pandas dataframe of time-series data.

I currently manipulate this dataframe to create a new, smaller dataframe that holds the average of every 10 rows, i.e. a rolling window technique. Like this:

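import numpy as np
import pandas as pd

def create_new_df(df):
    features = []
    x = df['X'].astype(float)
    i = x.index.values
    # Repeat each index label 10 times, then truncate to len(x):
    # the first 10 rows get key 0, the next 10 rows get key 1, and so on
    time_sequence = [i] * 10
    idx = np.array(time_sequence).T.flatten()[:len(x)]
    x = x.groupby(idx).mean()  # one mean per block of 10 rows
    x.name = 'X'
    features.append(x)
    new_df = pd.concat(features, axis=1)
    return new_df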

Code to test:

columns = ['X']
df_ = pd.DataFrame(columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
data = np.array([np.arange(20)]*1).T
df = pd.DataFrame(data, columns=columns)

test = create_new_df(df)
print(test)

Output:

      X
0   4.5
1  14.5

However, I want the function to build the new dataframe using a sliding window with a 50% overlap.

So the output would look like this:

      X
0   4.5
1   9.5
2  14.5

How can I do this?

Here's what I've tried:

from itertools import tee

def window(iterable, size):
    iters = tee(iterable, size)
    for i in range(1, size):
        for each in iters[i:]:
            next(each, None)
    return zip(*iters)

for each in window(df, 20):
    print(list(each)) # doesn't have the desired sliding window effect

Some might also suggest using the pandas rolling_mean() method, but if so, I can't see how to use that function with overlapping windows.

Any help would be much appreciated.

Solution

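I think pandas rolling techniques are fine here. A 10-row window with 50% overlap advances 5 rows at a time, so you can compute the rolling mean at every row and then keep every 5th row. Note that starting with version 0.18.0 of pandas, you would use rolling().mean() instead of rolling_mean().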

>>> df=pd.DataFrame({ 'x':range(30) })
>>> df = df.rolling(10).mean()           # version 0.18.0 syntax
>>> df[4::5]                             # take every 5th row

       x
4    NaN
9    4.5
14   9.5
19  14.5
24  19.5
29  24.5
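
To get exactly the layout asked for in the question (no leading NaN row, index restarting at 0), the same idea can be wrapped in a small helper. This is only a sketch: the function name overlapping_window_means and its step parameter are made up for illustration, and it assumes pandas 0.18.0 or later so that rolling().mean() is available.

```python
import numpy as np
import pandas as pd


def overlapping_window_means(df, window=10, step=5):
    """Mean of every full `window`-row block, advancing `step` rows each time.

    Hypothetical helper for illustration; step = window // 2 gives 50% overlap.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    rolled = df.rolling(window).mean()
    # Row window-1 closes the first full window; each later window
    # closes `step` rows after the previous one.
    return rolled.iloc[window - 1::step].reset_index(drop=True)


# Same 20-row test data as in the question.
df = pd.DataFrame({'X': np.arange(20)})
result = overlapping_window_means(df, window=10, step=5)
print(result)
```

On the question's test data this keeps the rolling means at rows 9, 14 and 19, that is 4.5, 9.5 and 14.5, renumbered 0 to 2. Starting the slice at window - 1 rather than at row 4 is what drops the incomplete first window.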
