N-dimensional sliding window with pandas or NumPy

Problem description

How do I do the R (xts) equivalent of rollapply(..., by.column = FALSE) using NumPy or pandas? When given a DataFrame, pandas' rolling_apply seems to work only column by column, instead of offering an option to pass a full (window size) x (DataFrame width) matrix to the target function.

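import numpy as np
import pandas as pd

xx = pd.DataFrame(np.zeros([10, 10]))
# pd.rolling_apply is the pre-0.18 pandas API (deprecated in 0.18, later removed);
# today the equivalent is xx.rolling(5).apply(lambda x: np.shape(x)[0], raw=True)
pd.rolling_apply(xx, 5, lambda x: np.shape(x)[0])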

    0   1   2   3   4   5   6   7   8   9
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4   5   5   5   5   5   5   5   5   5   5
5   5   5   5   5   5   5   5   5   5   5
6   5   5   5   5   5   5   5   5   5   5
7   5   5   5   5   5   5   5   5   5   5
8   5   5   5   5   5   5   5   5   5   5
9   5   5   5   5   5   5   5   5   5   5
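The module-level call above belongs to older pandas releases. As a minimal sketch, assuming a recent pandas where the `raw=` argument of `Rolling.apply` exists, the method-based equivalent shows the same per-column behaviour: the function receives a 1-D window of length 5 and the result keeps one column per input column.

```python
import numpy as np
import pandas as pd

xx = pd.DataFrame(np.zeros([10, 10]))

# raw=True passes each window to the function as a 1-D NumPy array;
# the function is still called separately for every column.
result = xx.rolling(5).apply(lambda x: np.shape(x)[0], raw=True)

print(result.shape)       # (10, 10): one output column per input column
print(result.iloc[4, 0])  # 5.0: each window is a 1-D array of length 5
```

The first four rows are NaN because `min_periods` defaults to the window size.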

So what happens is that rolling_apply goes down each column in turn and applies a sliding window of length 5 to each one, whereas what I want is for each sliding window to be a 5x10 array, so that in this case I would get a single column vector (not a 2-D array) as the result.
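To pin down the desired semantics, here is a minimal, loop-based sketch (not vectorized) of a "whole-window" rolling apply, similar in spirit to rollapply(..., by.column = FALSE). The helper name `rolling_apply_2d` is hypothetical, and the output is assumed to be right-aligned like the pandas output above, with NaN for the first `window - 1` rows.

```python
import numpy as np
import pandas as pd


def rolling_apply_2d(df: pd.DataFrame, window: int, func) -> pd.Series:
    """Hypothetical helper: call `func` on each (window x n_columns) block.

    Right-aligned: the result at row i comes from rows i - window + 1 .. i,
    and rows without a complete window are left as NaN.
    """
    values = df.to_numpy()
    out = np.full(len(df), np.nan)
    for end in range(window, len(df) + 1):
        # values[end - window:end] is a 2-D block: `window` rows, all columns
        out[end - 1] = func(values[end - window:end])
    return pd.Series(out, index=df.index)


xx = pd.DataFrame(np.zeros([10, 10]))
shapes = rolling_apply_2d(xx, 5, lambda block: block.size)
print(shapes.tolist())
# [nan, nan, nan, nan, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]
```

Each call sees a 5x10 block (50 elements), and the result is a single column, which is the shape asked for. The Python loop is simple but slow on long frames, which is what the strided view below avoids.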

Recommended answer

I indeed cannot find a way to compute a "wide" rolling application in the pandas docs, so I'd use NumPy to get a "windowing" view on the array and apply a ufunc to it. Here's an example:

In [40]: arr = np.arange(50).reshape(10, 5); arr
Out[40]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49]])

In [41]: win_size = 5

In [42]: isize = arr.itemsize; isize
Out[42]: 8

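arr.itemsize is 8 because the default integer dtype here is np.int64 (on some platforms, e.g. Windows with NumPy before 2.0, it is np.int32 and itemsize is 4). You need it for the following "window" view idiom: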

In [43]: windowed = np.lib.stride_tricks.as_strided(arr,
                                                    shape=(arr.shape[0] - win_size + 1, win_size, arr.shape[1]),
                                                    strides=(arr.shape[1] * isize, arr.shape[1] * isize, isize)); windowed
Out[43]:
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]],

       [[ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44]],

       [[25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49]]])

Strides are the number of bytes between two neighbouring elements along a given axis. Thus strides=(arr.shape[1] * isize, arr.shape[1] * isize, isize) means: skip 5 elements (one row) when going from windowed[0] to windowed[1], skip 5 elements when going from windowed[0, 0] to windowed[0, 1], and skip 1 element when going from windowed[0, 0, 0] to windowed[0, 0, 1]. Now you can call any ufunc or reduction on the resulting array, e.g.:

In [44]: windowed.sum(axis=(1,2))
Out[44]: array([300, 425, 550, 675, 800, 925])
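Because `itemsize` depends on the dtype and platform, a slightly more robust variant of the same view can take its byte steps directly from `arr.strides`. The sketch below does that, marks the view read-only (the windows overlap, so writing through it would alter several windows at once), and cross-checks it against `np.lib.stride_tricks.sliding_window_view`, which assumes NumPy 1.20 or later. `sliding_window_view` is a swapped-in alternative, not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

arr = np.arange(50).reshape(10, 5)
win_size = 5

# Use the array's own byte strides instead of itemsize * shape[1],
# so the view is correct whatever the integer width of the dtype is.
row_step, col_step = arr.strides
windowed = as_strided(
    arr,
    shape=(arr.shape[0] - win_size + 1, win_size, arr.shape[1]),
    strides=(row_step, row_step, col_step),
    writeable=False,  # overlapping windows: make the view read-only
)

# NumPy >= 1.20: windows taken along axis 0 are appended as the *last* axis,
# so swap axes 1 and 2 to get (n_windows, win_size, n_columns).
swv = sliding_window_view(arr, win_size, axis=0).swapaxes(1, 2)

print(np.array_equal(windowed, swv))  # True
print(windowed.sum(axis=(1, 2)))      # [300 425 550 675 800 925]
```

`as_strided` performs no bounds checking, so a wrong shape or stride silently reads outside the array; `sliding_window_view` computes these for you and is the safer choice when it is available.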
