Forward-looking rolling window in pandas - ragged index


Problem description

Hope someone can help me here. I am trying to create a forward rolling window on a ragged time index. Pandas complains about monotonicity, which my index obviously respects. The normal backward window works just fine.

The reversed time index does not pass is_monotonic. So I guess rolling requires a monotonically increasing index, not just a monotonic one.

Does anyone have a better alternative? Thanks a lot!

In [352]: tmp[::-1]
Out[352]:
stamp
2018-04-23 06:45:16.920   -0.11
2018-04-23 06:45:16.919   -0.03
2018-04-23 06:45:16.918   -0.01
2018-04-23 06:45:16.917   -0.02
2018-04-23 06:45:16.916    0.03
2018-04-23 06:45:16.914    0.03
2018-04-23 06:45:16.911    0.03
2018-04-23 06:45:16.910    0.06
2018-04-23 06:45:16.909    0.09
2018-04-23 06:45:16.908    0.08
2018-04-23 06:45:16.907    0.18
2018-04-23 06:45:16.906    0.28
2018-04-23 06:45:16.905    0.28
2018-04-23 06:45:16.904    0.02
2018-04-23 06:45:16.903    0.09
2018-04-23 06:45:16.902    0.09
2018-04-23 06:45:16.901    0.09
2018-04-23 06:45:16.900    0.09
2018-04-23 06:45:16.899   -0.24
2018-04-23 06:45:16.898   -0.22
2018-04-23 06:45:16.894   -0.22
2018-04-23 06:45:16.799   -0.21
2018-04-23 06:45:16.798   -0.19
2018-04-23 06:45:16.797   -0.21
2018-04-23 06:45:15.057   -0.13
2018-04-23 06:45:15.056   -0.16
2018-04-23 06:45:13.382   -0.04
2018-04-23 06:45:13.381   -0.02
2018-04-23 06:45:13.380   -0.05
2018-04-23 06:45:13.379   -0.08
Name: d66, dtype: float64

In [353]: tmp[::-1].rolling('20L')
Traceback (most recent call last):

  File "<ipython-input-355-74bdfcdfbbd1>", line 1, in <module>
    tmp[::-1].rolling('20L')

  File "C:\Users\luigi\Anaconda3\lib\site-packages\pandas\core\generic.py", line 7067, in rolling
    on=on, axis=axis, closed=closed)

  File "C:\Users\luigi\Anaconda3\lib\site-packages\pandas\core\window.py", line 2069, in rolling
    return Rolling(obj, **kwds)

  File "C:\Users\luigi\Anaconda3\lib\site-packages\pandas\core\window.py", line 86, in __init__
    self.validate()

  File "C:\Users\luigi\Anaconda3\lib\site-packages\pandas\core\window.py", line 1104, in validate
    self._validate_monotonic()

  File "C:\Users\luigi\Anaconda3\lib\site-packages\pandas\core\window.py", line 1136, in _validate_monotonic
    "monotonic".format(formatted))

ValueError: index must be monotonic

In [356]: tmp.index.is_monotonic
Out[356]: True

In [357]: tmp[::-1].index.is_monotonic
Out[357]: False

In [358]: tmp[::-1].index.is_monotonic_decreasing
Out[358]: True
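The monotonicity checks from the session above can be reproduced on a three-point index. In the pandas version shown, is_monotonic is an alias of is_monotonic_increasing (it has since been deprecated and removed, so this sketch uses the explicit property names). The reversed index is monotonic, but decreasing, and that is the case the rolling validation in that version rejected. The timestamps are taken from the question; everything else is illustrative.

```python
import pandas as pd

# Three increasing timestamps from the question.
idx = pd.DatetimeIndex([
    "2018-04-23 06:45:13.379",
    "2018-04-23 06:45:13.380",
    "2018-04-23 06:45:15.056",
])
rev = idx[::-1]  # same ordering as tmp[::-1]

print(idx.is_monotonic_increasing)  # True
print(rev.is_monotonic_increasing)  # False
print(rev.is_monotonic_decreasing)  # True
```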

Recommended answer

Just in case you are still looking for a solution: with reindex() and the help of an extra column, rolling functions with a ragged forward-looking window are doable.

import pandas as pd
from io import StringIO

str = """dtime          value
2018-04-23 06:45:16.920   -0.11
2018-04-23 06:45:16.919   -0.03
2018-04-23 06:45:16.918   -0.01
2018-04-23 06:45:16.917   -0.02
2018-04-23 06:45:16.916    0.03
2018-04-23 06:45:16.914    0.03
2018-04-23 06:45:16.911    0.03
2018-04-23 06:45:16.910    0.06
2018-04-23 06:45:16.909    0.09
2018-04-23 06:45:16.908    0.08
2018-04-23 06:45:16.907    0.18
2018-04-23 06:45:16.906    0.28
2018-04-23 06:45:16.905    0.28
2018-04-23 06:45:16.904    0.02
2018-04-23 06:45:16.903    0.09
2018-04-23 06:45:16.902    0.09
2018-04-23 06:45:16.901    0.09
2018-04-23 06:45:16.900    0.09
2018-04-23 06:45:16.899   -0.24
2018-04-23 06:45:16.898   -0.22
2018-04-23 06:45:16.894   -0.22
2018-04-23 06:45:16.799   -0.21
2018-04-23 06:45:16.798   -0.19
2018-04-23 06:45:16.797   -0.21
2018-04-23 06:45:15.057   -0.13
2018-04-23 06:45:15.056   -0.16
2018-04-23 06:45:13.382   -0.04
2018-04-23 06:45:13.381   -0.02
2018-04-23 06:45:13.380   -0.05
2018-04-23 06:45:13.379   -0.08
"""

## read the data (listed in reverse time order, as tmp[::-1] in the question)
df = pd.read_table(StringIO(str), sep=r"\s\s+", engine="python", index_col=["dtime"], parse_dates=['dtime'])

## reverse the data to its original order
df = df[::-1]

## set up the window size (offset), here 10 ms
offset = '10ms'

# create a new column holding each index timestamp shifted forward by the window length (10 ms)
df['dt_new'] = df.index + pd.Timedelta(offset)

# combine df.index and the new column into the new index (set() removes duplicates, sorted() restores time order)
idx = sorted(set([*df.index.tolist(), *df.dt_new.tolist()]))

# reindex the original dataframe (inserted timestamps get value 0) and calculate the backward rolling sum;
# with closed='left', the window ending at t + 10ms covers [t, t + 10ms)
df1 = (df.reindex(idx)
         .fillna(value={'value': 0})
         .value.rolling(offset, closed='left').sum()
         .to_frame())

# make a LEFT join to the original dataframe. `value_y` should be the forward rolling sum
df.merge(df1, left_on='dt_new', right_index=True, how='left')
#                         value_x                  dt_new  value_y
#dtime
#2018-04-23 06:45:13.379    -0.08 2018-04-23 06:45:13.389    -0.19
#2018-04-23 06:45:13.380    -0.05 2018-04-23 06:45:13.390    -0.11
#2018-04-23 06:45:13.381    -0.02 2018-04-23 06:45:13.391    -0.06
#2018-04-23 06:45:13.382    -0.04 2018-04-23 06:45:13.392    -0.04
#2018-04-23 06:45:15.056    -0.16 2018-04-23 06:45:15.066    -0.29
#2018-04-23 06:45:15.057    -0.13 2018-04-23 06:45:15.067    -0.13
#2018-04-23 06:45:16.797    -0.21 2018-04-23 06:45:16.807    -0.61
#2018-04-23 06:45:16.798    -0.19 2018-04-23 06:45:16.808    -0.40
#2018-04-23 06:45:16.799    -0.21 2018-04-23 06:45:16.809    -0.21
#2018-04-23 06:45:16.894    -0.22 2018-04-23 06:45:16.904    -0.32
#2018-04-23 06:45:16.898    -0.22 2018-04-23 06:45:16.908     0.66
#2018-04-23 06:45:16.899    -0.24 2018-04-23 06:45:16.909     0.96
#2018-04-23 06:45:16.900     0.09 2018-04-23 06:45:16.910     1.29
#2018-04-23 06:45:16.901     0.09 2018-04-23 06:45:16.911     1.26
#2018-04-23 06:45:16.902     0.09 2018-04-23 06:45:16.912     1.20
#2018-04-23 06:45:16.903     0.09 2018-04-23 06:45:16.913     1.11
#2018-04-23 06:45:16.904     0.02 2018-04-23 06:45:16.914     1.02
#2018-04-23 06:45:16.905     0.28 2018-04-23 06:45:16.915     1.03
#2018-04-23 06:45:16.906     0.28 2018-04-23 06:45:16.916     0.75
#2018-04-23 06:45:16.907     0.18 2018-04-23 06:45:16.917     0.50
#2018-04-23 06:45:16.908     0.08 2018-04-23 06:45:16.918     0.30
#2018-04-23 06:45:16.909     0.09 2018-04-23 06:45:16.919     0.21
#2018-04-23 06:45:16.910     0.06 2018-04-23 06:45:16.920     0.09
#2018-04-23 06:45:16.911     0.03 2018-04-23 06:45:16.921    -0.08
#2018-04-23 06:45:16.914     0.03 2018-04-23 06:45:16.924    -0.11
#2018-04-23 06:45:16.916     0.03 2018-04-23 06:45:16.926    -0.14
#2018-04-23 06:45:16.917    -0.02 2018-04-23 06:45:16.927    -0.17
#2018-04-23 06:45:16.918    -0.01 2018-04-23 06:45:16.928    -0.15
#2018-04-23 06:45:16.919    -0.03 2018-04-23 06:45:16.929    -0.14
#2018-04-23 06:45:16.920    -0.11 2018-04-23 06:45:16.930    -0.11
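To make the meaning of value_y concrete, here is a brute-force sketch (not part of the original answer; the function name forward_sum_bruteforce is hypothetical) that sums, for every timestamp t, all values whose timestamps fall in [t, t + 10ms). It is O(n^2) and only meant as a check on small data. On the six earliest points it gives the first six value_y rows above.

```python
import pandas as pd

def forward_sum_bruteforce(s: pd.Series, offset: str) -> pd.Series:
    """For each timestamp t in s.index, sum the values with timestamps in [t, t + offset)."""
    window = pd.Timedelta(offset)
    # Boolean mask per timestamp: start included, end excluded.
    sums = [s[(s.index >= t) & (s.index < t + window)].sum() for t in s.index]
    return pd.Series(sums, index=s.index, name="forward_sum")

# The six earliest points from the question, in increasing time order.
s = pd.Series(
    [-0.08, -0.05, -0.02, -0.04, -0.16, -0.13],
    index=pd.DatetimeIndex([
        "2018-04-23 06:45:13.379",
        "2018-04-23 06:45:13.380",
        "2018-04-23 06:45:13.381",
        "2018-04-23 06:45:13.382",
        "2018-04-23 06:45:15.056",
        "2018-04-23 06:45:15.057",
    ], name="dtime"),
    name="value",
)

check = forward_sum_bruteforce(s, "10ms")
print(check.round(2).tolist())  # [-0.19, -0.11, -0.06, -0.04, -0.29, -0.13]
```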

Some notes:

The results may vary depending on how you choose the closed option when the size of the rolling window is an offset. By default, closed is 'right'. If shifting the timestamps by the offset is the method applied (as in this example), the rolling aggregation must be calculated with closed='left', so that the window evaluated at t + offset covers [t, t + offset): it includes t and excludes the shifted point itself. (You might have a different design, though.) When the window size is a fixed number of rows, the default closed was 'both' in the pandas versions of that time.
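A small sketch of how closed changes an offset-based window, on three hypothetical points 5 ms apart with values 1, 10 and 100 and a 10 ms window. With closed='left', the result at the third timestamp is 11 = 1 + 10, which is exactly the forward window [t, t + 10ms) of the first point; that is why the answer evaluates the window at t + offset with closed='left'. Empty windows come out as NaN.

```python
import pandas as pd

# Three hypothetical observations 5 ms apart.
s = pd.Series(
    [1.0, 10.0, 100.0],
    index=pd.DatetimeIndex([
        "2018-04-23 06:45:13.379",
        "2018-04-23 06:45:13.384",
        "2018-04-23 06:45:13.389",
    ]),
)

# Same 10 ms window, different treatment of the two endpoints.
sums = {
    closed: s.rolling("10ms", closed=closed).sum()
    for closed in ["right", "left", "both", "neither"]
}
for closed, result in sums.items():
    print(closed, result.tolist())
# right [1.0, 11.0, 110.0]
# left [nan, 1.0, 11.0]
# both [1.0, 11.0, 111.0]
# neither [nan, 1.0, 10.0]
```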

The index (dtime field) must not contain duplicates, because reindex() raises a ValueError on an axis with duplicate labels. If it does contain duplicates, the dataframe should first be de-duplicated based on the two fields (dtime, value).
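A sketch of the duplicate-label problem on hypothetical data in which the first timestamp appears twice with the same value. Option 1 follows the note above and drops rows that repeat both dtime and value. If one timestamp carries different values, that still leaves duplicates; for a sum, an alternative not in the original answer (and valid only for additive aggregations) is to add up the rows per timestamp with groupby(level="dtime").sum().

```python
import pandas as pd

# Hypothetical data: 06:45:13.379 appears twice with the same value.
df = pd.DataFrame(
    {"value": [-0.08, -0.08, -0.05]},
    index=pd.DatetimeIndex(
        ["2018-04-23 06:45:13.379", "2018-04-23 06:45:13.379", "2018-04-23 06:45:13.380"],
        name="dtime",
    ),
)
offset = pd.Timedelta("10ms")
# Built like idx in the answer: original plus shifted timestamps, unique and sorted.
new_idx = sorted(set(df.index) | set(df.index + offset))

reindex_error = None
try:
    df.reindex(new_idx)
except ValueError as err:  # pandas refuses to reindex an axis with duplicate labels
    reindex_error = err

# Option 1 (the note above): drop rows that repeat both dtime and value.
dedup = df.reset_index().drop_duplicates(subset=["dtime", "value"]).set_index("dtime")

# Option 2 (sums only): add up rows that share a timestamp.
agg = df.groupby(level="dtime").sum()

print(reindex_error is not None, dedup.index.is_unique, agg.index.is_unique)  # True True True
print(len(dedup.reindex(new_idx)))  # 4
```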

Potential issues:

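In the worst case, reindex() can double the number of rows.
Joining the dataframes on a datetime field may not work on every system if the datetimes are stored as floats.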
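Both issues come from the extra rows and the join. A different trick, not from the original answer, avoids them: mirror the time axis so the reversed series gets an increasing index, run pandas' ordinary backward offset window, and map the result back. pandas also ships FixedForwardWindowIndexer in pandas.api.indexers, but it handles forward windows of a fixed number of rows, not a ragged time offset. In this sketch the function name forward_rolling and the anchor timestamp are hypothetical, and the input is assumed to have an increasing DatetimeIndex. For the five sample points the sums are -0.29, -0.13, -0.61, -0.40 and -0.21, matching the corresponding value_y rows above.

```python
import pandas as pd

def forward_rolling(s: pd.Series, offset: str, how: str = "sum") -> pd.Series:
    """Aggregate s over the forward window [t, t + offset) for every timestamp t.

    Assumes s has an increasing DatetimeIndex. `how` names a Rolling method
    such as "sum", "mean", "max" or "count".
    """
    # Hypothetical anchor: any timestamp works; it only keeps the mirrored index datetime-like.
    anchor = pd.Timestamp("2000-01-01")
    rev = s[::-1].copy()
    # Mirrored time t' = anchor + (t_max - t) is increasing once the rows are reversed.
    rev.index = anchor + (s.index.max() - rev.index)
    # The default closed='right' window (t' - offset, t'] maps back to [t, t + offset).
    out = getattr(rev.rolling(offset), how)()
    # Undo the reversal and restore the real timestamps.
    out = out[::-1]
    out.index = s.index
    return out

# Five points from the question, in increasing time order.
s = pd.Series(
    [-0.16, -0.13, -0.21, -0.19, -0.21],
    index=pd.DatetimeIndex([
        "2018-04-23 06:45:15.056",
        "2018-04-23 06:45:15.057",
        "2018-04-23 06:45:16.797",
        "2018-04-23 06:45:16.798",
        "2018-04-23 06:45:16.799",
    ], name="dtime"),
    name="value",
)

fwd_sum = forward_rolling(s, "10ms")             # sum over [t, t + 10ms)
fwd_count = forward_rolling(s, "10ms", "count")  # number of points in [t, t + 10ms)
print(fwd_sum.round(2).tolist())
print(fwd_count.tolist())
```

Because the mirrored series has exactly as many rows as the input and keeps the original index, no reindexing or datetime join is needed, and any Rolling aggregation can be swapped in via how.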
