Problem description
I hope someone can help me here. I am trying to create a forward rolling window on a ragged time index. Pandas complains about monotonicity, which is obviously respected in my index. The normal backward window works just fine.
The reversed time index does not pass is_monotonic, so I guess rolling requires a monotonically increasing index, not just a monotonic one.
Does anyone have a better alternative? Thanks a lot!
In [352]: tmp[::-1]
Out[352]:
stamp
2018-04-23 06:45:16.920 -0.11
2018-04-23 06:45:16.919 -0.03
2018-04-23 06:45:16.918 -0.01
2018-04-23 06:45:16.917 -0.02
2018-04-23 06:45:16.916 0.03
2018-04-23 06:45:16.914 0.03
2018-04-23 06:45:16.911 0.03
2018-04-23 06:45:16.910 0.06
2018-04-23 06:45:16.909 0.09
2018-04-23 06:45:16.908 0.08
2018-04-23 06:45:16.907 0.18
2018-04-23 06:45:16.906 0.28
2018-04-23 06:45:16.905 0.28
2018-04-23 06:45:16.904 0.02
2018-04-23 06:45:16.903 0.09
2018-04-23 06:45:16.902 0.09
2018-04-23 06:45:16.901 0.09
2018-04-23 06:45:16.900 0.09
2018-04-23 06:45:16.899 -0.24
2018-04-23 06:45:16.898 -0.22
2018-04-23 06:45:16.894 -0.22
2018-04-23 06:45:16.799 -0.21
2018-04-23 06:45:16.798 -0.19
2018-04-23 06:45:16.797 -0.21
2018-04-23 06:45:15.057 -0.13
2018-04-23 06:45:15.056 -0.16
2018-04-23 06:45:13.382 -0.04
2018-04-23 06:45:13.381 -0.02
2018-04-23 06:45:13.380 -0.05
2018-04-23 06:45:13.379 -0.08
Name: d66, dtype: float64
In [353]: tmp[::-1].rolling('20L')
Traceback (most recent call last):
File "<ipython-input-355-74bdfcdfbbd1>", line 1, in <module>
tmp[::-1].rolling('20L')
File "C:\Users\luigi\Anaconda3\lib\site-packages\pandas\core\generic.py", line 7067, in rolling
on=on, axis=axis, closed=closed)
File "C:\Users\luigi\Anaconda3\lib\site-packages\pandas\core\window.py", line 2069, in rolling
return Rolling(obj, **kwds)
File "C:\Users\luigi\Anaconda3\lib\site-packages\pandas\core\window.py", line 86, in __init__
self.validate()
File "C:\Users\luigi\Anaconda3\lib\site-packages\pandas\core\window.py", line 1104, in validate
self._validate_monotonic()
File "C:\Users\luigi\Anaconda3\lib\site-packages\pandas\core\window.py", line 1136, in _validate_monotonic
"monotonic".format(formatted))
ValueError: index must be monotonic
In [356]: tmp.index.is_monotonic
Out[356]: True
In [357]: tmp[::-1].index.is_monotonic
Out[357]: False
In [358]: tmp[::-1].index.is_monotonic_decreasing
Out[358]: True
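The checks above can be reproduced on a tiny series. This is a minimal sketch with three made-up rows (the values are illustrative only). It uses is_monotonic_increasing because is_monotonic was an alias for it and is no longer available in pandas 2.x.

```python
import pandas as pd

# Three illustrative timestamps in increasing order (not the full data set).
idx = pd.to_datetime([
    "2018-04-23 06:45:13.379",
    "2018-04-23 06:45:13.380",
    "2018-04-23 06:45:13.381",
])
s = pd.Series([-0.08, -0.05, -0.02], index=idx)
rev = s[::-1]  # same data, newest row first

print(s.index.is_monotonic_increasing)    # True
print(rev.index.is_monotonic_increasing)  # False
print(rev.index.is_monotonic_decreasing)  # True
```

A reversed index is still monotonic, only in the decreasing direction, which is why the single is_monotonic flag reported False for it.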
Recommended answer
Just in case you are still looking for a solution: with reindex() and the help of an extra column, rolling functions with a ragged forward-looking window should be doable.
import pandas as pd
from io import StringIO
# columns are separated by two or more spaces; the single space inside each timestamp is kept
data = """dtime  value
2018-04-23 06:45:16.920  -0.11
2018-04-23 06:45:16.919  -0.03
2018-04-23 06:45:16.918  -0.01
2018-04-23 06:45:16.917  -0.02
2018-04-23 06:45:16.916  0.03
2018-04-23 06:45:16.914  0.03
2018-04-23 06:45:16.911  0.03
2018-04-23 06:45:16.910  0.06
2018-04-23 06:45:16.909  0.09
2018-04-23 06:45:16.908  0.08
2018-04-23 06:45:16.907  0.18
2018-04-23 06:45:16.906  0.28
2018-04-23 06:45:16.905  0.28
2018-04-23 06:45:16.904  0.02
2018-04-23 06:45:16.903  0.09
2018-04-23 06:45:16.902  0.09
2018-04-23 06:45:16.901  0.09
2018-04-23 06:45:16.900  0.09
2018-04-23 06:45:16.899  -0.24
2018-04-23 06:45:16.898  -0.22
2018-04-23 06:45:16.894  -0.22
2018-04-23 06:45:16.799  -0.21
2018-04-23 06:45:16.798  -0.19
2018-04-23 06:45:16.797  -0.21
2018-04-23 06:45:15.057  -0.13
2018-04-23 06:45:15.056  -0.16
2018-04-23 06:45:13.382  -0.04
2018-04-23 06:45:13.381  -0.02
2018-04-23 06:45:13.380  -0.05
2018-04-23 06:45:13.379  -0.08
"""
## read the data tmp[::-1]
df = pd.read_table(StringIO(data), sep=r"\s\s+", engine="python", index_col=["dtime"], parse_dates=["dtime"])
## reverse the data to its original order
df = df[::-1]
## setup the offset, i.e. 10ms
offset = '10ms'
# create a new column with values as index datetime plus the window timedelta 10ms
df['dt_new'] = df.index + pd.Timedelta(offset)
# combine df.index and this new column into the new index (remove duplicates and sort)
idx = sorted(set([*df.index.tolist(), *df.dt_new.tolist()]))
# reindex the original dataframe and calculate the backward rolling sum
df1 = df.reindex(idx).fillna(value={'value':0}).value.rolling(offset, closed='left').sum().to_frame()
# make a LEFT join to the original dataframe. `value_y` should be the forward rolling sum
df.merge(df1, left_on='dt_new', right_index=True, how='left')
# value_x dt_new value_y
#dtime
#2018-04-23 06:45:13.379 -0.08 2018-04-23 06:45:13.389 -0.19
#2018-04-23 06:45:13.380 -0.05 2018-04-23 06:45:13.390 -0.11
#2018-04-23 06:45:13.381 -0.02 2018-04-23 06:45:13.391 -0.06
#2018-04-23 06:45:13.382 -0.04 2018-04-23 06:45:13.392 -0.04
#2018-04-23 06:45:15.056 -0.16 2018-04-23 06:45:15.066 -0.29
#2018-04-23 06:45:15.057 -0.13 2018-04-23 06:45:15.067 -0.13
#2018-04-23 06:45:16.797 -0.21 2018-04-23 06:45:16.807 -0.61
#2018-04-23 06:45:16.798 -0.19 2018-04-23 06:45:16.808 -0.40
#2018-04-23 06:45:16.799 -0.21 2018-04-23 06:45:16.809 -0.21
#2018-04-23 06:45:16.894 -0.22 2018-04-23 06:45:16.904 -0.32
#2018-04-23 06:45:16.898 -0.22 2018-04-23 06:45:16.908 0.66
#2018-04-23 06:45:16.899 -0.24 2018-04-23 06:45:16.909 0.96
#2018-04-23 06:45:16.900 0.09 2018-04-23 06:45:16.910 1.29
#2018-04-23 06:45:16.901 0.09 2018-04-23 06:45:16.911 1.26
#2018-04-23 06:45:16.902 0.09 2018-04-23 06:45:16.912 1.20
#2018-04-23 06:45:16.903 0.09 2018-04-23 06:45:16.913 1.11
#2018-04-23 06:45:16.904 0.02 2018-04-23 06:45:16.914 1.02
#2018-04-23 06:45:16.905 0.28 2018-04-23 06:45:16.915 1.03
#2018-04-23 06:45:16.906 0.28 2018-04-23 06:45:16.916 0.75
#2018-04-23 06:45:16.907 0.18 2018-04-23 06:45:16.917 0.50
#2018-04-23 06:45:16.908 0.08 2018-04-23 06:45:16.918 0.30
#2018-04-23 06:45:16.909 0.09 2018-04-23 06:45:16.919 0.21
#2018-04-23 06:45:16.910 0.06 2018-04-23 06:45:16.920 0.09
#2018-04-23 06:45:16.911 0.03 2018-04-23 06:45:16.921 -0.08
#2018-04-23 06:45:16.914 0.03 2018-04-23 06:45:16.924 -0.11
#2018-04-23 06:45:16.916 0.03 2018-04-23 06:45:16.926 -0.14
#2018-04-23 06:45:16.917 -0.02 2018-04-23 06:45:16.927 -0.17
#2018-04-23 06:45:16.918 -0.01 2018-04-23 06:45:16.928 -0.15
#2018-04-23 06:45:16.919 -0.03 2018-04-23 06:45:16.929 -0.14
#2018-04-23 06:45:16.920 -0.11 2018-04-23 06:45:16.930 -0.11
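As a cross-check that avoids reindex(), the same forward-looking sums can be computed directly from the sorted timestamps with numpy. This is only a sketch, not part of the original answer: the helper name forward_rolling_sum is hypothetical, and it assumes a DatetimeIndex sorted in increasing order, no NaN values, and a window of [t, t + offset).

```python
import numpy as np
import pandas as pd


def forward_rolling_sum(s: pd.Series, offset: str) -> pd.Series:
    """Hypothetical helper: sum of s over [t, t + offset) for each timestamp t."""
    if not s.index.is_monotonic_increasing:
        raise ValueError("index must be sorted in increasing order")
    times = s.index.values                           # numpy datetime64 array
    window = pd.Timedelta(offset).to_timedelta64()
    # For each row, the position of the first timestamp >= t + offset
    # (the exclusive right edge of the forward window).
    ends = np.searchsorted(times, times + window, side="left")
    starts = np.arange(len(s))                       # each window starts at its own row
    # Prefix sums with a leading zero, so sum(values[a:b]) == csum[b] - csum[a].
    csum = np.concatenate(([0.0], np.cumsum(s.to_numpy(dtype=float))))
    return pd.Series(csum[ends] - csum[starts], index=s.index, name=s.name)


# The six oldest rows of the data above, in increasing time order.
idx = pd.to_datetime([
    "2018-04-23 06:45:13.379",
    "2018-04-23 06:45:13.380",
    "2018-04-23 06:45:13.381",
    "2018-04-23 06:45:13.382",
    "2018-04-23 06:45:15.056",
    "2018-04-23 06:45:15.057",
])
sample = pd.Series([-0.08, -0.05, -0.02, -0.04, -0.16, -0.13], index=idx, name="value")

result = forward_rolling_sum(sample, "10ms")
print(result.round(2).tolist())  # [-0.19, -0.11, -0.06, -0.04, -0.29, -0.13]
```

For these rows the sums agree with the value_y column shown above. The sketch trades the reindex() and merge steps for one searchsorted call, so it does not add rows to the frame.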
Some notes:
- When the size of the rolling window is an offset, the results depend on how you set the closed option. By default, closed is set to 'right'. If shifting by the offset is the method applied (as in this example), the rolling aggregation must be calculated with closed='left' (your design might differ). When the window size is a fixed number of rows, the default closed is 'both' (this may differ between pandas versions, so check the documentation for yours).
- The index (the dtime field) should not contain duplicates. If it does, idx should be de-duplicated based on the two fields (dtime, value).
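To make the effect of closed concrete, here is a small sketch on a made-up three-row series spaced 1 ms apart (the values 1, 2, 4 are illustrative, not taken from the data above). It compares the default closed='right' with closed='left' for a 2 ms offset window.

```python
import pandas as pd

idx = pd.to_datetime([
    "2018-04-23 06:45:13.379",
    "2018-04-23 06:45:13.380",
    "2018-04-23 06:45:13.381",
])
s = pd.Series([1.0, 2.0, 4.0], index=idx)

# Default closed='right': the window is (t - 2ms, t], so the current row is included.
right = s.rolling("2ms").sum()
print(right.tolist())  # [1.0, 3.0, 6.0]

# closed='left': the window is [t - 2ms, t), so the current row is excluded.
left = s.rolling("2ms", closed="left").sum()
print(left.tolist())   # [nan, 1.0, 3.0]
```

With closed='left' the first row has an empty window and yields NaN. In the answer above, the rolling sum is read at t + offset, so excluding the right edge is what turns the backward window into [t, t + offset).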
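Potential issues:
- In the worst case, reindex() may double the number of rows.
- The dataframes are joined on a datetime field; if datetimes are stored as floating-point numbers, this may not work on every system.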