Problem description
I have a time series in a pandas DataFrame and I want to create groups based on the index, but the groups should overlap, i.e. they are not distinct. header_sec is the index column. Each group consists of a 2-second window.
Input DataFrame:
header_sec
1 17004 days 22:17:13
2 17004 days 22:17:13
3 17004 days 22:17:13
4 17004 days 22:17:13
5 17004 days 22:17:14
6 17004 days 22:17:14
7 17004 days 22:17:14
8 17004 days 22:17:14
9 17004 days 22:17:15
10 17004 days 22:17:15
11 17004 days 22:17:15
12 17004 days 22:17:15
13 17004 days 22:17:16
14 17004 days 22:17:16
15 17004 days 22:17:16
16 17004 days 22:17:16
17 17004 days 22:17:17
18 17004 days 22:17:17
19 17004 days 22:17:17
20 17004 days 22:17:17
My first group should have
1 17004 days 22:17:13
2 17004 days 22:17:13
3 17004 days 22:17:13
4 17004 days 22:17:13
5 17004 days 22:17:14
6 17004 days 22:17:14
7 17004 days 22:17:14
8 17004 days 22:17:14
The second group starts in the previous second and takes half of the records from that second:
7 17004 days 22:17:14
8 17004 days 22:17:14
9 17004 days 22:17:15
10 17004 days 22:17:15
11 17004 days 22:17:15
12 17004 days 22:17:15
13 17004 days 22:17:16
14 17004 days 22:17:16
Third group:
13 17004 days 22:17:16
14 17004 days 22:17:16
15 17004 days 22:17:16
16 17004 days 22:17:16
17 17004 days 22:17:17
18 17004 days 22:17:17
19 17004 days 22:17:17
20 17004 days 22:17:17
If I do a groupby on the index,
dfgroup = df.groupby(df.index)
this gives one group per second. What would be the best way to merge these groups?
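To make the starting point concrete, here is a minimal sketch that rebuilds the sample data and shows what a plain groupby on the index returns. The column name `row` and the way the index is constructed are assumptions made for illustration; the question only shows the printed frame.

```python
import pandas as pd

# Hypothetical reconstruction of the sample data: 20 rows, 4 per second.
base = pd.Timedelta(days=17004, hours=22, minutes=17, seconds=13)
index = pd.TimedeltaIndex(
    [base + pd.Timedelta(seconds=i // 4) for i in range(20)], name="header_sec"
)
df = pd.DataFrame({"row": range(1, 21)}, index=index)

# Grouping on the index yields one non-overlapping group per distinct second.
group_sizes = df.groupby(df.index).size()
print(group_sizes)
```

Each distinct second becomes its own group, so no row appears in more than one group.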
Solution
Here is a technique:
import pandas as pd  # needed for pd.to_timedelta, pd.DataFrame and pd.concat
grouped = df.groupby(df.index)
for name, group in grouped:
    # Rows of the previous second; fall back to an empty frame if that second is absent.
    # This assumes several rows per second, so that .loc returns a DataFrame.
    try:
        prev_sec = df.loc[(name - pd.to_timedelta(1, unit='s')), :]
    except KeyError:
        prev_sec = pd.DataFrame(columns=group.columns)
    # Rows of the next second, with the same fallback.
    try:
        next_sec = df.loc[(name + pd.to_timedelta(1, unit='s')), :]
    except KeyError:
        next_sec = pd.DataFrame(columns=group.columns)
    Pn = 2  # replace this with int(len(prev_sec)/2) to get half the rows from the previous second
    Nn = 2  # replace this with int(len(next_sec)/2) to get half the rows from the next second
    # Last Pn rows of the previous second + current second + first Nn rows of the next second
    group = pd.concat([prev_sec.iloc[-Pn:, :], group, next_sec.iloc[:Nn, :]])
    # Replace the line below with your own operations
    print(name, group)
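The same idea can be packaged as a self-contained, runnable sketch. The function name `overlapping_groups`, the column name `row`, and the reconstructed sample data are assumptions made for illustration, not part of the original answer. Unlike the loop above, it uses boolean masks instead of try/except, and it always takes half of each neighbouring second.

```python
import pandas as pd

def overlapping_groups(df, step=pd.Timedelta(seconds=1)):
    """Yield (second, rows), with half of each neighbouring second attached."""
    for name, group in df.groupby(df.index):
        # Boolean masks give an empty frame when a neighbouring second is missing.
        prev_sec = df[df.index == name - step]
        next_sec = df[df.index == name + step]
        n_prev = len(prev_sec) // 2
        n_next = len(next_sec) // 2
        # Slice from an explicit start so that n_prev == 0 selects no rows
        # (iloc[-0:] would select every row).
        tail = prev_sec.iloc[len(prev_sec) - n_prev:]
        head = next_sec.iloc[:n_next]
        # Skip empty pieces; the current group itself is never empty.
        parts = [part for part in (tail, group, head) if len(part)]
        yield name, pd.concat(parts)

# Hypothetical reconstruction of the sample data: 20 rows, 4 per second.
base = pd.Timedelta(days=17004, hours=22, minutes=17, seconds=13)
index = pd.TimedeltaIndex(
    [base + pd.Timedelta(seconds=i // 4) for i in range(20)], name="header_sec"
)
df = pd.DataFrame({"row": range(1, 21)}, index=index)

groups = dict(overlapping_groups(df))
print(groups[base + pd.Timedelta(seconds=2)])
```

With this data, the group centred on 22:17:15 holds rows 7 to 14, which matches the second group in the question. Groups at the edges are shorter because they have only one neighbouring second, so the first group here holds rows 1 to 6 rather than the 1 to 8 shown in the question.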