Problem description
Let's say I have a pandas DataFrame as below:
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'dt': pd.to_datetime(['2018-12-10 16:35:34.246',
...                           '2018-12-10 16:36:34.243',
...                           '2018-12-10 16:38:34.216',
...                           '2018-12-10 16:42:34.123']),
...     'value': [1, 2, 3, 4],
... })
>>> df
                       dt  value
0 2018-12-10 16:35:34.246      1
1 2018-12-10 16:36:34.243      2
2 2018-12-10 16:38:34.216      3
3 2018-12-10 16:42:34.123      4
I would like to group this DataFrame by the 'dt' column, but in a way that treats timestamps less than one second apart as the same value. After grouping, I would like to sum the 'value' column within each group. I also want the DataFrame to remain the same length, so rows whose timestamps differ by less than a second would all carry the same summed value. So far I have tried:
>>> df.groupby('dt',as_index=False)['value'].sum()
                       dt  value
0 2018-12-10 16:35:34.246      1
1 2018-12-10 16:36:34.243      2
2 2018-12-10 16:38:34.216      3
3 2018-12-10 16:42:34.123      4
But as you can see, the DataFrame didn't change, because this groups only by exactly equal 'dt' column values.
My desired output is:
                       dt  value
0 2018-12-10 16:35:34.246      3
1 2018-12-10 16:36:34.243      3
2 2018-12-10 16:38:34.216      3
3 2018-12-10 16:42:34.123      4
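The "same length" requirement can be met with `groupby(...).transform('sum')`, which broadcasts each group's total back onto its rows. The sketch below is one possible approach, not the accepted answer. It assumes that truncating timestamps to whole seconds is an acceptable stand-in for "less than a second apart". That is not quite the same thing: 34.9 s and 35.1 s land in different buckets even though they are only 0.2 s apart. It also assumes the second timestamp was meant to be 16:35:34.243, as the desired output implies; with the data exactly as posted, rows 0 and 1 are about a minute apart. The column name `value_sum` is made up for illustration.

```python
import pandas as pd

# Assumed variant of the data: the second timestamp is 16:35:34.243,
# so the first two rows fall inside the same whole second.
df = pd.DataFrame({
    'dt': pd.to_datetime(['2018-12-10 16:35:34.246',
                          '2018-12-10 16:35:34.243',
                          '2018-12-10 16:38:34.216',
                          '2018-12-10 16:42:34.123']),
    'value': [1, 2, 3, 4],
})

# Truncate each timestamp to the start of its second and use that as the key.
bucket = df['dt'].dt.floor('s')

# transform('sum') returns one value per original row, so the length is kept.
df['value_sum'] = df.groupby(bucket)['value'].transform('sum')
print(df)
```

With this assumed data, the first two rows share a bucket and both receive 3, while the last two rows keep their own values.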
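A brute-force solution is to take the absolute difference between your datetime series and each datetime value in turn, then compare it against a threshold. Note that the output below uses a variant of the data (credited to @StephenCowley) in which the second timestamp is 16:35:34.243, so the first two rows are within one second of each other: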
# data from @StephenCowley
threshold = pd.Timedelta(seconds=1)
df['val'] = [df.loc[(df['dt'] - t).abs() < threshold, 'value'].sum()
             for t in df['dt']]
print(df)
                       dt  value  val
0 2018-12-10 16:35:34.246      1    3
1 2018-12-10 16:35:34.243      2    3
2 2018-12-10 16:38:34.216      3    3
3 2018-12-10 16:42:34.123      4    4
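The list comprehension above scans the whole column once per row. The same pairwise comparison can be written with NumPy broadcasting, as in the self-contained sketch below. It is not part of the original answer, and it rebuilds the data with the second timestamp at 16:35:34.243 to match the output shown above. It still does O(n²) work and also allocates an n×n array, so it only suits small to moderate frames.

```python
import numpy as np
import pandas as pd

# Data rebuilt to match the answer's printed output (row 1 at 16:35:34.243).
df = pd.DataFrame({
    'dt': pd.to_datetime(['2018-12-10 16:35:34.246',
                          '2018-12-10 16:35:34.243',
                          '2018-12-10 16:38:34.216',
                          '2018-12-10 16:42:34.123']),
    'value': [1, 2, 3, 4],
})
threshold = pd.Timedelta(seconds=1)

# n x n matrix of absolute pairwise differences between timestamps.
dt = df['dt'].to_numpy()
diff = np.abs(dt[:, None] - dt[None, :])

# within[i, j] is True when rows i and j are less than one second apart.
within = diff < threshold.to_timedelta64()

# For each row, sum the values of all rows within the threshold of it.
df['val'] = within @ df['value'].to_numpy()
print(df)
```

Each row is always within the threshold of itself, so `val` is never smaller than the row's own `value`.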