Problem description
How can I extract and compare the values in the second column of a data frame for all rows that share the same value in the first column?
I have a data frame, df:
Name Datetime
Bob 26-04-2018 12:00:00
Claire 26-04-2018 12:00:00
Bob 26-04-2018 12:30:00
Grace 27-04-2018 08:30:00
Bob 27-04-2018 09:30:00
I want to add a new column, df['Id'], such that rows with the same name whose datetime values differ by no more than 30 minutes are assigned the same Id, while rows whose datetime values differ by more than 30 minutes are assigned a different Id.
I think this could be done by looping over the rows, but I am not sure how to do it. Also, since my data set is huge, is there a better way?
My expected output would be:
Name Datetime Id
Bob 26-04-2018 12:00:00 1
Claire 26-04-2018 12:00:00 2
Bob 26-04-2018 12:10:00 1
Bob 26-04-2018 12:20:00 1
Claire 27-04-2018 08:30:00 3
Bob 27-04-2018 09:30:00 4
Any help would be appreciated. Thanks.
Recommended answer
I think this is simple using groupby, pd.Grouper and ngroup, as follows:
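df['Datetime'] = pd.to_datetime(df['Datetime'], dayfirst=True)  # pd.Grouper needs a datetime64 key column
df['Id'] = df.groupby([pd.Grouper(freq='30min', key='Datetime'), 'Name']).ngroup().add(1)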
Out[423]:
Name Datetime Id
0 Bob 2018-04-26 12:00:00 1
1 Claire 2018-04-26 12:00:00 2
2 Bob 2018-04-26 12:10:00 1
3 Bob 2018-04-26 12:20:00 1
4 Claire 2018-04-27 08:30:00 3
5 Bob 2018-04-27 09:30:00 4
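Note that pd.Grouper with a 30-minute frequency cuts time into fixed bins (12:00 to 12:30, 12:30 to 13:00, and so on), so two rows for the same name at 12:25 and 12:35 would land in different groups even though they are only 10 minutes apart. If the requirement is really "no more than 30 minutes since that person's previous row", a gap-based variant may fit better. The following is a minimal sketch under that assumption, not part of the original answer; the column name SessionId and the variable names are made up for illustration, and the sample data is copied from the expected output in the question.

```python
import pandas as pd

# Sample data taken from the expected output in the question.
df = pd.DataFrame({
    "Name": ["Bob", "Claire", "Bob", "Bob", "Claire", "Bob"],
    "Datetime": [
        "26-04-2018 12:00:00", "26-04-2018 12:00:00", "26-04-2018 12:10:00",
        "26-04-2018 12:20:00", "27-04-2018 08:30:00", "27-04-2018 09:30:00",
    ],
})
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%d-%m-%Y %H:%M:%S")

# Order rows by person and time so each row can be compared with that
# person's previous row.
ordered = df.sort_values(["Name", "Datetime"])

# Time elapsed since the previous row of the same person (NaT for the first row).
gap = ordered.groupby("Name")["Datetime"].diff()

# A new group starts at each person's first row, or after a gap above 30 minutes.
starts_new_group = gap.isna() | (gap > pd.Timedelta(minutes=30))

# The running count of group starts gives the id; assignment aligns on the
# index, so the original row order of df is preserved.
df["SessionId"] = starts_new_group.cumsum()

print(df)
```

On this sample the rows are grouped the same way as in the output above, but the ids are numbered in name-then-time order rather than in time order, so the actual numbers differ. Also note that gaps are chained: a run of rows each within 30 minutes of the previous one stays in one group, even if the first and last rows are more than 30 minutes apart.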