pandas: group by keys and get the first occurrence

Problem description

Suppose I have the following DataFrame:

| id | timestamp           | code | id2
| 10 | 2017-07-12 13:37:00 | 206  | a1
| 10 | 2017-07-12 13:40:00 | 206  | a1
| 10 | 2017-07-12 13:55:00 | 206  | a1
| 10 | 2017-07-12 19:00:00 | 206  | a2
| 11 | 2017-07-12 13:37:00 | 206  | a1
...

I need to group by the id and id2 columns and get the first occurrence of the timestamp value, e.g. for id=10, id2=a1 the result should be timestamp=2017-07-12 13:37:00.

I searched for this and found some possible solutions, but I can't figure out how to implement them properly. It should probably be something like:

df.groupby(["id", "id2"])["timestamp"].apply(lambda x: ....)

Recommended answer

I think you need GroupBy.first:

df.groupby(["id", "id2"])["timestamp"].first()
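For reference, here is a minimal, self-contained sketch of that call. The frame below is rebuilt by hand from the five rows shown in the question (the rows hidden behind "..." are not included), and converting timestamp with pd.to_datetime is an assumption about the column's type. Note that GroupBy.first returns the first non-null entry of each group in the frame's current row order.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question (assumed complete for this sketch).
df = pd.DataFrame({
    "id": [10, 10, 10, 10, 11],
    "timestamp": pd.to_datetime([
        "2017-07-12 13:37:00",
        "2017-07-12 13:40:00",
        "2017-07-12 13:55:00",
        "2017-07-12 19:00:00",
        "2017-07-12 13:37:00",
    ]),
    "code": [206, 206, 206, 206, 206],
    "id2": ["a1", "a1", "a1", "a2", "a1"],
})

# One timestamp per (id, id2) pair: the first one encountered in row order.
first_ts = df.groupby(["id", "id2"])["timestamp"].first()
print(first_ts)
```

The result is a Series indexed by the (id, id2) pairs, so a single group can be looked up with first_ts.loc[(10, "a1")].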

Or drop_duplicates:

df.drop_duplicates(subset=['id','id2'])
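Both approaches take the first row in the frame's existing order, which matches the earliest timestamp only if the rows are already sorted by time. The sketch below uses a hypothetical, deliberately shuffled frame (not the data from the question) to show the difference, and one possible way to make the result independent of row order.

```python
import pandas as pd

# Hypothetical sample: same key columns as the question, rows deliberately out of order.
df = pd.DataFrame({
    "id": [10, 10, 11, 10],
    "timestamp": pd.to_datetime([
        "2017-07-12 13:55:00",
        "2017-07-12 13:37:00",
        "2017-07-12 13:37:00",
        "2017-07-12 19:00:00",
    ]),
    "id2": ["a1", "a1", "a1", "a2"],
})

# Without sorting, the first row seen for (10, "a1") is the 13:55 one.
unsorted_first = df.drop_duplicates(subset=["id", "id2"])

# Sorting by timestamp first makes "first row" mean "earliest timestamp".
earliest = df.sort_values("timestamp").drop_duplicates(subset=["id", "id2"])

# An alternative that does not depend on row order at all.
earliest_min = df.groupby(["id", "id2"])["timestamp"].min()

print(unsorted_first)
print(earliest)
print(earliest_min)
```

If the data is known to be in chronological order, as it appears to be in the question, the extra sort is unnecessary.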

For the same output:

df1 = df.groupby(["id", "id2"], as_index=False)["timestamp"].first()
print(df1)
   id id2            timestamp
0  10  a1  2017-07-12 13:37:00
1  10  a2  2017-07-12 19:00:00
2  11  a1  2017-07-12 13:37:00

df1 = df.drop_duplicates(subset=['id','id2'])[['id','id2','timestamp']]
print(df1)
   id id2            timestamp
0  10  a1  2017-07-12 13:37:00
3  10  a2  2017-07-12 19:00:00
4  11  a1  2017-07-12 13:37:00
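As a quick check that the two approaches agree, the sketch below rebuilds the five rows from the question (assumed to be the whole frame, with timestamp parsed by pd.to_datetime) and compares the results. One detail worth knowing: drop_duplicates keeps the original row labels, while groupby with as_index=False produces a fresh 0..n-1 index, so the index has to be reset before an exact comparison.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question (assumed complete for this sketch).
df = pd.DataFrame({
    "id": [10, 10, 10, 10, 11],
    "timestamp": pd.to_datetime([
        "2017-07-12 13:37:00",
        "2017-07-12 13:40:00",
        "2017-07-12 13:55:00",
        "2017-07-12 19:00:00",
        "2017-07-12 13:37:00",
    ]),
    "code": [206, 206, 206, 206, 206],
    "id2": ["a1", "a1", "a1", "a2", "a1"],
})

by_group = df.groupby(["id", "id2"], as_index=False)["timestamp"].first()
by_dedup = df.drop_duplicates(subset=["id", "id2"])[["id", "id2", "timestamp"]]

# drop_duplicates keeps the labels of the surviving rows.
dedup_labels = list(by_dedup.index)
print(dedup_labels)

# After resetting the index, both frames hold the same values.
same = by_group.equals(by_dedup.reset_index(drop=True))
print(same)
```

The two results line up here because groupby sorts the group keys by default and the sample rows already appear in that key order; on other data, sorting both frames by id and id2 before comparing would be a safer check.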

10-27 14:52