I have a DataFrame like this:
Name Gender Age Level
Pikachu Male 4 8
Charmander Female 5 7
Charmander Female 5 7
Squirtle Male 3 6
Squirtle Male 3 9
Squirtle Female 4 9
I would like it to look like this:
Name Gender Age Level
Pikachu Male 4 8
Charmander Female 5 7
Squirtle Male 3 9
Squirtle Female 4 9
I'm not sure how to explain what I want in plain English, so I'll write it as pseudocode.
Basically:
If Name, Gender and Age are the same:
    If there is a difference in levels:
        Keep the row with higher level
    If there is a tie:
        Keep a random one
Any ideas are appreciated!
Best answer
Use argsort and duplicated:
df[~df.iloc[np.argsort(-df.Level)].drop(columns='Level').duplicated()]
Name Gender Age Level
0 Pikachu Male 4 8
1 Charmander Female 5 7
4 Squirtle Male 3 9
5 Squirtle Female 4 9
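The one-liner above packs several steps together, so here is a rough step-by-step sketch of the same idea on the data from the question. The intermediate names (order, keys, is_dup, result) are made up for illustration, and the stable sort kind plus the sort_index() call are small additions of mine that make tie handling deterministic and align the mask with df; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Pikachu", "Charmander", "Charmander", "Squirtle", "Squirtle", "Squirtle"],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Age": [4, 5, 5, 3, 3, 4],
    "Level": [8, 7, 7, 6, 9, 9],
})

# 1. Row positions ordered from the highest Level to the lowest.
#    kind="stable" keeps tied rows in their original order (an assumption
#    added here; the question accepts any row on a tie).
order = np.argsort(-df["Level"].to_numpy(), kind="stable")

# 2. Reorder the rows and hide Level, so only Name/Gender/Age are compared.
keys = df.iloc[order].drop(columns="Level")

# 3. For each key, every row after the first (the highest Level) is flagged.
is_dup = keys.duplicated()

# 4. Put the mask back in df's row order and keep the unflagged rows.
result = df[~is_dup.sort_index()]
print(result)
```

Because the rows are visited from highest Level to lowest, the first row seen for each (Name, Gender, Age) combination is the one worth keeping, and duplicated() marks everything after it.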
A groupby + idxmax solution (slower, though):
df.loc[df.groupby(['Name', 'Gender', 'Age']).Level.idxmax()]
Name Gender Age Level
1 Charmander Female 5 7
0 Pikachu Male 4 8
5 Squirtle Female 4 9
4 Squirtle Male 3 9
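Another way to get the same rows, not taken from the answer above, is sort_values followed by drop_duplicates. The sketch below also shows one possible way to get the "keep a random one" behaviour on ties by shuffling first. The variable names and the random_state value are arbitrary choices for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Pikachu", "Charmander", "Charmander", "Squirtle", "Squirtle", "Squirtle"],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Age": [4, 5, 5, 3, 3, 4],
    "Level": [8, 7, 7, 6, 9, 9],
})
key_cols = ["Name", "Gender", "Age"]

# Highest Level first, then keep the first row seen for each key.
# mergesort is stable, so tied rows keep their current relative order.
best = (
    df.sort_values("Level", ascending=False, kind="mergesort")
      .drop_duplicates(subset=key_cols)
      .sort_index()
)

# To pick a random row among ties, shuffle before the stable sort.
# random_state is fixed here only to make the sketch reproducible.
shuffled = df.sample(frac=1, random_state=0)
best_random = (
    shuffled.sort_values("Level", ascending=False, kind="mergesort")
            .drop_duplicates(subset=key_cols)
            .sort_index()
)
print(best)
print(best_random)
```

On this data the only tie is between the two Charmander rows, so best_random differs from best at most in which of those two rows survives.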
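On the topic of "python - remove duplicates, but keep the row with the maximum value within each group of given columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53844459/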