I have a DataFrame like this:

    Name        Gender         Age      Level
  Pikachu        Male           4         8
 Charmander     Female          5         7
 Charmander     Female          5         7
 Squirtle        Male           3         6
 Squirtle        Male           3         9
 Squirtle       Female          4         9

I want it to look like this:

   Name        Gender         Age      Level
  Pikachu        Male           4         8
 Charmander     Female          5         7
 Squirtle        Male           3         9
 Squirtle       Female          4         9

I'm not sure how to explain what I want in plain English, so I'll write it as pseudocode.
Basically:
If Name, Gender and Age are the same:
      If there is a difference in levels:
            Keep the row with higher level
      If there is a tie:
            Keep a random one
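Here is a literal translation of the pseudocode above. It is a minimal sketch that assumes pandas and the sample data from the question. The helper name `keep_max_level` and its `seed` parameter are hypothetical and chosen for illustration. The rows are shuffled before a stable sort by Level, so when several rows tie on the maximum Level, the one that is kept is random, as the pseudocode asks.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Name': ['Pikachu', 'Charmander', 'Charmander', 'Squirtle', 'Squirtle', 'Squirtle'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
    'Age': [4, 5, 5, 3, 3, 4],
    'Level': [8, 7, 7, 6, 9, 9],
})


def keep_max_level(frame, keys=('Name', 'Gender', 'Age'), seed=None):
    """Hypothetical helper: keep one row per key group, the one with the highest Level.

    Ties on the maximum Level are broken randomly (reproducibly if seed is given).
    """
    # Shuffle all rows so the order among tied rows is random
    shuffled = frame.sample(frac=1, random_state=seed)
    # Stable sort by Level, highest first; tied rows keep their shuffled order
    ranked = shuffled.sort_values('Level', ascending=False, kind='stable')
    # The first row of each key group now has the highest Level; restore original row order
    return ranked.drop_duplicates(subset=list(keys), keep='first').sort_index()


result = keep_max_level(df, seed=42)
print(result)
```

If `seed` is omitted, each call may keep a different one of the two Charmander rows. Because those rows are identical here, only the surviving index label (1 or 2) changes.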

Any ideas are appreciated!

Best answer

Use argsort together with duplicated:

import numpy as np
df[~df.iloc[np.argsort(-df.Level)].drop(columns='Level').duplicated()]

         Name  Gender  Age  Level
0     Pikachu    Male    4      8
1  Charmander  Female    5      7
4    Squirtle    Male    3      9
5    Squirtle  Female    4      9
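The one-liner is easier to follow when it is split into steps. The sketch below does that, assuming the question's sample data. Two details are choices made for this sketch and do not come from the answer. First, `kind='stable'` makes tied rows keep their original order. Second, the mask is reindexed to `df.index` before filtering. This avoids the UserWarning pandas raises when a boolean Series key is in a different row order than the DataFrame.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Name': ['Pikachu', 'Charmander', 'Charmander', 'Squirtle', 'Squirtle', 'Squirtle'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
    'Age': [4, 5, 5, 3, 3, 4],
    'Level': [8, 7, 7, 6, 9, 9],
})

# 1. Row positions ordered by Level, highest first (stable: ties keep original order)
order = np.argsort(-df['Level'].to_numpy(), kind='stable')
# 2. Reorder the rows and keep only the key columns
keys = df.iloc[order].drop(columns='Level')
# 3. In this order, the first row of each key has the highest Level;
#    duplicated() flags every later row with the same key as True
dup = keys.duplicated()
# 4. Put the flags back in df's row order, then keep the unflagged rows
result = df[~dup.reindex(df.index)]
print(result)
```

The result contains rows 0, 1, 4 and 5, the same rows as the answer's output.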

A groupby + idxmax solution (slower, though):
df.loc[df.groupby(['Name', 'Gender', 'Age']).Level.idxmax()]

         Name  Gender  Age  Level
1  Charmander  Female    5      7
0     Pikachu    Male    4      8
5    Squirtle  Female    4      9
4    Squirtle    Male    3      9
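`idxmax` returns index labels, not positions, so `.loc` is the safe way to select its result. With the question's default RangeIndex, labels and positions coincide, which is why `.iloc` also happens to work on the sample data. The small sketch below uses a hypothetical non-default index (labels 10 to 15, which are not in the question) to show the difference.

```python
import pandas as pd

# Question's data, but with a hypothetical non-default index
df = pd.DataFrame(
    {
        'Name': ['Pikachu', 'Charmander', 'Charmander', 'Squirtle', 'Squirtle', 'Squirtle'],
        'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
        'Age': [4, 5, 5, 3, 3, 4],
        'Level': [8, 7, 7, 6, 9, 9],
    },
    index=[10, 11, 12, 13, 14, 15],
)

# Label of the highest-Level row within each (Name, Gender, Age) group;
# on ties, idxmax returns the first occurrence
labels = df.groupby(['Name', 'Gender', 'Age'])['Level'].idxmax()

# Selecting by label works
result = df.loc[labels]
print(result)

# Selecting by position treats 10..15 as row positions in a 6-row frame
try:
    df.iloc[labels]
    iloc_ok = True
except IndexError:
    iloc_ok = False
print('iloc worked:', iloc_ok)
```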

A similar question, "python - Drop duplicates but keep the row with the maximum value in each group of given columns", can be found on Stack Overflow: https://stackoverflow.com/questions/53844459/

10-10 11:47