python - Pandas drop_duplicates: keep the second-to-last row

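In a pandas DataFrame, what is the most efficient way to select the second-to-last row of each set of duplicates?
For example, I basically want to do this operation: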

df = df.drop_duplicates(['Person', 'Question'], keep='last')

but as something like this instead (take_second_last is hypothetical, not a real parameter):

df = df.drop_duplicates(['Person','Question'],take_second_last=True)

More abstractly: how do you choose which duplicate to keep when the one you want is neither the maximum nor the minimum?

Best answer

Use groupby.apply:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
                   'B': np.arange(10), 'C': np.arange(10)})

df
Out:
   A  B  C
0  1  0  0
1  1  1  1
2  1  2  2
3  1  3  3
4  2  4  4
5  2  5  5
6  2  6  6
7  3  7  7
8  3  8  8
9  4  9  9

(df.groupby('A', as_index=False).apply(lambda x: x if len(x)==1 else x.iloc[[-2]])
   .reset_index(level=0, drop=True))
Out:
   A  B  C
2  1  2  2
5  2  5  5
7  3  7  7
9  4  9  9
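
A note on the double brackets in x.iloc[[-2]]: indexing with a list returns a one-row DataFrame, whereas x.iloc[-2] returns a Series. Returning a DataFrame from every group is what lets the per-group results stack back into rows with their original index labels. A minimal sketch of the difference, using a small made-up frame (the name `group` is only illustrative):

```python
import pandas as pd

# A stand-in for one group as seen inside the lambda (illustrative data).
group = pd.DataFrame({'A': [1, 1, 1], 'C': [10, 11, 12]}, index=[4, 5, 6])

as_series = group.iloc[-2]     # scalar position: the row comes back as a Series
as_frame = group.iloc[[-2]]    # list of positions: a one-row DataFrame

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
print(as_frame.index.tolist())    # [5], the original row label is preserved
```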

With a different DataFrame, using two columns as the subset:

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
                   'B': [1, 1, 2, 1, 2, 2, 2, 3, 3, 4], 'C': np.arange(10)})

df
Out:
   A  B  C
0  1  1  0
1  1  1  1
2  1  2  2
3  1  1  3
4  2  2  4
5  2  2  5
6  2  2  6
7  3  3  7
8  3  3  8
9  4  4  9

(df.groupby(['A', 'B'], as_index=False).apply(lambda x: x if len(x)==1 else x.iloc[[-2]])
   .reset_index(level=0, drop=True))
Out:
   A  B  C
1  1  1  1
2  1  2  2
5  2  2  5
7  3  3  7
9  4  4  9
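
As a possible alternative that avoids calling a Python lambda once per group, the same selection can be sketched with groupby.tail followed by groupby.head. This is not part of the answer above, and the helper name `keep_second_last` is hypothetical. The idea: tail(2) keeps the last two rows of each group (or the only row of a single-row group), and head(1) then keeps the first of those.

```python
import numpy as np
import pandas as pd

def keep_second_last(df, subset):
    """Keep the second-to-last row of each duplicate group.

    Groups with a single row keep that row. `subset` is a list of
    column names that define what counts as a duplicate.
    """
    last_two = df.groupby(subset).tail(2)      # at most two rows per group
    return last_two.groupby(subset).head(1)    # the first of those rows

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
                   'B': [1, 1, 2, 1, 2, 2, 2, 3, 3, 4],
                   'C': np.arange(10)})

result = keep_second_last(df, ['A', 'B'])
print(result.index.tolist())   # [1, 2, 5, 7, 9]
```

Both tail and head return rows in their original order with their original index labels, so no reset_index step is needed here. Whether this is actually faster than groupby.apply on a given dataset is an assumption worth timing.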