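I have a pandas DataFrame in which email addresses are repeated. I want to find all the duplicates and merge them so that each email address ends up with the list of its account IDs attached. I also want to keep the third column in the merged result.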

AccountID Email                    Quality_3

1         [email protected]      High
2         [email protected]
3         [email protected]
4         [email protected]     Medium
5         [email protected]
6         [email protected]
7         [email protected]
8         [email protected]

Desired result:



AccountID         Email                  Quality_3
1, 3, 5, 7        [email protected]    High
2, 6              [email protected]
4, 8              [email protected]   Medium


I have been looking at left and right joins, but I can't figure it out.

Best answer

Try this:

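# Cast everything to str so the account IDs can be joined; note that missing
# values become the literal string 'None'. Columns are selected with a list,
# because newer pandas versions reject the tuple-style ['AccountID','Quality_3'].
df_new=(df.astype(str).groupby('Email')[['AccountID','Quality_3']]
    .agg({'AccountID':lambda x: ','.join(x),'Quality_3':'first'}).reset_index())
print(df_new)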

                  Email AccountID Quality_3
0   [email protected]   1,3,5,7      High
1  [email protected]       4,8    Medium
2    [email protected]       2,6      None
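
If you would rather keep missing Quality_3 values as real missing values instead of the string 'None', one possible variant is to cast only AccountID to str. This is a minimal sketch, not part of the original answer: the sample frame is rebuilt by hand, and the addresses (a@example.com and so on) are hypothetical placeholders, since the real ones are hidden in the post. It relies on 'first' skipping missing entries by default, so the quality value is picked up even when it is not on the first row of a group.

```python
import pandas as pd

# Hypothetical sample data: the email addresses are placeholders,
# not the ones from the question.
df = pd.DataFrame({
    "AccountID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Email": ["a@example.com", "b@example.com", "a@example.com", "c@example.com",
              "a@example.com", "b@example.com", "a@example.com", "c@example.com"],
    "Quality_3": ["High", None, None, "Medium", None, None, None, None],
})

df_new = (
    # Only the IDs are converted to strings; Quality_3 keeps its missing values.
    df.assign(AccountID=df["AccountID"].astype(str))
      .groupby("Email", as_index=False)
      # Join the IDs of each group; take the first non-missing quality value.
      .agg({"AccountID": ", ".join, "Quality_3": "first"})
)
print(df_new)
```

With this variant, a group that has no quality value at all stays missing in the result, so it can still be detected with pd.isna or filled later with fillna.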

10-08 11:20