I have a DataFrame like this:
offer_id  affiliate_id  affiliate_source  affiliate_sub5  advertiser_id  payout_cents  revenue_cents
  428572          1327          14331605        14331605            291            50             30
  428572          1327              1465            1465            291            50             30
  428572          1327              1336            1336            291            50             30
  428572          1327          14331605        14331605            291            50             30
  428572          1327          14331605        14331605            291            50             30
I group by the three columns "offer_id", "affiliate_id" and "affiliate_source"
like this:
df1.groupby(['offer_id', 'affiliate_id', 'affiliate_source'])[["payout_cents", "revenue_cents"]].sum()
Result:
                                        payout_cents  revenue_cents
offer_id affiliate_id affiliate_source
428572   1327         14331605                   150             90
                      1465                        50             30
                      1336                        50             30
Question:
After the groupby I also need to keep the other columns, such as "advertiser_id" and "affiliate_sub5".
Thanks.
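If one aggregated row per group is enough (rather than keeping every original row), a different technique from the answers below is named aggregation (pandas 0.25 and later), which sums some columns and picks a value for the others in a single call. This sketch assumes advertiser_id and affiliate_sub5 are constant within each group, as they are in the sample data, so taking the "first" value loses nothing; if they vary, "first" silently hides the other values.

```python
import pandas as pd

# Sample data reconstructed from the question
df1 = pd.DataFrame({
    "offer_id": [428572] * 5,
    "affiliate_id": [1327] * 5,
    "affiliate_source": [14331605, 1465, 1336, 14331605, 14331605],
    "affiliate_sub5": [14331605, 1465, 1336, 14331605, 14331605],
    "advertiser_id": [291] * 5,
    "payout_cents": [50] * 5,
    "revenue_cents": [30] * 5,
})

cols = ["offer_id", "affiliate_id", "affiliate_source"]

# Named aggregation: output_name=(source_column, function).
# as_index=False keeps the group keys as ordinary columns.
# Assumption: the non-summed columns are constant per group, so 'first' is safe.
out = df1.groupby(cols, as_index=False).agg(
    payout_cents=("payout_cents", "sum"),
    revenue_cents=("revenue_cents", "sum"),
    advertiser_id=("advertiser_id", "first"),
    affiliate_sub5=("affiliate_sub5", "first"),
)
print(out)
```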
Best answer
I think you need add_suffix and join:
cols = ['offer_id', 'affiliate_id', 'affiliate_source']
df2 = df1.groupby(cols)[["payout_cents", "revenue_cents"]].sum().add_suffix('_sum')
df = df1.join(df2, on=cols)
print(df)
offer_id affiliate_id affiliate_source affiliate_sub5 advertiser_id \
0 428572 1327 14331605 14331605 291
1 428572 1327 1465 1465 291
2 428572 1327 1336 1336 291
3 428572 1327 14331605 14331605 291
4 428572 1327 14331605 14331605 291
payout_cents revenue_cents payout_cents_sum revenue_cents_sum
0 50 30 150 90
1 50 30 50 30
2 50 30 50 30
3 50 30 150 90
4 50 30 150 90
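A runnable version of the join approach with the sample data, plus an equivalent spelled with merge for comparison. The merge variant (with validate="many_to_one") is an added alternative, not part of the original answer; it makes pandas raise an error if the aggregated frame ever contained duplicate keys. Both are left joins, so all five original rows and their order are preserved, and every row carries its group's totals.

```python
import pandas as pd

# Sample data reconstructed from the question
df1 = pd.DataFrame({
    "offer_id": [428572] * 5,
    "affiliate_id": [1327] * 5,
    "affiliate_source": [14331605, 1465, 1336, 14331605, 14331605],
    "affiliate_sub5": [14331605, 1465, 1336, 14331605, 14331605],
    "advertiser_id": [291] * 5,
    "payout_cents": [50] * 5,
    "revenue_cents": [30] * 5,
})

cols = ["offer_id", "affiliate_id", "affiliate_source"]

# One row per group; add_suffix keeps the sums from clashing with the
# original payout_cents / revenue_cents columns.
df2 = df1.groupby(cols)[["payout_cents", "revenue_cents"]].sum().add_suffix("_sum")

# join(on=cols) matches df1's key columns against df2's MultiIndex levels.
df = df1.join(df2, on=cols)

# Same result with merge, after turning the MultiIndex back into columns.
df_merged = df1.merge(df2.reset_index(), on=cols, how="left", validate="many_to_one")

print(df)
```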
Another solution with transform:
cols = ['offer_id', 'affiliate_id', 'affiliate_source']
cols1 = ["payout_cents", "revenue_cents"]
df1[['{}_sum'.format(x) for x in cols1]] = df1.groupby(cols)[cols1].transform('sum')
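A self-contained sketch of the transform idea with the sample data. transform("sum") returns a frame with the same index as df1, where each row holds the total of the group it belongs to, so no separate lookup by key is needed. This variant is a slightly different spelling from the answer's code: it names the new columns with add_suffix and attaches them with an index-aligned join instead of assigning to a list of new columns.

```python
import pandas as pd

# Sample data reconstructed from the question
df1 = pd.DataFrame({
    "offer_id": [428572] * 5,
    "affiliate_id": [1327] * 5,
    "affiliate_source": [14331605, 1465, 1336, 14331605, 14331605],
    "affiliate_sub5": [14331605, 1465, 1336, 14331605, 14331605],
    "advertiser_id": [291] * 5,
    "payout_cents": [50] * 5,
    "revenue_cents": [30] * 5,
})

cols = ["offer_id", "affiliate_id", "affiliate_source"]
cols1 = ["payout_cents", "revenue_cents"]

# Broadcast each group's sum back onto every row of that group;
# the result shares df1's index, so a plain join lines it up.
sums = df1.groupby(cols)[cols1].transform("sum").add_suffix("_sum")
df = df1.join(sums)
print(df)
```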
Source: "python - Pandas groupby three columns but keep all other columns", a question on Stack Overflow: https://stackoverflow.com/questions/48152309/