I have a DataFrame like this:

offer_id  affiliate_id  affiliate_source  affiliate_sub5  advertiser_id  payout_cents  revenue_cents
  428572          1327          14331605        14331605            291            50             30
  428572          1327              1465            1465            291            50             30
  428572          1327              1336            1336            291            50             30
  428572          1327          14331605        14331605            291            50             30
  428572          1327          14331605        14331605            291            50             30


I group by the three columns "offer_id", "affiliate_id" and "affiliate_source"

like this:

df1.groupby(['offer_id', 'affiliate_id', 'affiliate_source'])[["payout_cents", "revenue_cents"]].sum()
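To make the problem reproducible, here is a minimal sketch that rebuilds the sample frame from the table above. It assumes the lowercase column names payout_cents and revenue_cents used in the code. It shows that the grouped sum keeps only the three key columns, which move into the index, plus the two summed columns. The other columns are dropped:

```python
import pandas as pd

# Sample data from the question (lowercase payout/revenue column names assumed)
df1 = pd.DataFrame({
    "offer_id": [428572] * 5,
    "affiliate_id": [1327] * 5,
    "affiliate_source": [14331605, 1465, 1336, 14331605, 14331605],
    "affiliate_sub5": [14331605, 1465, 1336, 14331605, 14331605],
    "advertiser_id": [291] * 5,
    "payout_cents": [50] * 5,
    "revenue_cents": [30] * 5,
})

cols = ["offer_id", "affiliate_id", "affiliate_source"]

# One row per group; keys become a sorted MultiIndex, other columns are lost
summed = df1.groupby(cols)[["payout_cents", "revenue_cents"]].sum()
print(summed)
```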


Result:

                                        payout_cents  revenue_cents
offer_id affiliate_id affiliate_source
428572   1327         1336                        50             30
                      1465                        50             30
                      14331605                   150             90


Question:
After the groupby, I also need to keep the other columns, such as "advertiser_id" and "affiliate_sub5".

Thanks.

Best answer

I think you need add_suffix and join:

cols = ['offer_id', 'affiliate_id', 'affiliate_source']
df2 = df1.groupby(cols)[["payout_cents", "revenue_cents"]].sum().add_suffix('_sum')
df = df1.join(df2, on=cols)

print (df)
   offer_id  affiliate_id  affiliate_source  affiliate_sub5  advertiser_id  \
0    428572          1327          14331605        14331605            291
1    428572          1327              1465            1465            291
2    428572          1327              1336            1336            291
3    428572          1327          14331605        14331605            291
4    428572          1327          14331605        14331605            291

   payout_cents  revenue_cents  payout_cents_sum  revenue_cents_sum
0            50             30               150                 90
1            50             30                50                 30
2            50             30                50                 30
3            50             30               150                 90
4            50             30               150                 90
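Here is a self-contained version of the join approach, run on the question's sample data (lowercase payout_cents and revenue_cents assumed). The add_suffix('_sum') step is what makes the join possible. Without it, DataFrame.join raises a ValueError, because the summed columns would overlap with df1's own payout_cents and revenue_cents:

```python
import pandas as pd

# Sample data from the question (lowercase payout/revenue column names assumed)
df1 = pd.DataFrame({
    "offer_id": [428572] * 5,
    "affiliate_id": [1327] * 5,
    "affiliate_source": [14331605, 1465, 1336, 14331605, 14331605],
    "affiliate_sub5": [14331605, 1465, 1336, 14331605, 14331605],
    "advertiser_id": [291] * 5,
    "payout_cents": [50] * 5,
    "revenue_cents": [30] * 5,
})

cols = ["offer_id", "affiliate_id", "affiliate_source"]

# Per-group totals indexed by the three keys; the suffix avoids column-name clashes
df2 = df1.groupby(cols)[["payout_cents", "revenue_cents"]].sum().add_suffix("_sum")

# on=cols matches df1's key columns against df2's MultiIndex levels,
# so every original row (and every other column) is kept
df = df1.join(df2, on=cols)
print(df)
```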


Another solution, using transform:

cols = ['offer_id', 'affiliate_id', 'affiliate_source']
cols1 = ["payout_cents", "revenue_cents"]

df1[['{}_sum'.format(x) for x in cols1]] = df1.groupby(cols)[cols1].transform('sum')
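Here is the transform variant as a runnable sketch on the same sample data. transform('sum') returns a frame with the same index as df1, so each row receives its group's total. This sketch attaches the result with an index join instead of the list assignment above.

The second part is an alternative that is not from the original answer. If you want one row per group, you can carry the other columns along with named aggregation and 'first'. This assumes those columns are constant within each group, which holds for advertiser_id and affiliate_sub5 in this sample. Named aggregation requires pandas 0.25 or later:

```python
import pandas as pd

# Sample data from the question (lowercase payout/revenue column names assumed)
df1 = pd.DataFrame({
    "offer_id": [428572] * 5,
    "affiliate_id": [1327] * 5,
    "affiliate_source": [14331605, 1465, 1336, 14331605, 14331605],
    "affiliate_sub5": [14331605, 1465, 1336, 14331605, 14331605],
    "advertiser_id": [291] * 5,
    "payout_cents": [50] * 5,
    "revenue_cents": [30] * 5,
})

cols = ["offer_id", "affiliate_id", "affiliate_source"]
cols1 = ["payout_cents", "revenue_cents"]

# transform keeps df1's row count and index, so the group totals line up row by row
sums = df1.groupby(cols)[cols1].transform("sum").add_suffix("_sum")
df1 = df1.join(sums)
print(df1)

# Alternative (not from the original answer): one row per group, keeping the
# other columns via "first"; assumes they do not vary within a group
one_row = df1.groupby(cols).agg(
    affiliate_sub5=("affiliate_sub5", "first"),
    advertiser_id=("advertiser_id", "first"),
    payout_cents=("payout_cents", "sum"),
    revenue_cents=("revenue_cents", "sum"),
).reset_index()
print(one_row)
```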

Regarding "python - Pandas group by three columns, but keep all other columns", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48152309/

10-12 18:43