Normalizing a column in a pandas DataFrame by the sum of grouped values of another column


Problem description

I'm a bit stuck trying to normalize some entries of a column in a pandas DataFrame. I have a DataFrame like this:

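import numpy as np   # used by the aggregation attempt further down
import pandas as pd

# One row per (user, item) pair; 'bought' is the quantity purchased.
df = pd.DataFrame({
        'user':[0,0,1,1,1,2,2],
        'item':['A','B', 'A', 'B','C','B','C'],
        'bought':[1,1,1,3,3,2,3]})
df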
bought|item|user
----------------
1     |A   |0
1     |B   |0
1     |A   |1
3     |B   |1
3     |C   |1
2     |B   |2
3     |C   |2

I would like to get the number of each item bought, normalized by the total bought by each user.

In other words, I'd like to divide each entry of 'bought' by the total bought by that user, and store the result as another column. In this case the output I'd like is the following (the 'normalized' column doesn't have to be expressed as fractions):

bought|item|user|normalized
--------------------------
1     |A   |0   |1/2
1     |B   |0   |1/2
1     |A   |1   |1/7
3     |B   |1   |3/7
3     |C   |1   |3/7
2     |B   |2   |2/5
3     |C   |2   |3/5

So far I've grouped by user and computed the sum per user:

grouped = df.groupby(by='user')
grouped.aggregate(np.sum)

But at this point I'm stuck. Thanks!

Recommended answer

pandas map

df.assign(normalized=df.bought.div(df.user.map(df.groupby('user').bought.sum())))

pandas transform

df.assign(normalized=df.bought.div(df.groupby('user').bought.transform('sum')))


Both yield:

   bought item  user  normalized
0       1    A     0    0.500000
1       1    B     0    0.500000
2       1    A     1    0.142857
3       3    B     1    0.428571
4       3    C     1    0.428571
5       2    B     2    0.400000
6       3    C     2    0.600000
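
To make the two one-liners easier to follow, here is a minimal sketch that splits them into intermediate steps. The variable names (totals_per_user, row_totals_via_map, row_totals_via_transform, share_sums) are illustrative and not part of the original answer; the DataFrame is the one from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'user': [0, 0, 1, 1, 1, 2, 2],
    'item': ['A', 'B', 'A', 'B', 'C', 'B', 'C'],
    'bought': [1, 1, 1, 3, 3, 2, 3],
})

# Step 1: one total per user, as a Series indexed by user.
totals_per_user = df.groupby('user')['bought'].sum()

# Step 2a (map): look up each row's user in that Series,
# which produces a total aligned with the original rows.
row_totals_via_map = df['user'].map(totals_per_user)

# Step 2b (transform): compute the same row-aligned totals in one call.
row_totals_via_transform = df.groupby('user')['bought'].transform('sum')

# Step 3: element-wise division gives each row's share of its user's total.
result = df.assign(normalized=df['bought'] / row_totals_via_transform)

# Sanity check: within each user, the shares should add up to 1.
share_sums = result.groupby('user')['normalized'].sum()
print(result)
print(share_sums)
```

Both variants build the same row-aligned totals. transform skips the explicit lookup, while map can be convenient when the per-user totals already exist as a separate Series.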

