Problem description
I'm a bit stuck trying to normalize some entries of a column in a pandas dataframe. I have a dataframe like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'user': [0, 0, 1, 1, 1, 2, 2],
    'item': ['A', 'B', 'A', 'B', 'C', 'B', 'C'],
    'bought': [1, 1, 1, 3, 3, 2, 3]})
df
bought|item|user
----------------
1 |A |0
1 |B |0
1 |A |1
3 |B |1
3 |C |1
2 |B |2
3 |C |2
I would like to get the number of each item bought, normalized by the total bought by each user.
In other words, I'd like to divide each entry of 'bought' by the total bought by that user, and store the result as another column. In this case the output I'd like is this (the 'normalized' column doesn't have to be fractions):
bought|item|user|normalized
--------------------------
1 |A |0 |1/2
1 |B |0 |1/2
1 |A |1 |1/7
3 |B |1 |3/7
3 |C |1 |3/7
2 |B |2 |2/5
3 |C |2 |3/5
So far I've grouped by user and gotten the sum by user:
grouped = df.groupby(by='user')
grouped.aggregate(np.sum)
But at this point I'm stuck. Thanks!
Recommended answer
Using pandas map:
df.assign(normalized=df.bought.div(df.user.map(df.groupby('user').bought.sum())))
Using pandas transform:
df.assign(normalized=df.bought.div(df.groupby('user').bought.transform('sum')))
Both yield:
bought item user normalized
0 1 A 0 0.500000
1 1 B 0 0.500000
2 1 A 1 0.142857
3 3 B 1 0.428571
4 3 C 1 0.428571
5 2 B 2 0.400000
6 3 C 2 0.600000
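To make the two one-liners easier to follow, here is a minimal step-by-step sketch of the same idea. The intermediate names (totals, via_map, via_transform, result) are illustrative and not part of the original answer; the sample data is the dataframe from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'user': [0, 0, 1, 1, 1, 2, 2],
    'item': ['A', 'B', 'A', 'B', 'C', 'B', 'C'],
    'bought': [1, 1, 1, 3, 3, 2, 3],
})

# Per-user totals: a Series indexed by user.
totals = df.groupby('user')['bought'].sum()

# map: look up each row's user in the totals Series, then divide.
via_map = df['bought'] / df['user'].map(totals)

# transform: broadcast each group's sum back onto the original rows, then divide.
via_transform = df['bought'] / df.groupby('user')['bought'].transform('sum')

# Attach the normalized values as a new column without modifying df.
result = df.assign(normalized=via_transform)
print(result)
```

The map version goes through an intermediate lookup table of totals, while transform returns a Series already aligned with the original index, so it can be divided directly. For this data the two should agree row for row.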