I have a DataFrame that looks like this:

import numpy as np
import pandas as pd

data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
df = pd.DataFrame(data, columns=['?', 'Rating', 'Amount'])


    ?   Rating  Amount
0   A   1       100
1   A   3       100
2   A   2       100
3   A   3       100
4   A   5       100


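I need to create one new column per distinct 'Rating' value, holding that row's Amount where the rating matches and 0 elsewhere, like this: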

    ?   Rating  Amount  1   2   3   5
0   A   1       100     100 0   0   0
1   A   3       100     0   0   100 0
2   A   2       100     0   100 0   0
3   A   3       100     0   0   100 0
4   A   5       100     0   0   0   100


This is what I have right now:

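# Distinct rating values; np.unique already returns them sorted,
# so the explicit sort() below is redundant but harmless.
ratingnames = np.unique(list(df['Rating']))
ratingnames.sort()

# Zero-filled frame with one column per rating and one row per row of df.
d = pd.DataFrame(0, index=np.arange(len(df['Rating'])), columns=ratingnames)

# Copy each row's Amount into the column that matches its Rating.
for i in range(len(df['Rating'])):
    ratingvalue = df.loc[i, 'Rating']
    d.loc[i, ratingvalue] = df.loc[i, 'Amount']

df = pd.concat([df, d], axis=1)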


But I think it can be improved. Any suggestions? Thanks!

Best answer

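IIUC, use get_dummies on the Rating column, multiply the result row-wise by df['Amount'] (axis=0), then concat it with the original frame along axis=1: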

output = pd.concat((df,pd.get_dummies(df['Rating']).mul(df['Amount'],axis=0)),axis=1)




   ?  Rating  Amount    1    2    3    5
0  A       1     100  100    0    0    0
1  A       3     100    0    0  100    0
2  A       2     100    0  100    0    0
3  A       3     100    0    0  100    0
4  A       5     100    0    0    0  100
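
To make the one-liner easier to follow, here is a minimal sketch that splits it into three steps. The intermediate names `dummies` and `scaled` are mine, not part of the answer, and passing `dtype=int` to get_dummies is an assumption I added so the indicator columns are integers regardless of the pandas version (newer releases return booleans by default).

```python
import pandas as pd

data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
df = pd.DataFrame(data, columns=['?', 'Rating', 'Amount'])

# Step 1: one indicator column per distinct Rating value
# (1 where the row has that rating, 0 otherwise).
# dtype=int is an assumption added for this sketch, not part of the answer.
dummies = pd.get_dummies(df['Rating'], dtype=int)

# Step 2: multiply every indicator row by that row's Amount.
# axis=0 aligns the Series with the row index instead of the column labels.
scaled = dummies.mul(df['Amount'], axis=0)

# Step 3: attach the new columns to the right of the original frame.
output = pd.concat([df, scaled], axis=1)
print(output)
```

Because the new column labels come from the Rating values, they are the integers 1, 2, 3 and 5 rather than strings, so they are selected with `output[3]`, not `output['3']`.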


