I have a dataframe that looks like this:
import numpy as np
import pandas as pd

data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
df = pd.DataFrame(data, columns = ['?', 'Rating', 'Amount'])
? Rating Amount
0 A 1 100
1 A 3 100
2 A 2 100
3 A 3 100
4 A 5 100
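and I need to create one new column per distinct "Rating" value. Each new column holds the row's "Amount" where the row has that rating and 0 otherwise, so the result looks like this: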
? Rating Amount 1 2 3 5
0 A 1 100 100 0 0 0
1 A 3 100 0 0 100 0
2 A 2 100 0 100 0 0
3 A 3 100 0 0 100 0
4 A 5 100 0 0 0 100
Right now I have this:
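ratingnames = np.unique(list(df['Rating']))
ratingnames.sort()  # redundant: np.unique already returns sorted values
# One zero-filled column per distinct rating
d = pd.DataFrame(0, index=np.arange(len(df['Rating'])), columns=ratingnames)
for i in range(len(df['Rating'])):
    # Copy this row's Amount into the column named after its Rating
    ratingvalue = df.loc[i, 'Rating']
    d.loc[i, ratingvalue] = df.loc[i, 'Amount']
df = pd.concat([df, d], axis = 1)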
But I think it can be improved. Any suggestions? Thanks!
Best answer
IIUC, use get_dummies, multiply the result row-wise by df['Amount'], then concat it with df along axis=1:
output = pd.concat((df,pd.get_dummies(df['Rating']).mul(df['Amount'],axis=0)),axis=1)
? Rating Amount 1 2 3 5
0 A 1 100 100 0 0 0
1 A 3 100 0 0 100 0
2 A 2 100 0 100 0 0
3 A 3 100 0 0 100 0
4 A 5 100 0 0 0 100
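A self-contained sketch of the answer's approach, in case it helps to run it end to end. It passes dtype=int to get_dummies because recent pandas versions (2.0+) return bool dummy columns by default; multiplying by Amount still gives integers either way, so this mainly makes the intent explicit. The second function is not from the answer: it is an alternative I'm assuming is acceptable, which compares each Rating against the sorted distinct ratings with NumPy broadcasting. Both function names are hypothetical.

```python
import numpy as np
import pandas as pd


def add_rating_columns(df: pd.DataFrame) -> pd.DataFrame:
    """The answer's get_dummies approach, with dtype=int made explicit."""
    # One indicator column per distinct Rating (sorted), 1 where the row has that rating
    dummies = pd.get_dummies(df['Rating'], dtype=int)
    # Scale each row's indicators by that row's Amount (axis=0 aligns on the row index)
    return pd.concat((df, dummies.mul(df['Amount'], axis=0)), axis=1)


def add_rating_columns_np(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical alternative (not from the answer) using NumPy broadcasting."""
    ratings = np.unique(df['Rating'].to_numpy())  # sorted distinct ratings
    # hit[i, j] is True where row i's Rating equals ratings[j]
    hit = df['Rating'].to_numpy()[:, None] == ratings[None, :]
    # Take the row's Amount where hit is True, 0 elsewhere
    values = np.where(hit, df['Amount'].to_numpy()[:, None], 0)
    extra = pd.DataFrame(values, index=df.index, columns=ratings)
    return pd.concat((df, extra), axis=1)


data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
df = pd.DataFrame(data, columns=['?', 'Rating', 'Amount'])
out = add_rating_columns(df)
out_np = add_rating_columns_np(df)
print(out)
```

Both versions avoid the per-row Python loop; the broadcasting one builds an n × k boolean matrix (n rows, k distinct ratings), so it is best suited to a modest number of distinct ratings.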
Timings: