In pandas, I have a DataFrame where each row corresponds to a user and each column to a variable about that user, including how they rated something:
+----------------+--------------------------+----------+----------+
| name | email | rating_a | rating_b |
+----------------+--------------------------+----------+----------+
| Someone | [email protected] | 7.8 | 9.9 |
| Someone Else | [email protected] | 2.4 | 9.2 |
| Another Person | [email protected] | 3.5 | 7.5 |
+----------------+--------------------------+----------+----------+
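I want to reshape the table so that one column holds the rating type (a or b), another holds the rating value (7.8, 3.5, and so on), and the remaining columns are kept as they are, like this:
+----------------+-------------------------+-------------+--------------+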
| name | email | rating_type | rating_value |
+----------------+-------------------------+-------------+--------------+
| Someone | [email protected] | a | 7.8 |
| Someone | [email protected] | b | 9.9 |
| Someone Else | [email protected] | a | 2.4 |
| Someone Else | [email protected] | b | 9.2 |
| Another Person | [email protected] | a | 3.5 |
| Another Person | [email protected] | b | 7.5 |
+----------------+-------------------------+-------------+--------------+
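It looks like the pandas melt method is the right tool, but I am not sure what my id_vars and value_vars should be in this case. It also seems to drop every column that is in neither of those two categories, such as the email address, and I want to keep all of that information. How can I do this with pandas?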
Best answer
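You can use melt, with str.replace to strip the prefix from the column names first:
# Strip the 'rating_' prefix so the melted column holds just 'a' and 'b'
df.columns = df.columns.str.replace('rating_', '', regex=False)
# Every column not listed in id_vars is treated as a value column
df = df.melt(id_vars=['name', 'email'], var_name='rating_type', value_name='rating_value')
print(df)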
name email rating_type rating_value
0 Someone [email protected] a 7.8
1 Someone Else [email protected] a 2.4
2 Another Person [email protected] a 3.5
3 Someone [email protected] b 9.9
4 Someone Else [email protected] b 9.2
5 Another Person [email protected] b 7.5
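To make the roles of id_vars and value_vars concrete, here is a minimal, self-contained sketch of the same melt call. It assumes the sample data from the question; the email addresses are made-up placeholders, and long_df is just an illustrative name. Columns listed in id_vars are repeated on every output row, which is how name and email are both kept. value_vars is spelled out here for clarity, although leaving it out would melt every column not in id_vars.

```python
import pandas as pd

# Sample data mirroring the question; the email addresses are made-up placeholders.
df = pd.DataFrame({
    "name": ["Someone", "Someone Else", "Another Person"],
    "email": ["someone@example.com", "someone.else@example.com", "another@example.com"],
    "rating_a": [7.8, 2.4, 3.5],
    "rating_b": [9.9, 9.2, 7.5],
})

long_df = df.melt(
    id_vars=["name", "email"],            # kept as-is and repeated per rating
    value_vars=["rating_a", "rating_b"],  # columns that get unpivoted
    var_name="rating_type",
    value_name="rating_value",
)

# Strip the prefix from the melted labels instead of renaming the columns up front.
long_df["rating_type"] = long_df["rating_type"].str.replace("rating_", "", regex=False)
print(long_df)
```

Stripping the prefix after melting leaves the original df untouched, which may be preferable if the wide frame is still needed elsewhere.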
Another solution with set_index + stack + rename_axis + reset_index:
df.columns = df.columns.str.replace('rating_', '', regex=False)
# Parentheses let the method chain span several lines
df = (df.set_index(['name', 'email'])
        .stack()
        .rename_axis(['name', 'email', 'rating_type'])
        .reset_index(name='rating_value'))
print(df)
name email rating_type rating_value
0 Someone [email protected] a 7.8
1 Someone [email protected] b 9.9
2 Someone Else [email protected] a 2.4
3 Someone Else [email protected] b 9.2
4 Another Person [email protected] a 3.5
5 Another Person [email protected] b 7.5
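If you need the rows grouped by user, as in the desired output, use this melt solution:
df.columns = df.columns.str.replace('rating_', '', regex=False)
# Keep the original row position in an 'index' column so it can be used for sorting
df = (df.reset_index()
        .melt(id_vars=['index', 'name', 'email'],
              var_name='rating_type',
              value_name='rating_value')
        .sort_values(['index', 'rating_type'])
        .drop('index', axis=1)
        .reset_index(drop=True))
print(df)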
name email rating_type rating_value
0 Someone [email protected] a 7.8
1 Someone [email protected] b 9.9
2 Someone Else [email protected] a 2.4
3 Someone Else [email protected] b 9.2
4 Another Person [email protected] a 3.5
5 Another Person [email protected] b 7.5
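As a variation on the row-ordering approach, the sketch below is not part of the original answer. It assumes pandas 1.1 or later, where melt accepts ignore_index=False and so keeps the original row labels without a helper column. The email addresses are again made-up placeholders and long_df is an illustrative name.

```python
import pandas as pd

# Sample data mirroring the question; the email addresses are made-up placeholders.
df = pd.DataFrame({
    "name": ["Someone", "Someone Else", "Another Person"],
    "email": ["someone@example.com", "someone.else@example.com", "another@example.com"],
    "rating_a": [7.8, 2.4, 3.5],
    "rating_b": [9.9, 9.2, 7.5],
})

long_df = (
    df.rename(columns=lambda c: c.replace("rating_", ""))  # rating_a -> a, rating_b -> b
      .melt(id_vars=["name", "email"],
            var_name="rating_type",
            value_name="rating_value",
            ignore_index=False)          # index stays 0, 1, 2, 0, 1, 2
      .sort_index(kind="mergesort")      # stable sort: a stays before b for each user
      .reset_index(drop=True)
)
print(long_df)
```

The stable sort matters here: it groups rows by their original position while preserving the a-then-b order that melt produced.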
Regarding "python - pivoting a series of rating columns in pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44236927/