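python - Pivoting a series of rating columns in Pandas

In Pandas, I have a dataframe where each row corresponds to a user and each column corresponds to a variable about that user, including how they rated something: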

+----------------+--------------------------+----------+----------+
|      name      |          email           | rating_a | rating_b |
+----------------+--------------------------+----------+----------+
| Someone        | [email protected]         |      7.8 |      9.9 |
| Someone Else   | [email protected]    |      2.4 |      9.2 |
| Another Person | [email protected]  |      3.5 |      7.5 |
+----------------+--------------------------+----------+----------+

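I'd like to pivot the table so that one column holds the type of rating (a or b), another column holds the rating value (7.8, 3.5, etc.), and the other columns stay the same, like this: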

+----------------+-------------------------+-------------+--------------+
|      name      |          email          | rating_type | rating_value |
+----------------+-------------------------+-------------+--------------+
| Someone        | [email protected]        | a           |          7.8 |
| Someone        | [email protected]        | b           |          9.9 |
| Someone Else   | [email protected]   | a           |          2.4 |
| Someone Else   | [email protected]   | b           |          9.2 |
| Another Person | [email protected] | a           |          3.5 |
| Another Person | [email protected] | b           |          7.5 |
+----------------+-------------------------+-------------+--------------+

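It looks like the Pandas melt method is the right tool, but I'm not sure what my id_vars and value_vars should be in this case. It also seems to drop every column that isn't in one of those two categories, such as the email address, but I want to keep all of that information.

How can I do this with Pandas?

Best answer

You can use melt together with str.replace to change the column names: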

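# Strip the 'rating_' prefix so the rating columns are named just 'a' and 'b'
df.columns = df.columns.str.replace('rating_', '')
# name and email are kept as identifiers; every other column is unpivoted
df = df.melt(id_vars=['name','email'], var_name='rating_type', value_name='rating_value')
print(df)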
             name                     email rating_type  rating_value
0         Someone          [email protected]           a           7.8
1    Someone Else     [email protected]           a           2.4
2  Another Person  [email protected]           a           3.5
3         Someone          [email protected]           b           9.9
4    Someone Else     [email protected]           b           9.2
5  Another Person  [email protected]           b           7.5
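
To address the question about id_vars and value_vars directly, here is a minimal self-contained sketch. The sample e-mail addresses are placeholders, and the names rating_cols, id_cols and long_df are made up for illustration. It assumes every column to unpivot starts with 'rating_'; all remaining columns are passed as id_vars, which is what keeps them in the result.

```python
import pandas as pd

# Hypothetical sample data; the e-mail addresses are placeholders.
df = pd.DataFrame({
    "name": ["Someone", "Someone Else", "Another Person"],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "rating_a": [7.8, 2.4, 3.5],
    "rating_b": [9.9, 9.2, 7.5],
})

# value_vars: the columns to unpivot (assumed to share the 'rating_' prefix).
rating_cols = [c for c in df.columns if c.startswith("rating_")]
# id_vars: every other column; these are repeated for each unpivoted row.
id_cols = [c for c in df.columns if c not in rating_cols]

long_df = df.melt(
    id_vars=id_cols,
    value_vars=rating_cols,
    var_name="rating_type",
    value_name="rating_value",
)
# Strip the prefix from the values instead of renaming the original columns.
long_df["rating_type"] = long_df["rating_type"].str.replace(
    "rating_", "", regex=False
)
print(long_df)
```

A column is only dropped by melt when value_vars is given explicitly and the column appears in neither list, so listing every non-rating column in id_vars preserves it.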

Another solution with set_index + stack + rename_axis + reset_index:

df.columns = df.columns.str.replace('rating_', '')
# The parentheses let the method chain span several lines
df = (df.set_index(['name','email'])
        .stack()
        .rename_axis(['name','email','rating_type'])
        .reset_index(name='rating_value'))
print(df)
             name                     email rating_type  rating_value
0         Someone          [email protected]           a           7.8
1         Someone          [email protected]           b           9.9
2    Someone Else     [email protected]           a           2.4
3    Someone Else     [email protected]           b           9.2
4  Another Person  [email protected]           a           3.5
5  Another Person  [email protected]           b           7.5

If you need the rows grouped by user when using the melt solution, carry the original index along and sort on it:

df.columns = df.columns.str.replace('rating_','')
df = df.reset_index() \
       .melt(id_vars=['index','name','email'],
             var_name='rating_type',
             value_name='rating_value')\
       .sort_values(['index','rating_type']) \
       .drop('index', axis=1) \
       .reset_index(drop=True)
print (df)
             name                     email rating_type  rating_value
0         Someone          [email protected]           a           7.8
1         Someone          [email protected]           b           9.9
2    Someone Else     [email protected]           a           2.4
3    Someone Else     [email protected]           b           9.2
4  Another Person  [email protected]           a           3.5
5  Another Person  [email protected]           b           7.5
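
As a further option that is not part of the original answer, pandas also provides pd.wide_to_long for columns that share a common stub. The sketch below is an illustration under assumptions: the sample data and the name long_df are hypothetical, and it relies on the combination of name and email uniquely identifying each row, which wide_to_long requires of its i columns.

```python
import pandas as pd

# Hypothetical sample data; the e-mail addresses are placeholders.
df = pd.DataFrame({
    "name": ["Someone", "Someone Else", "Another Person"],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "rating_a": [7.8, 2.4, 3.5],
    "rating_b": [9.9, 9.2, 7.5],
})

long_df = (
    pd.wide_to_long(
        df,
        stubnames="rating",    # common prefix of the columns to unpivot
        i=["name", "email"],   # identifier columns (must be unique together)
        j="rating_type",       # name for the column holding the suffix
        sep="_",               # separator between stub and suffix
        suffix=r"\w+",         # allow non-numeric suffixes such as 'a' and 'b'
    )
    .rename(columns={"rating": "rating_value"})
    .reset_index()
)
print(long_df)
```

Here the suffix is split off by wide_to_long itself, so no separate str.replace step is needed.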

Regarding python - Pivoting a series of rating columns in Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44236927/