I have a DataFrame:

u_id    date        social_interaction_type_id  Total_Count
4       2018-08-19  4                           5
4       2018-08-24  2                           3
4       2018-08-21  1                           4

I want to pivot the DataFrame on u_id and date.

So the result should look like this:
u_id    date        4       2       1
4       2018-08-19  5       nan     nan
4       2018-08-24  nan     3       nan
4       2018-08-21  nan     nan     4

My attempt:
df.pivot(index = ['u_id','date'] , columns='social_interaction_type_id',values='Total_Count')

Error:
ValueError: Length of passed values is 8803, index implies 1

Best answer

An alternative solution using set_index and unstack:

df = (df.set_index(['u_id','date','social_interaction_type_id'])['Total_Count']
       .unstack()
       .reset_index()
       .rename_axis(None, axis=1))
print (df)
   u_id        date    1    2    4
0     4  2018-08-19  NaN  NaN  5.0
1     4  2018-08-21  4.0  NaN  NaN
2     4  2018-08-24  NaN  3.0  NaN
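For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand (dates are kept as plain strings, which is an assumption about the original dtypes) and stores the result under a new name, `wide`, so the input frame is not overwritten.

```python
import pandas as pd

# Sample data copied from the question; dates are kept as strings here.
df = pd.DataFrame({
    "u_id": [4, 4, 4],
    "date": ["2018-08-19", "2018-08-24", "2018-08-21"],
    "social_interaction_type_id": [4, 2, 1],
    "Total_Count": [5, 3, 4],
})

# Move the three key columns into the index, then pivot the innermost
# level (social_interaction_type_id) out into columns.
wide = (df.set_index(["u_id", "date", "social_interaction_type_id"])["Total_Count"]
          .unstack()
          .reset_index()
          .rename_axis(None, axis=1))

print(wide)
```

Note that unstack sorts both the rows and the new columns, so the output order differs from the order in the question. As far as I know, pandas 1.1 and later also accept a list for the index argument of DataFrame.pivot, so the original attempt may work unchanged on a recent version.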

If the first two columns can contain duplicates, use an aggregate function such as mean or sum, for example:
print (df)
   u_id        date  social_interaction_type_id  Total_Count
0     4  2018-08-19                           4            5 <- 4  2018-08-19
1     4  2018-08-19                           6            4 <- 4  2018-08-19
2     4  2018-08-24                           2            3
3     4  2018-08-21                           1            4


df2 = (df.groupby(['u_id','date','social_interaction_type_id'])['Total_Count']
       .mean()
       .unstack()
       .reset_index()
       .rename_axis(None, axis=1))

Or:
df2 = (df.pivot_table(index=['u_id','date'],columns='social_interaction_type_id', values='Total_Count')
       .reset_index()
       .rename_axis(None, axis=1))
print (df2)
   u_id        date    1    2    4    6
0     4  2018-08-19  NaN  NaN  5.0  4.0
1     4  2018-08-21  4.0  NaN  NaN  NaN
2     4  2018-08-24  NaN  3.0  NaN  NaN
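The aggregation only changes values when the same (u_id, date, social_interaction_type_id) combination appears more than once. The sketch below uses hypothetical data, not taken from the question, in which the first two rows share all three keys. It shows that a plain unstack refuses to reshape such data, and that pivot_table averages by default, so aggfunc="sum" has to be passed explicitly when totals are wanted.

```python
import pandas as pd

# Hypothetical data: the first two rows share all three key columns.
df = pd.DataFrame({
    "u_id": [4, 4, 4, 4],
    "date": ["2018-08-19", "2018-08-19", "2018-08-24", "2018-08-21"],
    "social_interaction_type_id": [4, 4, 2, 1],
    "Total_Count": [5, 4, 3, 4],
})

keys = ["u_id", "date", "social_interaction_type_id"]

# unstack cannot reshape an index that contains duplicate entries.
try:
    df.set_index(keys)["Total_Count"].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table aggregates duplicates; its default aggfunc is the mean,
# so request the sum explicitly to add the counts together.
summed = (df.pivot_table(index=["u_id", "date"],
                         columns="social_interaction_type_id",
                         values="Total_Count",
                         aggfunc="sum")
            .reset_index()
            .rename_axis(None, axis=1))

print(unstack_failed)
print(summed)
```

Whether mean or sum is the right choice depends on what Total_Count represents; for counts, summing is usually the more natural reading.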

This question, "python - Pandas pivot based on two columns (MultiIndex)", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/52701012/
