I have a DataFrame:
u_id date social_interaction_type_id Total_Count
4 2018-08-19 4 5
4 2018-08-24 2 3
4 2018-08-21 1 4
I want to pivot the DataFrame on u_id and date, so the result should look like this:
u_id date 4 2 1
4 2018-08-19 5 nan nan
4 2018-08-24 nan 3 nan
4 2018-08-21 nan nan 4
My attempt:
df.pivot(index = ['u_id','date'] , columns='social_interaction_type_id',values='Total_Count')
Error:
ValueError: Length of passed values is 8803, index implies 1
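For reference: in older pandas, `DataFrame.pivot` did not accept a list for `index`, which is what triggers this error. Starting with pandas 1.1, `pivot` accepts lists for `index` and `columns`, so the original call should work there. The sketch below assumes pandas >= 1.1 and rebuilds the sample data from the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "u_id": [4, 4, 4],
    "date": ["2018-08-19", "2018-08-24", "2018-08-21"],
    "social_interaction_type_id": [4, 2, 1],
    "Total_Count": [5, 3, 4],
})

# Assumes pandas >= 1.1, where pivot() accepts a list for `index`
out = (df.pivot(index=["u_id", "date"],
                columns="social_interaction_type_id",
                values="Total_Count")
         .reset_index()                 # move u_id/date back to columns
         .rename_axis(None, axis=1))    # drop the columns-axis name
print(out)
```

`pivot` does not aggregate, so it raises an error when the same (u_id, date, social_interaction_type_id) combination appears more than once.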
Best answer
An alternative solution using set_index and unstack:
df = (df.set_index(['u_id','date','social_interaction_type_id'])['Total_Count']
.unstack()
.reset_index()
.rename_axis(None, axis=1))
print (df)
u_id date 1 2 4
0 4 2018-08-19 NaN NaN 5.0
1 4 2018-08-21 4.0 NaN NaN
2 4 2018-08-24 NaN 3.0 NaN
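The set_index/unstack approach requires each (u_id, date, social_interaction_type_id) combination to be unique. The minimal sketch below uses a hypothetical duplicated row to show the failure, and shows that aggregating with groupby first avoids it:

```python
import pandas as pd

# Hypothetical sample: the first two rows share u_id, date AND type id
df = pd.DataFrame({
    "u_id": [4, 4, 4],
    "date": ["2018-08-19", "2018-08-19", "2018-08-24"],
    "social_interaction_type_id": [4, 4, 2],
    "Total_Count": [5, 1, 3],
})

keys = ["u_id", "date", "social_interaction_type_id"]
try:
    df.set_index(keys)["Total_Count"].unstack()
    failed = False
except ValueError as exc:
    # pandas refuses to reshape when the MultiIndex has duplicate entries
    failed = True
    print("unstack failed:", exc)

# Aggregating first collapses the duplicates, so unstack succeeds
ok = df.groupby(keys)["Total_Count"].sum().unstack()
print(ok)
```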
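If the data may contain duplicates, aggregate with a function such as mean or sum. Aggregation is strictly required only when a full u_id, date, social_interaction_type_id combination repeats. The sample below repeats only u_id and date, and the same code handles both cases:
print (df)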
u_id date social_interaction_type_id Total_Count
0 4 2018-08-19 4 5 <- 4 2018-08-19
1 4 2018-08-19 6 4 <- 4 2018-08-19
2 4 2018-08-24 2 3
3 4 2018-08-21 1 4
df2 = (df.groupby(['u_id','date','social_interaction_type_id'])['Total_Count']
.mean()
.unstack()
.reset_index()
.rename_axis(None, axis=1))
Or:
df2 = (df.pivot_table(index=['u_id','date'],columns='social_interaction_type_id', values='Total_Count')
.reset_index()
.rename_axis(None, axis=1))
print (df2)
u_id date 1 2 4 6
0 4 2018-08-19 NaN NaN 5.0 4.0
1 4 2018-08-21 4.0 NaN NaN NaN
2 4 2018-08-24 NaN 3.0 NaN NaN
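`pivot_table` aggregates with `mean` by default. To sum duplicates instead, pass `aggfunc="sum"`. `fill_value` replaces the missing cells. This is a sketch using a hypothetical variant of the sample in which one (u_id, date, type id) combination appears twice:

```python
import pandas as pd

# Hypothetical sample: (4, 2018-08-19, 4) appears twice
df = pd.DataFrame({
    "u_id": [4, 4, 4, 4],
    "date": ["2018-08-19", "2018-08-19", "2018-08-24", "2018-08-21"],
    "social_interaction_type_id": [4, 4, 2, 1],
    "Total_Count": [5, 1, 3, 4],
})

out = (df.pivot_table(index=["u_id", "date"],
                      columns="social_interaction_type_id",
                      values="Total_Count",
                      aggfunc="sum",     # add up duplicated combinations
                      fill_value=0)      # 0 instead of NaN for missing cells
         .reset_index()
         .rename_axis(None, axis=1))
print(out)
```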
Regarding python - Pandas pivot based on two columns (multi-index), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52701012/