I have data in this format:
MonthYear HPI Div State_fips
1-1993 105.45 7 5
2-1993 105.58 7 5
3-1993 106.23 7 5
4-1993 106.63 7 5
The desired pivot table is:
State_fips 1-1993 2-1993 3-1993 4-1993
5 105.45 105.58 106.23 106.63
(I am still quite new to pandas.)
Best answer
df1 = df.set_index(['State_fips', 'MonthYear'])['HPI'].unstack()
MonthYear 1-1993 2-1993 3-1993 4-1993
State_fips
5 105.45 105.58 106.23 106.63
df1 = df.pivot(index='State_fips', columns='MonthYear', values='HPI')
MonthYear 1-1993 2-1993 3-1993 4-1993
State_fips
5 105.45 105.58 106.23 106.63
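To compare the two approaches side by side, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question, and the names via_unstack and via_pivot are only illustrative:

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "MonthYear": ["1-1993", "2-1993", "3-1993", "4-1993"],
    "HPI": [105.45, 105.58, 106.23, 106.63],
    "Div": [7, 7, 7, 7],
    "State_fips": [5, 5, 5, 5],
})

# Approach 1: move both keys into the index, then unstack MonthYear into columns.
via_unstack = df.set_index(["State_fips", "MonthYear"])["HPI"].unstack()

# Approach 2: pivot directly on the same keys.
via_pivot = df.pivot(index="State_fips", columns="MonthYear", values="HPI")

print(via_pivot)
# Check whether both approaches produced the same table.
print(via_unstack.equals(via_pivot))
```

Both calls assume each (State_fips, MonthYear) pair occurs only once in the data.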
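However, if the data contains duplicate (State_fips, MonthYear) pairs, you need to aggregate them with groupby or pivot_table. The mean used here can be changed to sum, median, and so on:
print (df)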
MonthYear HPI Div State_fips
0 1-1993 105.45 7 5
1 2-1993 105.58 7 5
2 3-1993 106.23 7 5
3 4-1993 100.00 7 5 <-duplicates same 4-1993, 5
4 4-1993 200.00 7 5 <-duplicates same 4-1993, 5
df1 = df.pivot_table(index='State_fips', columns='MonthYear', values='HPI', aggfunc='mean')
MonthYear 1-1993 2-1993 3-1993 4-1993
State_fips
5 105.45 105.58 106.23 150.0 <- (100+200)/2 = 150
df1 = df.groupby(['State_fips', 'MonthYear'])['HPI'].mean().unstack()
MonthYear 1-1993 2-1993 3-1993 4-1993
State_fips
5 105.45 105.58 106.23 150.0 <- (100+200)/2 = 150
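As a rough illustration of why the aggregation is needed, the sketch below rebuilds the duplicated sample by hand and shows that a plain pivot raises a ValueError on it, while pivot_table accepts a different aggfunc such as 'sum'. The variable names (pivot_failed, mean_table, sum_table) are only illustrative:

```python
import pandas as pd

# Sample with a duplicated (State_fips, MonthYear) pair, rebuilt by hand.
df = pd.DataFrame({
    "MonthYear": ["1-1993", "2-1993", "3-1993", "4-1993", "4-1993"],
    "HPI": [105.45, 105.58, 106.23, 100.00, 200.00],
    "Div": [7, 7, 7, 7, 7],
    "State_fips": [5, 5, 5, 5, 5],
})

# A plain pivot cannot decide which of the two 4-1993 values to keep.
try:
    df.pivot(index="State_fips", columns="MonthYear", values="HPI")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table resolves the duplicates with an aggregation function.
mean_table = df.pivot_table(index="State_fips", columns="MonthYear",
                            values="HPI", aggfunc="mean")
sum_table = df.pivot_table(index="State_fips", columns="MonthYear",
                           values="HPI", aggfunc="sum")

print(mean_table)
print(sum_table)
```

Which aggregation is appropriate depends on what the duplicate rows mean in your data. Averaging is only one possible choice.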
Finally, if you need to turn the index into a regular column and remove the name of the columns axis:
df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
State_fips 1-1993 2-1993 3-1993 4-1993
0 5 105.45 105.58 106.23 150.0
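To see what this last step changes, here is a small sketch that starts from a hand-built copy of the pivoted table (the name flat is only illustrative). reset_index moves State_fips back into an ordinary column, and rename_axis(None, axis=1) drops the leftover 'MonthYear' label on the columns:

```python
import pandas as pd

# Hand-built copy of the pivoted table, with named index and columns axes.
df1 = pd.DataFrame(
    [[105.45, 105.58, 106.23, 150.0]],
    index=pd.Index([5], name="State_fips"),
    columns=pd.Index(["1-1993", "2-1993", "3-1993", "4-1993"], name="MonthYear"),
)

print(df1.columns.name)   # name of the columns axis before flattening

# Move the index into a column, then clear the name of the columns axis.
flat = df1.reset_index().rename_axis(None, axis=1)

print(flat.columns.tolist())
print(flat.columns.name)  # name of the columns axis after flattening
```

This form is usually more convenient if the result will be written back to a CSV file.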
Regarding python - How to pivot a table from a csv horizontally in Python using a Pandas df?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42993439/