目前我有这种格式的表:
Geo-id Name stat Year index 111500 Anniston-Oxford-Jacksonville AL 1991 0 111500 Anniston-Oxford-Jacksonville AL 1992 91.675 111500 Anniston-Oxford-Jacksonville AL 1993 93.8025 111500 Anniston-Oxford-Jacksonville AL 1994 96.63 111500 Anniston-Oxford-Jacksonville AL 1995 99.455 111500 Anniston-Oxford-Jacksonville AL 1996 102.4875 111500 Anniston-Oxford-Jacksonville AL 1997 109.0225 111500 Anniston-Oxford-Jacksonville AL 1998 114.7075 111500 Anniston-Oxford-Jacksonville AL 1999 116.005 112220 Auburn-Opelika AL 1992 90.695 112220 Auburn-Opelika AL 1993 94.2075 112220 Auburn-Opelika AL 1994 98.6825 112220 Auburn-Opelika AL 1995 103.3675 112220 Auburn-Opelika AL 1996 107.2725 112220 Auburn-Opelika AL 1997 111.7125
This should be converted to :
Geo-id Name 1991 1992 1993 1994 ........... 2017 111500 Anniston-Oxford-Jacksonville 0 91.675 93.8025 96.63 112220 Auburn-Opelika 0 90.695 94.2075 98.6825 and so on .....
Keeping geoid and name vertical ( but repeating only once since it is pivoted horizontally).
Code I have so far :
pre_horizontal_df = pd.read_csv('database_raw.csv')
pre_horizontal_df['period'] = pre_horizontal_df.year.astype(str)
df1 = pre_horizontal_df.groupby(['geoid', 'name'])['hpi'].mean().unstack()
print (df1)
但这是行不通的。可以在python df / pandas中完成此水平枢轴操作吗?
最佳答案
您需要添加列Year
,该列通过unstack
创建新列:
df1=pre_horizontal_df.groupby(['Geo-id','Name','Year'])['index'].mean().unstack(fill_value=0)
print (df1)
Year 1991 1992 1993 1994 1995 \
Geo-id Name
111500 Anniston-Oxford-Jacksonville 0.0 91.675 93.8025 96.6300 99.4550
112220 Auburn-Opelika 0.0 90.695 94.2075 98.6825 103.3675
Year 1996 1997 1998 1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville 102.4875 109.0225 114.7075 116.005
112220 Auburn-Opelika 107.2725 111.7125 0.0000 0.000
pivot_table
的另一种解决方案:df1 = pre_horizontal_df.pivot_table(index=['Geo-id', 'Name'],
columns='Year',
values='index',
fill_value=0)
print (df1)
Year 1991 1992 1993 1994 1995 \
Geo-id Name
111500 Anniston-Oxford-Jacksonville 0 91.675 93.8025 96.6300 99.4550
112220 Auburn-Opelika 0 90.695 94.2075 98.6825 103.3675
Year 1996 1997 1998 1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville 102.4875 109.0225 114.7075 116.005
112220 Auburn-Opelika 107.2725 111.7125 0.0000 0.000
最后,如果需要将索引转换为列:
df1 = df1.rename_axis(None, axis=1).reset_index()
print (df1)
Geo-id Name 1991 1992 1993 1994 \
0 111500 Anniston-Oxford-Jacksonville 0.0 91.675 93.8025 96.6300
1 112220 Auburn-Opelika 0.0 90.695 94.2075 98.6825
1995 1996 1997 1998 1999
0 99.4550 102.4875 109.0225 114.7075 116.005
1 103.3675 107.2725 111.7125 0.0000 0.000
编辑:
如果每个列都不重复创建新索引和新列,则可以将
set_index
与unstack
结合使用:print (pre_horizontal_df[pre_horizontal_df.duplicated(['Geo-id','Name','Year'], keep=False)])
Empty DataFrame
Columns: [Geo-id, Name, stat, Year, index]
Index: []
df1 = pre_horizontal_df.set_index(['Geo-id', 'Name', 'Year'])['index'].unstack(fill_value=0)
print (df1)
Year 1991 1992 1993 1994 1995 \
Geo-id Name
111500 Anniston-Oxford-Jacksonville 0.0 91.675 93.8025 96.6300 99.4550
112220 Auburn-Opelika 0.0 90.695 94.2075 98.6825 103.3675
Year 1996 1997 1998 1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville 102.4875 109.0225 114.7075 116.005
112220 Auburn-Opelika 107.2725 111.7125 0.0000 0.000