目前我有这种格式的表:

Geo-id Name                            stat Year   index
111500  Anniston-Oxford-Jacksonville     AL 1991    0
111500  Anniston-Oxford-Jacksonville     AL 1992    91.675
111500  Anniston-Oxford-Jacksonville     AL 1993    93.8025
111500  Anniston-Oxford-Jacksonville     AL 1994    96.63
111500  Anniston-Oxford-Jacksonville     AL 1995    99.455
111500  Anniston-Oxford-Jacksonville     AL 1996    102.4875
111500  Anniston-Oxford-Jacksonville     AL 1997    109.0225
111500  Anniston-Oxford-Jacksonville     AL 1998    114.7075
111500  Anniston-Oxford-Jacksonville     AL 1999    116.005
112220  Auburn-Opelika                  AL  1992    90.695
112220  Auburn-Opelika                  AL  1993    94.2075
112220  Auburn-Opelika                  AL  1994    98.6825
112220  Auburn-Opelika                  AL  1995    103.3675
112220  Auburn-Opelika                  AL  1996    107.2725
112220  Auburn-Opelika                  AL  1997    111.7125

This should be converted to :

Geo-id  Name                            1991    1992    1993    1994 ........... 2017
111500  Anniston-Oxford-Jacksonville    0       91.675  93.8025 96.63
112220  Auburn-Opelika                  0       90.695  94.2075 98.6825 and so on .....

Keeping geoid and name vertical ( but repeating only once since it is pivoted horizontally).

Code I have so far :

   pre_horizontal_df = pd.read_csv('database_raw.csv')
   pre_horizontal_df['period'] = pre_horizontal_df.year.astype(str)
   df1 = pre_horizontal_df.groupby(['geoid', 'name'])['hpi'].mean().unstack()
   print (df1)


但这是行不通的。可以在python df / pandas中完成此水平枢轴操作吗?

最佳答案

您需要添加列Year,该列通过unstack创建新列:

df1=pre_horizontal_df.groupby(['Geo-id','Name','Year'])['index'].mean().unstack(fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name
111500 Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300   99.4550
112220 Auburn-Opelika                 0.0  90.695  94.2075  98.6825  103.3675

Year                                     1996      1997      1998     1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000


pivot_table的另一种解决方案:

df1 = pre_horizontal_df.pivot_table(index=['Geo-id', 'Name'],
                                    columns='Year',
                                    values='index',
                                    fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name
111500 Anniston-Oxford-Jacksonville     0  91.675  93.8025  96.6300   99.4550
112220 Auburn-Opelika                   0  90.695  94.2075  98.6825  103.3675

Year                                     1996      1997      1998     1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000


最后,如果需要将索引转换为列:

df1 = df1.rename_axis(None, axis=1).reset_index()
print (df1)
   Geo-id                          Name  1991    1992     1993     1994  \
0  111500  Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300
1  112220                Auburn-Opelika   0.0  90.695  94.2075  98.6825

       1995      1996      1997      1998     1999
0   99.4550  102.4875  109.0225  114.7075  116.005
1  103.3675  107.2725  111.7125    0.0000    0.000


编辑:

如果每个列都不重复创建新索引和新列,则可以将set_indexunstack结合使用:

print (pre_horizontal_df[pre_horizontal_df.duplicated(['Geo-id','Name','Year'], keep=False)])
Empty DataFrame
Columns: [Geo-id, Name, stat, Year, index]
Index: []

df1 = pre_horizontal_df.set_index(['Geo-id', 'Name', 'Year'])['index'].unstack(fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name
111500 Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300   99.4550
112220 Auburn-Opelika                 0.0  90.695  94.2075  98.6825  103.3675

Year                                     1996      1997      1998     1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000

10-08 19:14