我该如何转换:

patient_id test    test_value      date_taken
11964    HBA1C         8.60        2017-06-14
11964    Glucose     231.00        2017-05-01
11964    Glucose     202.00        2017-07-01
11964    Glucose     194.00        2017-09-02
11964    Creatinine    1.10        2017-05-01
11964    Creatinine    1.28        2017-08-14


对此吗?

patient_id  hba1c_earliest hba1c_latest hba1c_change glucose_earliest glucose_latest/
    11964      8.60           8.60          0.0000        231.0           194.0
glucose_change creatinine_earliest creatinine_latest creatinine_change
     -0.1602         1.10               1.28             0.1636


对于扩展数据框:


  .*_earliest columns should include that lab result with the earliest date. .*_latest columns should include that lab result with the latest date. .*_change columns should hold the relative change (variation), (Latest - Earliest) / Earliest.

最佳答案

采用:

print (df.dtypes)
patient_id             int64 <- not necessary
test                  object <- not necessary
test_value           float64 <- necessary
date_taken    datetime64[ns] <- necessary
dtype: object

df = (df.sort_values(['patient_id','test','date_taken'])
       .groupby(['patient_id','test'])['test_value']
       .agg([('earliest','first'),('latest','last')])
       .assign(change = lambda x: (x['latest'] - x['earliest'])/ x['earliest'])
       .unstack()
       .swaplevel(0,1, axis=1)
       .reindex(columns=df['test'].unique(), level=0)
       )
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print (df)
   patient_id  HBA1C_earliest  HBA1C_latest  HBA1C_change  Glucose_earliest  \
0       11964             8.6           8.6           0.0             231.0

   Glucose_latest  Glucose_change  Creatinine_earliest  Creatinine_latest  \
0           194.0       -0.160173                  1.1               1.28

   Creatinine_change
0           0.163636


说明:


首先sort_values多列
对于aggfirst列,按lastearliestlatest进行汇总。
通过assign创建新列
通过unstack进行修复
通过swaplevel交换MulriIndex中各列的级别
然后reindex以相同顺序(如原始列)
在列中用mapjoin展平MultiIndex
reset_index中列的最后index个。

关于python - 如何根据某些条件转换(长到宽)数据框,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/53167448/

10-11 05:21