我该如何转换:
patient_id test test_value date_taken
11964 HBA1C 8.60 2017-06-14
11964 Glucose 231.00 2017-05-01
11964 Glucose 202.00 2017-07-01
11964 Glucose 194.00 2017-09-02
11964 Creatinine 1.10 2017-05-01
11964 Creatinine 1.28 2017-08-14
对此吗?
patient_id hba1c_earliest hba1c_latest hba1c_change glucose_earliest glucose_latest/
11964 8.60 8.60 0.0000 231.0 194.0
glucose_change creatinine_earliest creatinine_latest creatinine_change
-0.1602 1.10 1.28 0.1636
对于扩展数据框:
.*_earliest columns should include that lab result with the earliest date. .*_latest columns should include that lab result with the latest date. .*_change columns should hold the relative change (variation), (Latest - Earliest) / Earliest.
最佳答案
采用:
print (df.dtypes)
patient_id int64 <- not necessary
test object <- not necessary
test_value float64 <- necessary
date_taken datetime64[ns] <- necessary
dtype: object
df = (df.sort_values(['patient_id','test','date_taken'])
.groupby(['patient_id','test'])['test_value']
.agg([('earliest','first'),('latest','last')])
.assign(change = lambda x: (x['latest'] - x['earliest'])/ x['earliest'])
.unstack()
.swaplevel(0,1, axis=1)
.reindex(columns=df['test'].unique(), level=0)
)
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print (df)
patient_id HBA1C_earliest HBA1C_latest HBA1C_change Glucose_earliest \
0 11964 8.6 8.6 0.0 231.0
Glucose_latest Glucose_change Creatinine_earliest Creatinine_latest \
0 194.0 -0.160173 1.1 1.28
Creatinine_change
0 0.163636
说明:
首先
sort_values
多列对于
agg
和first
列,按last
与earliest
和latest
进行汇总。通过
assign
创建新列通过
unstack
进行修复通过
swaplevel
交换MulriIndex中各列的级别然后
reindex
以相同顺序(如原始列)在列中用
map
用join
展平MultiIndexreset_index
中列的最后index
个。关于python - 如何根据某些条件转换(长到宽)数据框,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/53167448/