pandas: How to create a running count column?
Question
I have a flat text file of the form (column headers added by me)
CASE  Diagnosis
S1    no diagnosis
S2    fungus
      squamous lesion
S3    fungus
S4    squamous lesion
      glandular lesion
      atypia
I would like to stack and unstack cases with multiple diagnoses, so I would like
CASE  DxN  Diagnosis
S1    A    no diagnosis
S2    A    fungus
      B    squamous lesion
S3    A    fungus
S4    A    squamous lesion
      B    glandular lesion
      C    atypia
and
CASE  A                B                 C
S1    no diagnosis
S2    fungus           squamous lesion
S3    fungus
S4    squamous lesion  glandular lesion  atypia
How do I create that sub-series, DxN? The count should never go past F: even if there were 10,000 possible answers, no case ever has more than 6, so there should be at most 6 columns. I just want to ask "What is diagnosis A for case S1? What is diagnosis B for case S1? What is diagnosis C for case S1?" I don't want a column for every possible answer.
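The DxN column is a running count within each case. A minimal sketch of building it with `groupby().cumcount()`, assuming the blank CASE cells are read in as empty strings (how they actually arrive depends on how the flat file is parsed). Relabelling 0, 1, 2, ... as A, B, C, ... is an illustrative choice that relies on the stated cap of 6 diagnoses per case:

```python
import pandas as pd

# Sample data from the question; blank CASE cells are ASSUMED to be ''.
df = pd.DataFrame({
    "CASE": ["S1", "S2", "", "S3", "S4", "", ""],
    "Diagnosis": ["no diagnosis", "fungus", "squamous lesion", "fungus",
                  "squamous lesion", "glandular lesion", "atypia"],
})

# Turn '' into NaN, then copy each case ID down to its continuation rows.
df["CASE"] = df["CASE"].mask(df["CASE"].eq("")).ffill()

# Running count within each case, in row order: 0, 1, 2, ...
n = df.groupby("CASE").cumcount()

# The question states no case has more than 6 diagnoses (A..F).
assert n.max() < 6

# Map 0 -> 'A', 1 -> 'B', ... and place the label next to CASE.
df.insert(1, "DxN", n.map(lambda i: "ABCDEF"[i]))
print(df)
```

Because `cumcount()` numbers rows per group rather than per distinct diagnosis, the number of DxN labels is bounded by the largest case, not by the number of possible answers.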
Recommended answer
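Is this what you need? Forward-fill the blank CASE cells, number each diagnosis within its case with groupby().cumcount(), then unstack that counter into columns: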
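import numpy as np
df = df.replace('', np.nan).ffill()  # blank CASE cells -> NaN, then copy each case ID down
df.assign(DxN=df.groupby('CASE').cumcount()).set_index(['CASE', 'DxN']).Diagnosis.unstack(fill_value='')  # 0-based running count per case, pivoted into columns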
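A self-contained version of the answer, run on the question's sample data. It assumes blank CASE cells are read as '' (which is what the answer's `replace` call expects) and adds one step that is not in the answer: mapping the 0-based count to letters so the columns come out as A, B, C rather than 0, 1, 2:

```python
import numpy as np
import pandas as pd

# The question's sample data; blank CASE cells ASSUMED to be ''.
df = pd.DataFrame({
    "CASE": ["S1", "S2", "", "S3", "S4", "", ""],
    "Diagnosis": ["no diagnosis", "fungus", "squamous lesion", "fungus",
                  "squamous lesion", "glandular lesion", "atypia"],
})

# Step 1 (from the answer): '' -> NaN, then forward-fill the case IDs.
df = df.replace("", np.nan).ffill()

# Step 2: running count per case, relabelled A..F (raises IndexError past 6).
dxn = df.groupby("CASE").cumcount().map(lambda i: "ABCDEF"[i])

# Step 3: long form indexed by (CASE, DxN), then unstack DxN into columns.
wide = (
    df.assign(DxN=dxn)
      .set_index(["CASE", "DxN"])["Diagnosis"]
      .unstack(fill_value="")
)
print(wide)
```

The intermediate `df.assign(DxN=dxn)` is the long (stacked) table from the question, and `wide` is the unstacked one.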
Out[709]:
DxN                 0                 1
CASE
S1       no diagnosis
S2             fungus   squamous lesion
S3             fungus
S4    squamous lesion  glandular lesion
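Going the other way (the "stack" half of the question) can be sketched with `DataFrame.stack()`, which moves the DxN columns back into the row index. This assumes the padding cells are '' as produced by `unstack(fill_value='')`; they are filtered out explicitly, so the result does not depend on how a given pandas version treats missing values in `stack()`:

```python
import pandas as pd

# Wide table from the question: one row per case, one column per DxN letter,
# with '' where a case has fewer diagnoses.
wide = pd.DataFrame(
    {
        "A": ["no diagnosis", "fungus", "fungus", "squamous lesion"],
        "B": ["", "squamous lesion", "", "glandular lesion"],
        "C": ["", "", "", "atypia"],
    },
    index=pd.Index(["S1", "S2", "S3", "S4"], name="CASE"),
)
wide.columns.name = "DxN"

# stack() yields one row per (CASE, DxN) cell; drop the '' padding cells,
# then turn the (CASE, DxN) index back into ordinary columns.
long = wide.stack()
long = long[long != ""].rename("Diagnosis").reset_index()
print(long)
```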