pandas: How do I create a running count column?

Problem description

I have a flat text file of the form (column headers added by me)

CASE        Diagnosis
  S1 no diagnosis
  S2 fungus
     squamous lesion
  S3 fungus
  S4 squamous lesion
     glandular lesion
     atypia

I would like to stack and unstack cases with multiple diagnoses, so I would like both the long form and the wide form below:

CASE DxN         Diagnosis
  S1 A   no diagnosis
  S2 A   fungus   
     B   squamous lesion
  S3 A   fungus
  S4 A   squamous lesion
     B   glandular lesion
     C   atypia

CASE                 A                 B       C
  S1 no diagnosis
  S2 fungus             squamous lesion
  S3 fungus
  S4 squamous lesion    glandular lesion  atypia

How do I create that DxN sub-series? The count should never go beyond F: even if there were 10,000 possible answers, there are never more than 6 per case, so there would be no more than 6 columns. I just want to ask "What is diagnosis A for case S1, what is diagnosis B for case S1, what is diagnosis C for case S1?" I don't want a column for every possible answer.

Recommended answer

Is this what you need?

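    import numpy as np

    # Blank CASE cells belong to the case above: turn them into NaN and forward-fill.
    df = df.replace('', np.nan).ffill()
    # Number the diagnoses within each case (0, 1, ...), then pivot that counter into columns.
    df.assign(DxN=df.groupby('CASE').cumcount()).set_index(['CASE', 'DxN']).Diagnosis.unstack(fill_value='')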
    Out[709]: 
    DxN                0                1
    CASE                                 
    S1       nodiagnosis                 
    S2            fungus   squamouslesion
    S3            fungus                 
    S4    squamouslesion  glandularlesion
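The question asks for letter labels (A, B, C, ...) rather than 0, 1, 2. One possible way to get them is to map the running count onto uppercase letters before unstacking. The sketch below is not part of the original answer. It assumes the file has already been loaded into a DataFrame with the columns `CASE` and `Diagnosis`, and that blank `CASE` cells are empty strings. The sample frame and the names `letters`, `long_form` and `wide_form` are made up for illustration.

```python
import string

import numpy as np
import pandas as pd

# Sample data from the question; a blank CASE cell continues the case above.
# (Assumption: blanks were read in as empty strings.)
df = pd.DataFrame({
    "CASE": ["S1", "S2", "", "S3", "S4", "", ""],
    "Diagnosis": ["no diagnosis", "fungus", "squamous lesion", "fungus",
                  "squamous lesion", "glandular lesion", "atypia"],
})

# Forward-fill only the CASE column so every row carries its case ID.
df["CASE"] = df["CASE"].replace("", np.nan).ffill()

# Running count within each case (0, 1, 2, ...), mapped to A, B, C, ...
letters = dict(enumerate(string.ascii_uppercase))
df["DxN"] = df.groupby("CASE").cumcount().map(letters)

# Long form: one row per (CASE, DxN) pair.
long_form = df.set_index(["CASE", "DxN"])["Diagnosis"]

# Wide form: one column per letter actually used.
wide_form = long_form.unstack(fill_value="")
print(wide_form)
```

Because the columns come from the per-case counter and not from the diagnosis values, the wide table has only as many columns as the largest number of diagnoses in any one case. Calling `wide_form.stack()` returns to a long layout, though the empty-string fillers would need to be dropped afterwards.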

10-23 10:01