python - How do I aggregate with Python Pandas, get percentages, and rearrange the columns and rows?

I have three columns: "Decision", which is either "A" (accept) or "D" (deny), plus Year and Month.

Decision   Year   Month
A   2003   1
A   2005   3
D   2005   2
D   2003   3
A   2004   1

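I want to restructure this by the count of Decision == 'A', creating a new df with Year as the index and one column per month. Note: each cell is now the number of 'A' rows in that year and month.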

Year Month1 Month2 Month3 ...
2002   1   3   4
2003   2   4   5
2004   2   3   5
2005   5   3   42
2006   4   2   12

Similarly, I want another df for Decision == 'D'.

Year Month1 Month2 Month3 ...
2002   4   4   3
2003   2   4   23
2004   4   1   12
2005   4   2   31
2006   4   2   22

Ultimately, though, I want each cell to be the fraction (number of 'A') / (number of 'A' + number of 'D').

Year Month1 Month2 Month3 ...
2002   .2   .43   .57
2003  (etc)
2004  (etc)
2005   (etc)
2006   (etc)

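I tried pandas groupby but had no success. I suppose I could build separate lists of counts and then combine them into a df, but I would like to know whether pandas offers a simpler way.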

Best answer

Use value_counts with normalize=True inside a groupby:

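# Fraction of each Decision within every (Year, Month) group
d1 = df.groupby(['Year', 'Month']).Decision.value_counts(normalize=True)
# Keep the 'A' fractions and pivot Month into columns
d1.xs('A', level='Decision').unstack('Month', fill_value=0).add_prefix('Month')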

Month    Month1    Month2    Month3
Year
2002   0.200000  0.428571  0.571429
2003   0.400000  0.666667  0.416667
2004   0.285714  0.300000  0.312500
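To make the one-liner easier to follow, here is the same chain split into its three steps. This is only a sketch: the six-row frame is made-up data for illustration, not the setup from this answer, and the names share_a and result are my own.

```python
import pandas as pd

# Tiny made-up frame for illustration (not the answer's setup data).
df = pd.DataFrame({
    "Decision": ["A", "A", "D", "D", "A", "D"],
    "Year":     [2003, 2003, 2003, 2004, 2004, 2004],
    "Month":    [1, 1, 1, 1, 2, 2],
})

# Step 1: fraction of each Decision within every (Year, Month) group.
# The result is a Series indexed by (Year, Month, Decision).
d1 = df.groupby(["Year", "Month"])["Decision"].value_counts(normalize=True)

# Step 2: keep only the 'A' entries; the Decision level is dropped.
share_a = d1.xs("A", level="Decision")

# Step 3: pivot Month into columns; cells without any 'A' are filled with 0.
result = share_a.unstack("Month", fill_value=0).add_prefix("Month")
print(result)
```

One thing to be aware of: fill_value=0 is applied both to cells that contain only 'D' rows (where 0 is the correct fraction) and to cells with no rows at all, such as (2003, Month2) in this sketch, where the fraction is really undefined.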

Setup

import pandas as pd

# Sample data: 29 'A' rows followed by 46 'D' rows, covering 2002-2004
df = pd.DataFrame(dict(
    Decision=['A'] * 29 + ['D'] * 46,
    Year=[2002] * 8 + [2003] * 11 + [2004] * 10
       + [2002] * 11 + [2003] * 12 + [2004] * 23,
    Month=[
        1, 2, 2, 2, 3, 3, 3, 3, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3,
        1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 3,
        3, 3, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
))[['Decision', 'Year', 'Month']]
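
The question also asks for the separate count tables for 'A' and 'D'. The answer above skips them, but one possible way to get them is pd.crosstab, which is a different technique from the groupby approach. The sketch below assumes a small made-up frame; count_table, counts_a, counts_d, total and share_a are hypothetical names introduced for illustration.

```python
import pandas as pd

# Tiny made-up frame for illustration (not the setup data above).
df = pd.DataFrame({
    "Decision": ["A", "A", "D", "D", "A", "D"],
    "Year":     [2003, 2003, 2003, 2004, 2004, 2004],
    "Month":    [1, 1, 1, 1, 2, 2],
})

def count_table(frame, decision):
    """Count rows with the given decision: Year as index, one column per Month."""
    subset = frame[frame["Decision"] == decision]
    return pd.crosstab(subset["Year"], subset["Month"]).add_prefix("Month")

counts_a = count_table(df, "A")
counts_d = count_table(df, "D")

# Add the two tables; a cell missing from one of them counts as 0.
total = counts_a.add(counts_d, fill_value=0)

# Fraction of 'A' per cell; cells with no rows at all come out as NaN.
share_a = counts_a.reindex_like(total).fillna(0) / total
print(share_a)
```

Unlike the unstack(fill_value=0) version, this variant leaves a cell as NaN when that year and month have no rows at all, which may or may not be what you want.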