python - Frequency tables in pandas (like plyr in R)

My question is how to compute frequencies over multiple variables in pandas.
I want to go from this dataframe:

import pandas as pd

d1 = pd.DataFrame({'StudentID': ["x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"],
                   'StudentGender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
                   'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
                   'Exam': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
                   'Participated': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
                   'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes']},
                  columns=['StudentID', 'StudentGender', 'ExamenYear', 'Exam', 'Participated', 'Passed'])

to this result:

             Participated  OfWhichpassed
ExamenYear
2007                   3              2
2008                   4              3
2009                   3              2

(1) One possibility I tried is to compute two dataframes and bind them:

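# pivot_table takes index=/columns= (older pandas releases called these rows=/cols=)
t1 = d1.pivot_table(values='StudentID', index=['ExamenYear'], columns=['Participated'], aggfunc=len)
t2 = d1.pivot_table(values='StudentID', index=['ExamenYear'], columns=['Passed'], aggfunc=len)
tx = pd.concat([t1, t2], axis=1)

# tx has two columns labelled 'yes' (one from each pivot); this selects both
Res1 = tx['yes']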

(2) The second possibility is to use an aggregation function:

import collections

dg = d1.groupby('ExamenYear')
# len counts every row in the group; Counter(...)[True] counts the rows equal to 'yes'
Res2 = dg.agg({'Participated': len,
               'Passed': lambda x: collections.Counter(x == 'yes')[True]})

Res2.columns = ['Participated', 'OfWhichpassed']

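Both ways are awkward, to say the least.
How is this done properly in pandas?
P.S.: I also tried value_counts instead of collections.Counter, but could not get it to work.
For reference: a few months ago I asked a similar question for R, where plyr helped.
---- UPDATE ------
User DSM is right. There was a mistake in the desired table result.
(1) The code for option 1 is: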

# With index=/columns= keywords (current pandas), t1 has no columns= argument and
# comes back as a one-column DataFrame, so the 'StudentID' column is selected explicitly.
t1 = d1.pivot_table(values='StudentID', index=['ExamenYear'], aggfunc=len)['StudentID']
t2 = d1.pivot_table(values='StudentID', index=['ExamenYear'], columns=['Participated'], aggfunc=len)
t3 = d1.pivot_table(values='StudentID', index=['ExamenYear'], columns=['Passed'], aggfunc=len)

Res1 = pd.DataFrame({'All': t1,
                     'OfWhichParticipated': t2['yes'],
                     'OfWhichPassed': t3['yes']})

and it produces the result:

             All  OfWhichParticipated  OfWhichPassed
ExamenYear
2007          3                    2              2
2008          4                    3              3
2009          3                    3              2

(2) For option 2, thanks to user herrfz, I figured out how to use value_counts, and the code becomes:

# value_counts()['yes'] looks up how many entries in the group equal 'yes'
Res2 = d1.groupby('ExamenYear').agg({'StudentID': len,
                                     'Participated': lambda x: x.value_counts()['yes'],
                                     'Passed': lambda x: x.value_counts()['yes']})

Res2.columns = ['All', 'OfWhichParticipated', 'OfWhichPassed']

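which produces the same result as Res1.
But my questions remain:
Using option 2, is it possible to use the same variable twice (for another operation)? Is it possible to pass a custom name for the resulting variable?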
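
On both of these questions, here is a minimal sketch, assuming pandas 0.25 or later, where `agg` accepts keyword arguments of the form `new_name=(column, function)` (named aggregation). The same column can then feed several outputs, and each output gets a custom name. The helpers `count_yes` and `count_no` and the output column `NotPassed` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Same data as in the question.
d1 = pd.DataFrame({
    'StudentID': ["x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"],
    'StudentGender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
    'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
    'Exam': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
    'Participated': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes'],
})


def count_yes(s):
    """Hypothetical helper: number of 'yes' entries in a Series."""
    return int((s == 'yes').sum())


def count_no(s):
    """Hypothetical helper: number of 'no' entries in a Series."""
    return int((s == 'no').sum())


# Each keyword is an output column name; each tuple is (input column, function).
# 'Passed' is used twice, once per operation.
res = d1.groupby('ExamenYear').agg(
    All=('StudentID', 'size'),
    OfWhichParticipated=('Participated', count_yes),
    OfWhichPassed=('Passed', count_yes),
    NotPassed=('Passed', count_no),
)
print(res)
```

Because the output names are given up front, no separate `Res2.columns = [...]` step is needed.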
---- NEW UPDATE ----
I finally decided to use apply, which as I understand it is more flexible.
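
The post does not show the apply-based code, so the following is only a sketch of what such a solution could look like, not the author's actual version. The function name `summarize` and the extra column `PassedAmongParticipants` are assumptions made for illustration. The point is that the function receives the whole group, so a single result can combine several columns.

```python
import pandas as pd

# Same data as in the question.
d1 = pd.DataFrame({
    'StudentID': ["x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"],
    'StudentGender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
    'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
    'Exam': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
    'Participated': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes'],
})


def summarize(group):
    """Hypothetical helper: one row of counts for a single year's sub-frame."""
    participated = group['Participated'] == 'yes'
    passed = group['Passed'] == 'yes'
    return pd.Series({
        'All': len(group),
        'OfWhichParticipated': participated.sum(),
        'OfWhichPassed': passed.sum(),
        # Uses two columns at once, which a per-column agg cannot do.
        'PassedAmongParticipants': (participated & passed).sum(),
    })


# Selecting the needed columns first keeps the grouping column out of the
# sub-frames handed to summarize.
res = d1.groupby('ExamenYear')[['Participated', 'Passed']].apply(summarize)
print(res)
```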

Best answer

This:

d1.groupby('ExamenYear').agg({'Participated': len,
                              'Passed': lambda x: sum(x == 'yes')})

doesn't look any more awkward than the R solution, IMHO.
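
For a plain frequency table of one variable against another, `pd.crosstab` may also be worth a look. This is a sketch of an alternative, not part of the answer above; only the two relevant columns of the question's data are reproduced, and the variable name `freq` is arbitrary.

```python
import pandas as pd

# Only the two columns needed for the cross-tabulation, copied from the question.
d1 = pd.DataFrame({
    'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
    'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes'],
})

# Rows are exam years, columns are the distinct values of 'Passed',
# and each cell holds the number of matching students.
freq = pd.crosstab(d1['ExamenYear'], d1['Passed'])
print(freq)
```

The `yes` column of this table matches the `OfWhichPassed` counts in the question's corrected result (2, 3, 2).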

Regarding python - Frequency tables in pandas (like plyr in R), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15589354/