yearCount = df[['antibiotic', 'order_date', 'antiYearCount']]
yearGroups = yearCount.groupby('order_date')
for year in yearGroups:
    yearCount['antiYearCount'] = year.groupby('antibiotic')['antibiotic'].transform(pd.Series.value_counts)
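In this example, yearCount is a DataFrame containing the columns 'order_date', 'antibiotic' and 'antiYearCount'. I have already cleaned 'order_date' so that it holds only the order year. I want to group yearCount by the year in 'order_date', count how many times each 'antibiotic' occurs within each year group, and then assign that count to the 'antiYearCount' column of yearCount. Thanks for your help!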
Best answer
I think you need to add the column order_date to the groupby. You can then also use size instead of pd.Series.value_counts to get the same output:
import pandas as pd

# sample data: three orders on each of two dates
df = pd.DataFrame({'antibiotic': list('accbbb'),
                   'antiYearCount': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'order_date': pd.to_datetime(['2012-01-01']*3 + ['2012-01-02']*3)})
print(df)
   C  D  E  antiYearCount antibiotic order_date
0  7  1  5              4          a 2012-01-01
1  8  3  3              5          c 2012-01-01
2  9  5  6              4          c 2012-01-01
3  4  7  9              5          b 2012-01-02
4  2  1  2              5          b 2012-01-02
5  3  0  4              4          b 2012-01-02
# copy the slice to avoid the pandas SettingWithCopyWarning
#https://stackoverflow.com/a/45035966/2901002
yearCount = df[['antibiotic', 'order_date', 'antiYearCount']].copy()
yearCount['antiYearCount'] = yearCount.groupby(['order_date', 'antibiotic'])['antibiotic'] \
                                      .transform('size')
print(yearCount)
  antibiotic order_date  antiYearCount
0          a 2012-01-01              1
1          c 2012-01-01              2
2          c 2012-01-01              2
3          b 2012-01-02              3
4          b 2012-01-02              3
5          b 2012-01-02              3
# the same grouping with pd.Series.value_counts, as used in the question
yearCount['antiYearCount'] = yearCount.groupby(['order_date', 'antibiotic'])['antibiotic'] \
                                      .transform(pd.Series.value_counts)
print(yearCount)
  antibiotic order_date  antiYearCount
0          a 2012-01-01              1
1          c 2012-01-01              2
2          c 2012-01-01              2
3          b 2012-01-02              3
4          b 2012-01-02              3
5          b 2012-01-02              3
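Since the question says order_date has already been reduced to the order year, here is a minimal sketch of the same idea with integer years. The sample values are made up for illustration and are not from the question. It also shows a compact one-row-per-group summary via size(), in case a broadcast column is not what you need:

```python
import pandas as pd

# Hypothetical sample data: order_date already holds only the order year
df = pd.DataFrame({
    'antibiotic': ['a', 'c', 'c', 'b', 'b', 'a'],
    'order_date': [2012, 2012, 2012, 2013, 2013, 2013],
})

# Count rows per (year, antibiotic) pair and broadcast the count to every row
df['antiYearCount'] = (
    df.groupby(['order_date', 'antibiotic'])['antibiotic'].transform('size')
)
print(df)

# Alternative: one row per (year, antibiotic) pair instead of a per-row column
summary = (
    df.groupby(['order_date', 'antibiotic'])
      .size()
      .reset_index(name='count')
)
print(summary)
```

transform('size') keeps the original row order and index, so the result can be assigned straight back as a column, while size() collapses each group to a single row.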