I have a dataset (dataset1) that looks like this:

Date        Company     Weekday
2015-01-01  Company1    Monday
2015-01-02  Company1    Tuesday
2015-01-03  Company1    Wednesday
2015-01-04  Company1    Thursday
2015-12-09  Company2    Monday
2015-12-10  Company2    Tuesday
...
2016-01-08  Company3    Wednesday
2016-01-09  Company3    Thursday


Then I apply the following code:

dataset2 = dataset1.groupby(['Company','Weekday']).size().sort_values(ascending=False)
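To see what this line actually produces, here is a minimal sketch on a small hypothetical stand-in for dataset1 (the rows below are invented for illustration; only the column names come from the question). It shows that groupby(...).size() returns a Series indexed by a (Company, Weekday) MultiIndex, not a DataFrame with a count column.

```python
import pandas as pd

# Hypothetical toy rows shaped like dataset1 (not the real data)
dataset1 = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-05", "2015-01-06", "2015-01-12", "2015-12-07"]),
    "Company": ["Company1", "Company1", "Company1", "Company2"],
    "Weekday": ["Monday", "Tuesday", "Monday", "Monday"],
})

# size() counts the rows in each (Company, Weekday) group and returns a Series
dataset2 = dataset1.groupby(["Company", "Weekday"]).size().sort_values(ascending=False)

print(type(dataset2).__name__)       # Series
print(list(dataset2.index.names))    # ['Company', 'Weekday']
print(dataset2)
```

Because the result is a Series without a name, there is no column to select by label; the counts are the Series values themselves.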


After running this code, I get the following result:

Index                        0
('Company1', Monday)        80
('Company1', Tuesday)       80
('Company1', Wednesday)     79
...
('Company3', Tuesday)       34


I am trying to isolate all entries in dataset2 whose count is greater than 50, but I get various errors when I try the following:

dataset2=dataset2.loc[dataset2[0]>50]


Can anyone suggest how to do this?

Best answer

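dataset2 is a Series (the output of size()), not a DataFrame, so there is no column 0 to select. Filter the Series by its values directly: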

dataset2 = dataset1.groupby(['Company','Weekday']).size().sort_values(ascending=False)
dataset2 = dataset2[dataset2 > 50]
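A minimal sketch of why this works, using a hand-built Series whose values are taken from the excerpt shown in the question (the index construction is an assumption made only so the example runs on its own): comparing a Series to a scalar gives a boolean Series on the same index, and indexing with it keeps only the rows where the mask is True.

```python
import pandas as pd

# Hypothetical stand-in for dataset2: counts with a (Company, Weekday) MultiIndex,
# as groupby(...).size() would produce
index = pd.MultiIndex.from_tuples(
    [("Company1", "Monday"), ("Company1", "Tuesday"),
     ("Company1", "Wednesday"), ("Company3", "Tuesday")],
    names=["Company", "Weekday"],
)
dataset2 = pd.Series([80, 80, 79, 34], index=index)

# Element-wise comparison produces a boolean Series aligned on the same index
mask = dataset2 > 50

# Boolean indexing keeps only the entries where mask is True
dataset2 = dataset2[mask]
print(dataset2)
```

The MultiIndex is preserved, so the filtered result still tells you which (Company, Weekday) pairs exceed the threshold.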


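Another solution is to turn the Series into a DataFrame with Series.reset_index, passing the name parameter to label the values column, and then filter on the count column: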

dataset2 = (dataset1.groupby(['Company','Weekday'])
                    .size()
                    .sort_values(ascending=False)
                    .reset_index(name='count'))

dataset2 = dataset2[dataset2['count'] > 50]
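A small sketch of the reset_index variant on hypothetical toy data (the Date column is omitted because it is not used, and the threshold is lowered from 50 to 1 only because the toy data is tiny). reset_index(name='count') moves Company and Weekday back into ordinary columns and names the counts column count, so column-based filtering works.

```python
import pandas as pd

# Hypothetical toy rows in the shape of dataset1 (Date omitted, not needed here)
dataset1 = pd.DataFrame({
    "Company": ["Company1", "Company1", "Company1", "Company2"],
    "Weekday": ["Monday", "Monday", "Tuesday", "Monday"],
})

# Count per group, sort, then convert the Series to a DataFrame,
# naming the values column 'count'
counts = (dataset1.groupby(["Company", "Weekday"])
                  .size()
                  .sort_values(ascending=False)
                  .reset_index(name="count"))

print(list(counts.columns))  # ['Company', 'Weekday', 'count']

# Filter on the named column (threshold 1 chosen for the toy data)
filtered = counts[counts["count"] > 1]
print(filtered)
```

This form is convenient if you want to keep working with regular columns afterwards, for example to merge the counts back onto another DataFrame.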

Regarding "python - Counting the frequency of values by date using Pandas - Part 2", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53387122/

10-10 21:15