Let us say I have a pandas dataframe of this type (minimal example):

import pandas as pd
myDf = pd.DataFrame({'user': ['A', 'B', 'C', 'D', 'E']*2, 'date': ['2017-05-25']*5 + ['2017-05-26']*5, 'nVisits': [10, 2, 3, 0, 0, 6, 0, 4, 8, 1]})

As a table it looks like this:
date        nVisits user
5/25/2017   10      A
5/25/2017   2       B
5/25/2017   3       C
5/25/2017   0       D
5/25/2017   0       E
5/26/2017   6       A
5/26/2017   0       B
5/26/2017   4       C
5/26/2017   8       D
5/26/2017   1       E

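(1) I want to split my users into 4 buckets per day: 0 visits, 1 visit, 2-4 visits, and 5+ visits. So I would like to create a summary dataframe that looks like this: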
date        group      nVisits  nObs
5/25/2017   zero       0        2
5/25/2017   one        0        0
5/25/2017   twoToFour  5        2
5/25/2017   fivePlus   10       1
5/26/2017   zero       0        1
5/26/2017   one        1        1
5/26/2017   twoToFour  4        1
5/26/2017   fivePlus   14       2

This dataframe is essentially the number of observations and the total number of visits in each bucket, with users reassigned to buckets each day.
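(2) I would also like to classify every customer birth and death, where a birth is a customer going from 0 visits to at least 1 visit, and a death is a customer going from at least 1 visit to 0 visits.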
In this specific example the new dataframe would look like this:
date        event_type  user    nVisitsAtBirthDeath
5/26/2017   death       B       2
5/26/2017   birth       D       8
5/26/2017   birth       E       1



Best answer

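I would use the pd.cut() method (here df is the question's myDf, and numpy is imported as np). The bins are right-closed, so -1 is used as the lower edge to make 0 fall into the 'zero' bucket: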

In [29]: df['group'] = pd.cut(df.nVisits,
                              [-1, 0, 1, 4, np.inf],
                              labels=['zero','one','twoToFour','fivePlus'])

In [30]: df
Out[30]:
         date  nVisits user      group
0  2017-05-25       10    A   fivePlus
1  2017-05-25        2    B  twoToFour
2  2017-05-25        3    C  twoToFour
3  2017-05-25        0    D       zero
4  2017-05-25        0    E       zero
5  2017-05-26        6    A   fivePlus
6  2017-05-26        0    B       zero
7  2017-05-26        4    C  twoToFour
8  2017-05-26        8    D   fivePlus
9  2017-05-26        1    E        one
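
The answer above stops at the bucket labels. As a rough sketch of how the two requested tables might be built from there, the following groups by date and bucket for part (1) and compares each user's visits with the previous row for part (2). The names summary, events, and prev are made up for illustration. It assumes one row per user per day with no gaps, and it treats "at least 1 visit" as the birth/death threshold, which is what the example in the question implies.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'user': ['A', 'B', 'C', 'D', 'E'] * 2,
    'date': ['2017-05-25'] * 5 + ['2017-05-26'] * 5,
    'nVisits': [10, 2, 3, 0, 0, 6, 0, 4, 8, 1],
})

# Bucket the visit counts as in the answer above.
df['group'] = pd.cut(df['nVisits'], [-1, 0, 1, 4, np.inf],
                     labels=['zero', 'one', 'twoToFour', 'fivePlus'])

# (1) Per-day summary. observed=False keeps buckets that are empty on a
# given day (e.g. 'one' on 2017-05-25) as rows with zero counts.
summary = (df.groupby(['date', 'group'], observed=False)
             .agg(nVisits=('nVisits', 'sum'), nObs=('user', 'count'))
             .reset_index())

# (2) Births and deaths. ISO date strings sort chronologically, so after
# sorting, shift() gives each user's visit count on the previous day.
df = df.sort_values(['user', 'date'])
df['prev'] = df.groupby('user')['nVisits'].shift()

# Assumed threshold: a birth is 0 -> at least 1 visit, a death the reverse.
# A user's first day has prev = NaN, so it matches neither condition.
birth = (df['prev'] == 0) & (df['nVisits'] > 0)
death = (df['prev'] > 0) & (df['nVisits'] == 0)

events = df[birth | death].copy()
events['event_type'] = np.where(events['nVisits'] > 0, 'birth', 'death')
# Births report the visits on the day of the event; deaths report the
# visits on the last active day.
events['nVisitsAtBirthDeath'] = np.where(
    events['event_type'] == 'birth', events['nVisits'], events['prev']
).astype(int)
events = events[['date', 'event_type', 'user', 'nVisitsAtBirthDeath']]
```

For the sample data, events should contain a death for B and births for D and E, as in the table in the question. If the real data has missing days, the shift() step would compare non-adjacent dates, so the frame would need to be reindexed to a full user-by-day grid first.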
