I have a DataFrame:

         date   id
0  12-12-2015  123
1  13-12-2015  123
2  15-12-2015  123
3  16-12-2015  123
4  18-12-2015  123
5  12-12-2015  456
6  13-12-2015  456
7  15-12-2015  456


I need to count the dates for each id.
I tried df.groupby('id')['date'].count()
What I need is the following (if a date is missing for an id, its count should be 0):

      id   date   count
0  123   12-12-2015   1
1  123   13-12-2015   1
2  123   14-12-2015   0
3  123   15-12-2015   1
4  123   16-12-2015   1
5  123   17-12-2015   0
6  123   18-12-2015   1
7  456   12-12-2015   1
8  456   13-12-2015   1
9  456   14-12-2015   0
10 456   15-12-2015   1


Then I want to write it to a JSON file in this format:

{
"1234567890abcdef1234567890abcdef": {
    "2016-06": 1,
    "2016-05": 0,
    "2016-04": 0,
    "2016-03": 1,
    "2016-02": 1,
    "2016-01": 0
},
"0987654321abcdef1234567890abcdef": {
    "2016-06": 1,
    "2016-05": 1,
    "2016-04": 1,
    "2016-03": 0,
    "2016-02": 0,
    "2016-01": 0
}


}

How can I do this?

Best answer

First, use resample to fill in the missing days for each id:

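import pandas as pd

# dates are day-first (e.g. 13-12-2015), so state the format explicitly
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df = df.set_index('date')

# one row per calendar day for each id; days with no rows get size 0
df = df.groupby('id').resample('D').size().reset_index(name='val')
print(df)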

     id       date  val
0   123 2015-12-12    1
1   123 2015-12-13    1
2   123 2015-12-14    0
3   123 2015-12-15    1
4   123 2015-12-16    1
5   123 2015-12-17    0
6   123 2015-12-18    1
7   456 2015-12-12    1
8   456 2015-12-13    1
9   456 2015-12-14    0
10  456 2015-12-15    1
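
To make the gap-filling explicit, here is a minimal sketch that should produce the same table as the resample step above without using resample. It counts rows per date and then reindexes each id over its own full daily range. The sample frame is rebuilt from the question; the names `pieces`, `full_range` and `result` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question (day-first dates).
df = pd.DataFrame({
    "date": ["12-12-2015", "13-12-2015", "15-12-2015", "16-12-2015",
             "18-12-2015", "12-12-2015", "13-12-2015", "15-12-2015"],
    "id": [123, 123, 123, 123, 123, 456, 456, 456],
})
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

pieces = []
for id_value, group in df.groupby("id"):
    # Count how many rows each date has for this id.
    counts = group["date"].value_counts()
    # Every calendar day between this id's first and last date.
    full_range = pd.date_range(group["date"].min(), group["date"].max(), freq="D")
    # Days that never occurred get a count of 0.
    counts = counts.reindex(full_range, fill_value=0)
    pieces.append(pd.DataFrame({
        "id": id_value,
        "date": counts.index,
        "val": counts.to_numpy(),
    }))

result = pd.concat(pieces, ignore_index=True)
print(result)
```

Note that each id is filled only between its own first and last date, which is why id 456 stops at 2015-12-15, as in the output above.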


Then use to_json:

# format dates as YYYY-MM-DD strings so the JSON keys carry no 00:00:00 time part
df['date'] = df['date'].dt.strftime('%Y-%m-%d')
print(df.groupby('id').apply(lambda x: x.set_index('date')['val'].to_dict()).to_json())

{"123":{"2015-12-18":1,"2015-12-15":1,"2015-12-12":1,"2015-12-16":1,"2015-12-13":1,"2015-12-17":0,"2015-12-14":0},
"456":{"2015-12-15":1,"2015-12-12":1,"2015-12-13":1,"2015-12-14":0}}
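
The question asks for a file, while the snippet above only prints the JSON string. A minimal sketch of the file-writing step follows, using the standard `json` module. The small frame here is a hypothetical stand-in for the resampled result, and `build_nested_counts` and the `counts.json` path are illustrative names, not part of the original answer.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the resampled frame (columns: id, date, val).
df = pd.DataFrame({
    "id": [123, 123, 123, 456, 456],
    "date": pd.to_datetime(["2015-12-12", "2015-12-13", "2015-12-14",
                            "2015-12-12", "2015-12-13"]),
    "val": [1, 1, 0, 1, 0],
})

def build_nested_counts(frame):
    """Return {id: {date_string: count}} as plain Python types."""
    nested = {}
    for id_value, group in frame.groupby("id"):
        keys = group["date"].dt.strftime("%Y-%m-%d")
        # JSON object keys must be strings, so the id is converted too.
        nested[str(id_value)] = dict(zip(keys, group["val"].astype(int).tolist()))
    return nested

nested = build_nested_counts(df)

# Assumed output location; replace with the path you actually need.
out_path = Path(tempfile.gettempdir()) / "counts.json"
with out_path.open("w") as fh:
    json.dump(nested, fh, indent=4)
```

Building a plain dict first keeps the date order from the frame and lets `json.dump` handle indentation, which is closer to the layout shown in the question.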

10-04 22:46