I have a DataFrame like the one below:

name,date
AAA,201705
AAA,201706
AAA,201707
AAA,201708
AAA,201710
AAA,201711
AAA,201802
AAA,201803
AAA,201804
AAA,201805
AAA,201806
AAA,201807


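This DataFrame has two columns, name and date. The date column holds only the year and month, in yyyymm format.

The months 201709, 201712 and 201801 are missing from the date column.

I need to check whether every month is present. If any months are missing, the output should be in the following format: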

name,start_date,end_date,count
AAA,201709,201709,1
AAA,201712,201801,2


I am trying to use the pandas diff function.

Best answer

Use asfreq:

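import pandas as pd

# convert the yyyymm column to datetimes (first day of each month);
# astype(str) keeps this working whether the column holds ints or strings
df['date'] = pd.to_datetime(df['date'].astype(str), format='%Y%m')
# reindex each name to month-start frequency; missing months become NaN
a = df.set_index('date').groupby('name')['name'].apply(lambda x: x.asfreq('MS'))
# keep only the NaN rows; the running count of non-null rows stays constant
# inside a gap, so it labels each run of consecutive missing months
b = a.notnull().cumsum()[a.isnull()].reset_index(name='g')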

# aggregate first and last date, plus the size, of each gap
d = {'date': ['first', 'last'], 'name': ['first', 'size']}
df = b.groupby('g').agg(d).reset_index(drop=True)
# flatten the MultiIndex columns and give them the requested names
df.columns = df.columns.map('_'.join)
df = df.rename(columns={'date_first': 'start_date',
                        'date_last': 'end_date',
                        'name_first': 'name',
                        'name_size': 'count'})
print(df)
  start_date   end_date name  count
0 2017-09-01 2017-09-01  AAA      1
1 2017-12-01 2018-01-01  AAA      2
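
The question asks for yyyymm values and the column order name,start_date,end_date,count, while the result above holds full datetimes. A minimal sketch of one way to get there, assuming the result looks exactly like the printed frame (it is rebuilt by hand here under the hypothetical name `res` so the snippet runs on its own):

```python
import pandas as pd

# Result frame as printed above, rebuilt by hand for a self-contained example
res = pd.DataFrame({
    'start_date': pd.to_datetime(['2017-09-01', '2017-12-01']),
    'end_date': pd.to_datetime(['2017-09-01', '2018-01-01']),
    'name': ['AAA', 'AAA'],
    'count': [1, 2],
})

# format the datetimes back to yyyymm strings
for col in ['start_date', 'end_date']:
    res[col] = res[col].dt.strftime('%Y%m')

# reorder the columns to match the layout requested in the question
res = res[['name', 'start_date', 'end_date', 'count']]
print(res.to_csv(index=False))
```

Note that `strftime` returns strings; if integers are needed, `astype(int)` on the two date columns should work for this format.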


Details:

print (a)
name  date
AAA   2017-05-01    AAA
      2017-06-01    AAA
      2017-07-01    AAA
      2017-08-01    AAA
      2017-09-01    NaN
      2017-10-01    AAA
      2017-11-01    AAA
      2017-12-01    NaN
      2018-01-01    NaN
      2018-02-01    AAA
      2018-03-01    AAA
      2018-04-01    AAA
      2018-05-01    AAA
      2018-06-01    AAA
      2018-07-01    AAA
Name: name, dtype: object
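
Since the question mentions diff, here is a sketch of how that idea could be carried through as an alternative to asfreq. It is not part of the answer above; the names `ym`, `gap` and `to_yyyymm` are made up for illustration, and it assumes the date column holds integers in yyyymm form with no duplicate months per name.

```python
import pandas as pd

df = pd.DataFrame({
    'name': 'AAA',
    'date': [201705, 201706, 201707, 201708, 201710, 201711,
             201802, 201803, 201804, 201805, 201806, 201807],
})
df = df.sort_values(['name', 'date'])

# month ordinal: year * 12 + (month - 1), so consecutive months differ by 1
ym = df['date'] // 100 * 12 + df['date'] % 100 - 1
# number of months since the previous row of the same name (NaN for the first)
gap = ym.groupby(df['name']).diff()

# rows that come right after a gap
mask = gap > 1
count = gap[mask].astype(int) - 1   # how many months are missing
end_ord = ym[mask] - 1              # last missing month
start_ord = end_ord - count + 1     # first missing month

def to_yyyymm(o):
    # inverse of the ordinal formula above (hypothetical helper)
    return o // 12 * 100 + o % 12 + 1

out = pd.DataFrame({
    'name': df.loc[mask, 'name'],
    'start_date': to_yyyymm(start_ord),
    'end_date': to_yyyymm(end_ord),
    'count': count,
}).reset_index(drop=True)
print(out.to_csv(index=False))
```

This stays in integer arithmetic and never builds the full monthly index, which may matter for long date ranges; the asfreq version is arguably easier to read.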

Regarding python - How to find month gap counts from monthly data?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51040611/
