How to identify consecutive dates

Question

I would like to identify dates in a dataframe that are consecutive, that is, dates for which an immediate predecessor or successor exists. I would then like to mark in a new column which dates are consecutive and which are not. Additionally, I would like to do this within particular subsets of my data.

First, I create a new column that will hold True or False depending on whether the day is consecutive.

weatherFile['CONSECUTIVE_DAY'] = 'NA'

I've converted the dates to datetime objects and then to ordinals:

weatherFile['DATE_OBJ'] = [datetime.strptime(d, '%Y%m%d') for d in weatherFile['DATE']]
weatherFile['DATE_INT'] = list([d.toordinal() for d in weatherFile['DATE_OBJ']])
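As a possible alternative to the list comprehensions above, here is a minimal sketch that parses the whole column with `pd.to_datetime`. The sample frame is hypothetical, and it assumes `DATE` holds `YYYYMMDD` strings as the `'%Y%m%d'` format in the question suggests.

```python
import pandas as pd

# Hypothetical sample data; the real 'DATE' column is assumed to hold 'YYYYMMDD' strings.
weatherFile = pd.DataFrame({'DATE': ['20150101', '20150102', '20150104']})

# Parse the whole column at once instead of calling strptime per element.
weatherFile['DATE_OBJ'] = pd.to_datetime(weatherFile['DATE'], format='%Y%m%d')

# Same kind of integer the question stores in DATE_INT: the date's ordinal.
weatherFile['DATE_INT'] = weatherFile['DATE_OBJ'].apply(lambda d: d.toordinal())
```

Two rows are one calendar day apart exactly when their `DATE_INT` values differ by 1.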

Now I would like to identify consecutive dates within the following groups:

weatherFile.groupby(['COUNTY_GEOID_YEAR', 'TEMPBIN'])

I am thinking of looping through the groups and applying an operation that identifies which days are consecutive and which are not, within each unique county and TEMPBIN subset.

I'm rather new to programming and Python. Is this a good approach so far, and if so, how can I proceed?

Thank you. Let me know if I should provide additional information.

Update:

Following @karakfa's advice, I tried the following:

weatherFile.groupby(['COUNTY_GEOID_YEAR', 'TEMPBIN'])
weatherFile['DISTANCE'] = weatherFile[1:, 'DATE_INT'] - weatherFile[:-1,'DATE_INT']
weatherFile['CONSECUTIVE?'] = np.logical_or(np.insert((weatherFile['DISTANCE']),0,0) == 1, np.append((weatherFile['DISTANCE']),0) == 1)

This results in a TypeError: unhashable type. The traceback points to the second line. weatherFile['DATE_INT'] has dtype int64.

Recommended answer

You can use .shift(-1) or .shift(1) to compare consecutive entries (this assumes the rows are sorted by date):

df.loc[df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1, 'CONSECUTIVE_DAY'] = True

This sets CONSECUTIVE_DAY to True if the next entry is the following day, because .shift(-1) lines each row up with the row after it.
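To make the direction of each shift concrete, here is a small sketch on hypothetical ordinals (three consecutive days followed by one isolated day). The values and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical ordinals, sorted ascending: three consecutive days, then an isolated day.
df = pd.DataFrame({'DATE_INT': [100, 101, 102, 110]})

# shift(-1) aligns each row with the value of the NEXT row.
next_is_following_day = df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1
# shift(1) aligns each row with the value of the PREVIOUS row.
prev_is_preceding_day = df['DATE_INT'] - df['DATE_INT'].shift(1) == 1

print(next_is_following_day.tolist())  # [True, True, False, False]
print(prev_is_preceding_day.tolist())  # [False, True, True, False]
```

The first and last rows have no neighbour on one side, so the shifted value there is NaN and the comparison is False.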

df.loc[(df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1) | (df['DATE_INT'].shift(1) - df['DATE_INT'] == -1), 'CONSECUTIVE_DAY'] = True

This sets CONSECUTIVE_DAY to True if the entry is preceded or followed by a consecutive date.
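The question also asks for this within each county and TEMPBIN group. One way that might work is to sort the rows and take per-group differences, as in the sketch below. The sample data is hypothetical; only the column names come from the question. Two points about the attempt in the update: calling `weatherFile.groupby(...)` on its own line returns a new object and leaves `weatherFile` unchanged, and `weatherFile[1:, 'DATE_INT']` is NumPy-style two-axis indexing, which plain `DataFrame[...]` does not support (positional slicing goes through `.iloc`).

```python
import pandas as pd

# Hypothetical sample using the column names from the question.
weatherFile = pd.DataFrame({
    'COUNTY_GEOID_YEAR': ['A', 'A', 'A', 'B', 'B', 'B'],
    'TEMPBIN':           [1,   1,   1,   1,   1,   1],
    'DATE_INT':          [105, 100, 101, 106, 120, 107],
})

group_cols = ['COUNTY_GEOID_YEAR', 'TEMPBIN']

# Sort so that neighbouring rows within a group are neighbouring dates.
weatherFile = weatherFile.sort_values(group_cols + ['DATE_INT'])

grouped = weatherFile.groupby(group_cols)['DATE_INT']
# diff() is computed per group, so gaps never leak across group boundaries.
gap_to_prev = grouped.diff()    # this row minus the previous row in the group
gap_to_next = grouped.diff(-1)  # this row minus the next row in the group

# A day is consecutive if it is one day after its predecessor
# or one day before its successor within the same group.
weatherFile['CONSECUTIVE_DAY'] = (gap_to_prev == 1) | (gap_to_next == -1)
```

In this sample, day 105 in group A is not marked even though group B starts at 106, because the differences are taken within each group rather than across the whole frame.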


06-28 19:09