Truncating a "TimeStamp" column to hour precision in a pandas DataFrame

Question

I have a pandas.DataFrame called df which has an automatically generated index, with a column dt:

df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))
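
For anyone who wants to reproduce the setup, here is a minimal sketch of such a frame. The three timestamps are copied from the output shown in the answer; the construction itself is an assumption for illustration, not the asker's actual code.

```python
import pandas as pd

# Hypothetical sample data: the timestamps match those printed in the answer.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# The column holds a datetime64 dtype; the exact unit can vary by pandas version.
print(df["dt"].dtype)
print(df["dt"][0])
```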

What I'd like to do is create a new column truncated to hour precision. I'm currently using:

from datetime import datetime
df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))

This works, so that's fine. However, I've an inkling there's some nice way using pandas.tseries.offsets or creating a DatetimeIndex or similar.

So if possible, is there some pandas wizardry to do this?

Recommended answer

In pandas 0.18.0 and later, datetime-like Series provide floor, ceil and round methods through the .dt accessor; they round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00
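
As a small sketch of how the three methods differ, the snippet below applies each of them to the same timestamps. The variable names are chosen here for illustration only.

```python
import pandas as pd

s = pd.Series(pd.to_datetime([
    "2014-10-01 10:02:45",
    "2014-10-01 13:08:17",
    "2014-10-01 17:39:24",
]))

floored = s.dt.floor("h")   # always down to the start of the hour
ceiled = s.dt.ceil("h")     # always up to the start of the next hour
rounded = s.dt.round("h")   # to whichever hour boundary is nearest

print(pd.DataFrame({"dt": s, "floor": floored, "ceil": ceiled, "round": rounded}))
```

Only the last timestamp (17:39:24) is past the half hour, so it is the only row where round and floor disagree.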


Here's another alternative to truncate the timestamps. Unlike floor, it supports truncating to a precision such as year or month.

You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:

df['dt'].values.astype('<M8[h]')

This truncates everything to hour precision. For example:

>>> df
                       dt
0     2014-10-01 10:02:45
1     2014-10-01 13:08:17
2     2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]
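
The same steps as a self-contained script, as a sketch under the assumption of the sample data used above. The intermediate NumPy array carries the hour unit, but pandas converts it to a unit it supports when the column is stored, so the reported dtype of dt2 may be datetime64[ns] or a coarser unit such as datetime64[s], depending on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# Cast the underlying NumPy array to hour resolution; minutes and seconds are dropped.
truncated = df["dt"].values.astype("datetime64[h]")
print(truncated.dtype)  # the NumPy array itself has hour resolution

# On assignment, pandas converts the array to a resolution it supports.
df["dt2"] = truncated
print(df)
print(df.dtypes)
```

For these timestamps the result matches what `df['dt'].dt.floor('h')` produces.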

The same method should work for any other unit: months 'M', minutes 'm', and so on:

  • Keep up to year: '<M8[Y]'
  • Keep up to month: '<M8[M]'
  • Keep up to day: '<M8[D]'
  • Keep up to minute: '<M8[m]'
  • Keep up to second: '<M8[s]'
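
A brief sketch of the month and year cases, which is where this approach differs from floor (floor rejects non-fixed frequencies such as month or year). The sample dates and column names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical dates spread over several months and years.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-11-15 13:08:17",
        "2015-02-28 17:39:24",
    ])
})

values = df["dt"].values

# Month unit: each value becomes the first instant of its month.
df["month"] = values.astype("datetime64[M]")

# Year unit: each value becomes the first instant of its year.
df["year"] = values.astype("datetime64[Y]")

print(df)
```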
