Question
I have a pandas.DataFrame called df which has an automatically generated index and a column dt:
df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))
What I'd like to do is create a new column truncated to hour precision. I'm currently using:
df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))
This works, so that's fine. However, I have an inkling there's some nicer way using pandas.tseries.offsets, creating a DatetimeIndex, or similar.
So, if possible, is there some pandas wizardry to do this?
Recommended answer
In pandas 0.18.0 and later, datetime columns have floor, ceil and round methods (reached through the .dt accessor) that round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:
>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
dt dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00
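To make the difference between the three methods concrete, here is a minimal, self-contained sketch that rebuilds the sample column and applies each one. The column names floor_h, ceil_h and round_h are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Rebuild the sample frame from the question (same three timestamps).
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# Illustrative column names; each method takes a fixed frequency such as 'h'.
df["floor_h"] = df["dt"].dt.floor("h")   # always rounds down to the hour
df["ceil_h"] = df["dt"].dt.ceil("h")     # always rounds up to the hour
df["round_h"] = df["dt"].dt.round("h")   # rounds to the nearest hour

print(df)
```

Only floor truncates; round sends 17:39:24 to 18:00:00, so it is not a drop-in replacement when truncation is what you want.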
Here's another way to truncate the timestamps. Unlike floor, which only accepts fixed frequencies, it supports truncating to a precision such as year or month.
You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:
df['dt'].values.astype('<M8[h]')
This truncates everything to hour precision. For example:
>>> df
dt
0 2014-10-01 10:02:45
1 2014-10-01 13:08:17
2 2014-10-01 17:39:24
>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
dt dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00
>>> df.dtypes
dt datetime64[ns]
dt2 datetime64[ns]
The same method should work for any other unit: months 'M', minutes 'm', and so on:
- Keep up to year: '<M8[Y]'
- Keep up to month: '<M8[M]'
- Keep up to day: '<M8[D]'
- Keep up to minute: '<M8[m]'
- Keep up to second: '<M8[s]'
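As a rough sketch of how the units above could be applied in practice, the helper below (truncate is a hypothetical name, not a pandas function) casts to the coarser NumPy unit and then back to nanoseconds, so the resulting column has the same dtype regardless of pandas version. It assumes a timezone-naive datetime64 column; the sample dates are invented so that month and year truncation are visible.

```python
import pandas as pd

def truncate(series, unit):
    """Hypothetical helper: truncate a tz-naive datetime64 Series to a NumPy unit."""
    # Casting to a coarser unit drops everything below that unit.
    coarse = series.to_numpy().astype(f"datetime64[{unit}]")
    # Cast back to nanoseconds so the result is an ordinary datetime64[ns] column.
    return pd.Series(coarse.astype("datetime64[ns]"), index=series.index)

# Made-up sample data spanning several months and years.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-11-15 13:08:17",
        "2015-03-31 17:39:24",
    ])
})

df["month"] = truncate(df["dt"], "M")  # first day of each month
df["year"] = truncate(df["dt"], "Y")   # first day of each year

print(df)
```

Note that the unit codes are case-sensitive: 'M' is months, while 'm' is minutes.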