Making pandas work with Pendulum

Question

I've recently stumbled upon an awesome new library, pendulum, for easier work with datetimes.

In pandas, there is the handy to_datetime() function, which converts Series and other objects to datetimes:

raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')

What would be the canonical way to create a custom to_<something> method? In this case, a to_pendulum() method that could convert a Series of date strings directly to Pendulum objects.

This could give Series various interesting capabilities, for instance converting a series of date strings to a series of "offsets from now" (human-readable datetime diffs).

Recommended answer

After looking through the API a bit, I must say I'm impressed with what they've done. Unfortunately, I don't think Pendulum and pandas can work together (at least, with the current latest version of pandas, v0.21).

The most important reason is that pandas does not natively support Pendulum as a datatype. The natively supported datatypes (such as np.int64, np.float64 and np.datetime64) all support vectorisation in some form. With Pendulum objects, you are not going to get a shred of performance improvement from using a dataframe over, say, a vanilla loop and list. If anything, calling apply on a Series of Pendulum objects is going to be slower (because of all the API overheads).
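To make the vectorisation point concrete, here is a minimal sketch contrasting a native datetime64 column with an object-dtype column holding plain Python datetime objects. The sample strings and variable names are made up for illustration, and standard-library datetime objects stand in for Pendulum objects so the snippet has no extra dependency.

```python
import pandas as pd

raw = pd.Series(["2017-11-09 18:43:45", "2017-11-10 00:09:40"])

# Native dtype: the values live in a single datetime64 array,
# so the .dt accessor operates on the whole column at once.
native = pd.to_datetime(raw)
hours_vectorised = native.dt.hour

# Object dtype: every element is a separate Python object,
# so each attribute access has to go through apply, one element at a time.
as_objects = pd.Series([ts.to_pydatetime() for ts in native], dtype=object)
hours_looped = as_objects.apply(lambda d: d.hour)
```

Both routes give the same hours, but only the first one avoids a Python-level call per element.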

Another reason is that Pendulum is a subclass of datetime:

import pendulum
from datetime import datetime

isinstance(pendulum.now(), datetime)
True

This is important because, as mentioned above, datetime is a supported datatype, so pandas will attempt to coerce datetime objects to its native datetime format, Timestamp. Here's an example.

print(s)

0     2017-11-09 18:43:45
1     2017-11-09 20:15:27
2     2017-11-09 22:29:00
3     2017-11-09 23:42:34
4     2017-11-10 00:09:40
5     2017-11-10 00:23:14
6     2017-11-10 03:32:17
7     2017-11-10 10:59:24
8     2017-11-10 11:12:59
9     2017-11-10 13:49:09

s = s.apply(pendulum.parse)
s

0    2017-11-09 18:43:45+00:00
1    2017-11-09 20:15:27+00:00
2    2017-11-09 22:29:00+00:00
3    2017-11-09 23:42:34+00:00
4    2017-11-10 00:09:40+00:00
5    2017-11-10 00:23:14+00:00
6    2017-11-10 03:32:17+00:00
7    2017-11-10 10:59:24+00:00
8    2017-11-10 11:12:59+00:00
9    2017-11-10 13:49:09+00:00
Name: timestamp, dtype: datetime64[ns, <TimezoneInfo [UTC, GMT, +00:00:00, STD]>]

s[0]
Timestamp('2017-11-09 18:43:45+0000', tz='<TimezoneInfo [UTC, GMT, +00:00:00, STD]>')

type(s[0])
pandas._libs.tslib.Timestamp
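The coercion is not specific to Pendulum. As a rough sketch, the same thing should happen with any datetime subclass. MyDateTime below is a hypothetical stand-in class, not part of pandas or pendulum, and the exact datetime64 resolution of the coerced Series may differ between pandas versions.

```python
from datetime import datetime

import pandas as pd


class MyDateTime(datetime):
    """Hypothetical stand-in for any datetime subclass, such as a Pendulum object."""


values = [MyDateTime(2017, 11, 9, 18, 43, 45), MyDateTime(2017, 11, 9, 20, 15, 27)]

# Without an explicit dtype, pandas infers a datetime64 column
# and hands back Timestamp objects instead of the subclass.
coerced = pd.Series(values)

# With dtype=object, pandas stores the original objects untouched.
kept = pd.Series(values, dtype=object)
```

Here type(coerced[0]) is a pandas Timestamp, while type(kept[0]) is still MyDateTime.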

So, with some difficulty (involving dtype=object), you could load Pendulum objects into dataframes. Here's how you'd do that:

v = np.vectorize(pendulum.parse)
s = pd.Series(v(s), dtype=object)

s

0     2017-11-09T18:43:45+00:00
1     2017-11-09T20:15:27+00:00
2     2017-11-09T22:29:00+00:00
3     2017-11-09T23:42:34+00:00
4     2017-11-10T00:09:40+00:00
5     2017-11-10T00:23:14+00:00
6     2017-11-10T03:32:17+00:00
7     2017-11-10T10:59:24+00:00
8     2017-11-10T11:12:59+00:00
9     2017-11-10T13:49:09+00:00

s[0]
<Pendulum [2017-11-09T18:43:45+00:00]>

However, this is essentially useless, because calling any pendulum method (via apply) will now not only be super slow, but any result that is itself a datetime will also end up coerced to Timestamp again, an exercise in futility.
