Problem description

How is the pandas parse_dates argument supposed to work when retrieving data from a MySQL database?

The pandas 0.23 documentation describes the parse_dates parameter as follows:

List of column names to parse as dates.

Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.

Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.
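The two dictionary forms quoted above can be tried without a MySQL server. The following is a minimal sketch that uses an in-memory SQLite database from Python's sqlite3 module as a stand-in (an assumption for illustration; the events table and its columns are hypothetical). It applies the {column_name: unit} form to an integer epoch column and the {column_name: arg dict} form to a text column; both columns hold the same two instants.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stand-in: SQLite has no native datetime type, so the same
# two instants are stored once as text and once as Unix epoch seconds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, created TEXT, epoch INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "2018-05-17 11:08:03", 1526555283),
        (2, "2006-02-15 04:34:33", 1139978073),
    ],
)

df = pd.read_sql_query(
    "SELECT id, created, epoch FROM events",
    conn,
    parse_dates={
        # {column_name: arg dict}: forwarded as keyword arguments to pd.to_datetime()
        "created": {"format": "%Y-%m-%d %H:%M:%S"},
        # {column_name: unit}: integers interpreted as seconds since the epoch
        "epoch": "s",
    },
)
print(df.dtypes)
```

Both columns should come back as datetime64 columns with equal values row by row.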

For example, I would like to retrieve some data from the actor table of the MySQL Sakila sample database:

create table actor
(
    actor_id smallint(5) unsigned auto_increment
        primary key,
    first_name varchar(45) not null,
    last_name varchar(45) not null,
    last_update timestamp not null on update CURRENT_TIMESTAMP,
    constraint idx_unique_id_name
        unique (actor_id, last_name)
)

Here is some sample data:

INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (1, 'PENELOPE', 'None', '2018-05-17 11:08:03');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (2, 'NICK', 'WAHLBERG', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (3, 'ED', 'CHASE', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (4, 'JENNIFER', 'DAVIS', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (5, 'JOHNNY', 'LOLLOBRIGIDA', '2018-05-17 11:14:15');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (6, 'BETTE', 'Echt', '2018-05-17 11:13:57');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (7, 'GRACE', 'MOSTEL', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (8, 'MATTHEW', 'JOHANSSON', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (9, 'JOE', 'SWANK', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (10, 'CHRISTIAN', 'GABLE', '2006-02-15 04:34:33');

I use the default MySQL Python Connector:

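from sqlalchemy import create_engine

# SQLAlchemy URL for the mysql-connector-python driver ("mysql+mysqlconnector")
db_connection_url = (
    'mysql+mysqlconnector://'
    + mysql_config_dict['user'] + ':' + mysql_config_dict['password']
    + '@' + mysql_config_dict['host'] + ':' + mysql_config_dict['port']
    + '/' + mysql_config_dict['db_name']
)

# os.getenv() returns None for unset variables, so check the value itself;
# connect_args must be a dict, not an empty string
if mysql_config_dict.get('ssl_ca'):
    ssl_args = {'ssl_ca': mysql_config_dict['ssl_ca']}
else:
    ssl_args = {}

# Engine passed as con= to pd.read_sql_query
mysql_conn = create_engine(db_connection_url, connect_args=ssl_args)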

The connection parameters are:

import os

mysql_config_dict = {
    'user': 'root',
    'password': '',
    'host': '127.0.0.1',
    'port': '3306',
    'db_name': 'sakila',
    'ssl_cert': os.getenv('SSL_CERT'),
    'ssl_key': os.getenv('SSL_KEY'),
    'ssl_ca': os.getenv('SSL_CA')
}

These settings are used to create the database engine.
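Concatenating the raw password into the URL breaks as soon as it contains characters such as @, : or /. A minimal sketch of a safer variant, following the SQLAlchemy documentation's advice to escape credentials with urllib.parse.quote_plus(); the helper names build_mysql_url and build_connect_args are hypothetical, and the sample config values are made up.

```python
from urllib.parse import quote_plus


def build_mysql_url(cfg: dict) -> str:
    """Build a SQLAlchemy URL for the mysql+mysqlconnector dialect.

    quote_plus() percent-encodes characters such as '@' or ':' in the
    user name and password so they cannot corrupt the URL.
    """
    user = quote_plus(cfg["user"])
    password = quote_plus(cfg["password"])
    return (
        f"mysql+mysqlconnector://{user}:{password}"
        f"@{cfg['host']}:{cfg['port']}/{cfg['db_name']}"
    )


def build_connect_args(cfg: dict) -> dict:
    """Return driver keyword arguments; an empty dict when no CA file is set."""
    # os.getenv() yields None for unset variables, so test the value,
    # not merely the presence of the key.
    return {"ssl_ca": cfg["ssl_ca"]} if cfg.get("ssl_ca") else {}


cfg = {"user": "root", "password": "p@ss:word", "host": "127.0.0.1",
       "port": "3306", "db_name": "sakila", "ssl_ca": None}
print(build_mysql_url(cfg))
print(build_connect_args(cfg))
```

The two results would then be passed on as sqlalchemy.create_engine(url, connect_args=...).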

The Python snippet to retrieve the result set:

df = pd.read_sql_query(
    'SELECT a.actor_id, a.last_name, a.last_update FROM sakila.actor a',
    con=mysql_conn,
    # format matches stored values such as '2018-05-17 11:08:03'
    parse_dates={'last_update': '%Y-%m-%d %H:%M:%S'},
)

I get a KeyError:

Traceback (most recent call last):
  File "~/Development/python-virtual-env/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2442, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)
KeyError: 'last_update'
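One plausible reading of this KeyError, consistent with the b'last_update' column name the question reports further on, is that the driver returns column names as bytes rather than str, so pandas cannot find a column called 'last_update'. The following sketch is an illustration under that assumption, not a diagnosis of the connector: it fakes bytes column names in a plain DataFrame (no MySQL involved) and shows a workaround of decoding the names and calling pd.to_datetime() explicitly.

```python
import pandas as pd

# Simulated result set whose column names are bytes, as the question observes.
raw = pd.DataFrame({
    b"actor_id": [1, 2],
    b"last_update": ["2018-05-17 11:08:03", "2006-02-15 04:34:33"],
})

# A str lookup fails because b"last_update" != "last_update" in Python 3.
key_error_raised = False
try:
    raw["last_update"]
except KeyError:
    key_error_raised = True
print("KeyError raised:", key_error_raised)

# Workaround: decode bytes column names, then convert the column explicitly.
df = raw.rename(columns=lambda c: c.decode() if isinstance(c, bytes) else c)
df["last_update"] = pd.to_datetime(df["last_update"], format="%Y-%m-%d %H:%M:%S")
print(df.dtypes)
```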

When I use

df = pd.read_sql_query('SELECT a.actor_id, a.last_name, a.last_update FROM sakila.actor a',parse_dates=True,con=mysql_conn)

it works, but in IntelliJ's DataFrame view I can see that the column 'last_update' is named with a bytes-literal prefix, b'last_update', which is strange.
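As far as I can tell from the pandas source, the SQL readers treat parse_dates=True the same as an empty column list, so this call probably "works" only because no column is converted at all. A minimal check, assuming an in-memory SQLite table (a hypothetical stand-in for the MySQL actor table):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (actor_id INTEGER, last_update TEXT)")
conn.execute("INSERT INTO actor VALUES (1, '2018-05-17 11:08:03')")

# parse_dates=True names no columns, so last_update keeps its raw text values.
df = pd.read_sql_query(
    "SELECT actor_id, last_update FROM actor", conn, parse_dates=True
)
print(df["last_update"].dtype)
```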

What is the correct usage when I want to treat multiple columns as date columns? Thanks!

Recommended answer

To parse several columns as dates, I pass their names as a list to parse_dates when calling pd.read_sql:

df = pd.read_sql(query,
                 connection,
                 parse_dates=['Date_of_creation',
                              'Date_of_termination']
                 )
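A minimal runnable sketch of the list form, assuming an in-memory SQLite table as a stand-in for the real database; the contracts table is hypothetical and only reuses the answer's column names. With a plain list, pandas infers the date format of each listed column.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts (id INTEGER, Date_of_creation TEXT, Date_of_termination TEXT)"
)
conn.execute(
    "INSERT INTO contracts VALUES (1, '2006-02-15 04:34:33', '2018-05-17 11:08:03')"
)

query = "SELECT id, Date_of_creation, Date_of_termination FROM contracts"
# Every column named in the list is converted with pd.to_datetime().
df = pd.read_sql(query, conn, parse_dates=["Date_of_creation", "Date_of_termination"])
print(df.dtypes)
```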

If you need a custom format, pass a dictionary that maps each column name to its format string, as you mentioned:

fmt = '%Y-%m-%d %H:%M:%S'  # must match the stored text, e.g. '2018-05-17 11:08:03'

df = pd.read_sql(query,
                 connection,
                 parse_dates={'Date_of_creation': fmt,
                              'Date_of_termination': fmt}
                 )
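A minimal sketch of the dictionary form applied to two rows of the question's actor sample, assuming an in-memory SQLite copy as a stand-in for MySQL. The format string has to match the stored text exactly (note the dashes), and a dict comprehension over a list of column names (date_cols is a hypothetical name) builds the mapping when several columns share one format.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (actor_id INTEGER, last_name TEXT, last_update TEXT)")
conn.executemany(
    "INSERT INTO actor VALUES (?, ?, ?)",
    [(1, "None", "2018-05-17 11:08:03"), (2, "WAHLBERG", "2006-02-15 04:34:33")],
)

# The format must match the stored text, e.g. '2018-05-17 11:08:03'.
fmt = "%Y-%m-%d %H:%M:%S"
date_cols = ["last_update"]  # add further date columns here
df = pd.read_sql_query(
    "SELECT actor_id, last_name, last_update FROM actor",
    conn,
    parse_dates={col: fmt for col in date_cols},
)
print(df.dtypes)
```

Adding more names to date_cols parses each of those columns with the same format.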

08-22 23:44