This article describes how to speed up pandas.DataFrame.to_sql by enabling pyODBC's fast_executemany; it should be a useful reference for anyone running into the same problem.

Problem Description

I would like to send a large pandas.DataFrame to a remote server running MS SQL. The way I do it now is by converting a data_frame object to a list of tuples and then send it away with pyODBC's executemany() function. It goes something like this:

 import pyodbc as pdb

 list_of_tuples = convert_df(data_frame)

 connection = pdb.connect(cnxn_str)

 cursor = connection.cursor()
 cursor.fast_executemany = True
 cursor.executemany(sql_statement, list_of_tuples)
 connection.commit()

 cursor.close()
 connection.close()
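
The convert_df helper is not shown in the question; as a rough sketch, such a conversion might simply turn each DataFrame row into a plain tuple (the function name and signature here are assumptions):

 import pandas as pd

 def convert_df(df):
     # Hypothetical helper: turn each DataFrame row into a plain tuple
     # so pyODBC can bind it as one parameter set for executemany().
     return list(df.itertuples(index=False, name=None))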

I then started to wonder if things could be sped up (or at least made more readable) by using the data_frame.to_sql() method. I came up with the following solution:

 import sqlalchemy as sa

 engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % cnxn_str)
 data_frame.to_sql(table_name, engine, index=False)

Now the code is more readable, but the upload is at least 150 times slower...

Is there a way to turn on fast_executemany when using SQLAlchemy?

I am using pandas-0.20.3, pyODBC-4.0.21 and sqlalchemy-1.1.13.

Recommended Answer

After contacting the developers of SQLAlchemy, a way to solve this problem has emerged. Many thanks to them for the great work!

One has to use a cursor execution event and check if the executemany flag has been raised. If that is indeed the case, switch the fast_executemany option on. For example:

from sqlalchemy import event

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
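
Put together with the engine from the question, a usage sketch (assuming cnxn_str, data_frame and table_name are defined as above) looks like this:

 import sqlalchemy as sa
 from sqlalchemy import event

 engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % cnxn_str)

 @event.listens_for(engine, 'before_cursor_execute')
 def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
     # Only touch the flag for executemany() calls, i.e. multi-row inserts.
     if executemany:
         cursor.fast_executemany = True

 data_frame.to_sql(table_name, engine, index=False)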

More information on execution events can be found in the SQLAlchemy documentation.

UPDATE: Support for pyodbc's fast_executemany was added in SQLAlchemy 1.3.0, so this hack is no longer necessary.
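
For completeness, in SQLAlchemy 1.3 and later the option can be passed straight to create_engine; a minimal sketch reusing the connection string from the question:

 engine = sa.create_engine(
     "mssql+pyodbc:///?odbc_connect=%s" % cnxn_str,
     fast_executemany=True,  # handled by the mssql+pyodbc dialect since SQLAlchemy 1.3
 )
 data_frame.to_sql(table_name, engine, index=False)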

This concludes the article on using pyODBC's fast_executemany to speed up pandas.DataFrame.to_sql; hopefully the recommended answer above is helpful.
