Question
I would like to send a large pandas.DataFrame to a remote server running MS SQL. The way I do it now is by converting a data_frame object to a list of tuples and then sending it off with pyODBC's executemany() function. It goes something like this:
import pyodbc as pdb

# Convert the DataFrame into a list of row tuples
list_of_tuples = convert_df(data_frame)

connection = pdb.connect(cnxn_str)
cursor = connection.cursor()
# Enable pyODBC's fast path for batched parameterized inserts
cursor.fast_executemany = True

# Send all rows in one executemany() call, then commit and clean up
cursor.executemany(sql_statement, list_of_tuples)
connection.commit()

cursor.close()
connection.close()
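(convert_df is the asker's own helper and its implementation is not shown; as an assumption, a minimal sketch of what it might do is:)

def convert_df(df):
    # Hypothetical helper, not part of the original question:
    # return one plain tuple per row, excluding the index
    return list(df.itertuples(index=False, name=None))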
I then started to wonder whether things could be sped up (or at least made more readable) by using the data_frame.to_sql() method. I came up with the following solution:
import sqlalchemy as sa

# Wrap the same pyODBC connection string in a SQLAlchemy engine
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % cnxn_str)
data_frame.to_sql(table_name, engine, index=False)
Now the code is more readable, but the upload is at least 150 times slower...
Is there a way to flip the fast_executemany flag when using SQLAlchemy?
I am using pandas-0.20.3, pyODBC-4.0.21, and sqlalchemy-1.1.13.
Answer
After contacting the developers of SQLAlchemy, a way to solve this problem emerged. Many thanks to them for their great work!
One has to use a cursor execution event and check whether the executemany flag has been raised. If that is indeed the case, switch the fast_executemany option on. For example:
from sqlalchemy import event

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    # Only batched (executemany) statements benefit from the flag
    if executemany:
        cursor.fast_executemany = True
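Putting it together, a minimal end-to-end sketch (assuming the same cnxn_str, table_name, and data_frame as in the question):

import sqlalchemy as sa
from sqlalchemy import event

engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % cnxn_str)

# Register the listener on the engine before calling to_sql
@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True

# The batched INSERTs issued by to_sql now run with fast_executemany on
data_frame.to_sql(table_name, engine, index=False)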
More information on execution events can be found in the SQLAlchemy documentation.
UPDATE: Support for pyODBC's fast_executemany was added in SQLAlchemy 1.3.0, so this hack is no longer necessary.
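For reference, on SQLAlchemy 1.3.0 and later the flag can be passed straight to create_engine; a minimal sketch, reusing the cnxn_str, table_name, and data_frame from the question:

import sqlalchemy as sa

# SQLAlchemy 1.3+ exposes fast_executemany directly on the mssql+pyodbc dialect
engine = sa.create_engine(
    "mssql+pyodbc:///?odbc_connect=%s" % cnxn_str,
    fast_executemany=True,
)
data_frame.to_sql(table_name, engine, index=False)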