This article looks at how to load 5 million rows from MySQL into pandas; the question and recommended answer below should be a useful reference for anyone facing the same problem.

Problem description

I have 5 million rows in a MySQL DB sitting over the (local) network (so quick connection, not on the internet).

The connection to the DB works fine, but if I try to do:

f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col='ID')

This takes a really long time. Even chunking with chunksize will be slow. Besides, I don't really know whether it's just hung there or indeed retrieving information.
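If you do stay on the connector route, one way to at least see whether rows are actually arriving is to iterate over the chunks that read_sql_query yields when chunksize is set. A minimal sketch, assuming a SQLAlchemy engine with hypothetical connection details:

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string; adjust driver, credentials and host.
engine = create_engine('mysql+pymysql://user:password@dbhost/dbname')

chunks = []
# With chunksize set, read_sql_query returns an iterator of DataFrames,
# so each print below confirms the query is retrieving rows, not hung.
for i, chunk in enumerate(pd.read_sql_query(
        'SELECT * FROM mytable', engine,
        index_col='ID', chunksize=100_000)):
    print(f'chunk {i}: {len(chunk)} rows')
    chunks.append(chunk)

f = pd.concat(chunks)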

I would like to ask, for those people working with large data on a DB, how they retrieve their data for their Pandas session?

Would it be "smarter", for example, to run the query, return a csv file with the results and load that into Pandas? Sounds much more involved than it needs to be.

Recommended answer

The best way of loading all data from a table out of any SQL database into pandas is:

  1. Dumping the data out of the database, using COPY for PostgreSQL, SELECT INTO OUTFILE for MySQL, or similar for other dialects (see the sketch after this list).
  2. Reading the resulting CSV file with pandas using the pandas.read_csv function.
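A minimal sketch of that two-step approach for MySQL. The connection string, output path and column names are assumptions; note that SELECT ... INTO OUTFILE writes to the server's filesystem (it needs the FILE privilege and a path allowed by secure_file_priv), so the file must end up somewhere you can read it back from:

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://user:password@dbhost/dbname')

# Step 1: dump server-side. INTO OUTFILE refuses to overwrite an
# existing file and writes relative to the server, not the client.
with engine.connect() as conn:
    conn.execute(text(
        "SELECT * FROM mytable "
        "INTO OUTFILE '/var/lib/mysql-files/mytable.csv' "
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\\n'"))

# Step 2: read the dump with pandas. INTO OUTFILE writes no header row,
# so the column names passed here are hypothetical placeholders.
f = pd.read_csv('/var/lib/mysql-files/mytable.csv', header=None,
                names=['ID', 'col1', 'col2'], index_col='ID')

For a table this size the dump-then-read path is usually limited by disk and network throughput rather than per-row driver overhead, which is why it tends to beat the connector.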

Use the connector only for reading a few rows. The power of an SQL database is its ability to deliver small chunks of data based on indices.
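For example, an indexed range lookup is cheap through the connector. A sketch, with hypothetical bounds on the ID column from the question:

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://user:password@dbhost/dbname')

# A range predicate on the indexed ID column lets MySQL serve a small
# slice quickly; this is the kind of query the connector is made for.
query = text('SELECT * FROM mytable WHERE ID BETWEEN :lo AND :hi')
f = pd.read_sql_query(query, engine, index_col='ID',
                      params={'lo': 1, 'hi': 10_000})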

Delivering entire tables is something you do with dumps.
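If you cannot touch the server's filesystem at all, a client-side dump through the mysql command-line client is an alternative. A sketch, with hypothetical host and credentials; --batch output is tab-separated and includes a header row:

import subprocess
import pandas as pd

# --batch (-B) prints tab-separated rows with a header line to stdout,
# which we redirect into a local file. Passing the password on the
# command line is insecure; prefer an option file in real use.
with open('mytable.tsv', 'wb') as out:
    subprocess.run(
        ['mysql', '--host=dbhost', '--user=user', '--password=secret',
         '--batch', '-e', 'SELECT * FROM mytable', 'dbname'],
        stdout=out, check=True)

# Batch mode prints SQL NULL as the literal string "NULL".
f = pd.read_csv('mytable.tsv', sep='\t', index_col='ID',
                na_values=['NULL'])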

That concludes this article on loading 5 million rows from MySQL into pandas; hopefully the recommended answer is helpful.
