python-3.x - PandaSQL is very slow

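I am currently switching from R to Python (Anaconda/Spyder, Python 3) for data analysis. In R I used sqldf a lot. Since I am comfortable writing SQL queries, I did not want to relearn the data.table syntax. With sqldf in R, I never ran into performance problems.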

Now, in Python, I tried pandasql, and a simple df = "SELECT * From table LIMIT 1" takes forever on 193k rows and 19 columns.

I also tried pysqldf, but I get an error saying the table does not exist, even though it does.

# -*- coding: utf-8 -*-

import pandas as pd
import pandasql
import pysqldf

# Data loading
orders = pd.read_csv('data/orders.csv', sep=';')

###### PANDASQL ######
test = pandasql.sqldf("SELECT  orders_id from orders LIMIT 1;", globals())
# Takes several minutes and uses a lot of RAM

test = pandasql.sqldf("SELECT  orders_id from orders LIMIT 1;", locals())
# Takes several minutes and uses a lot of RAM


###### PYSQLDF ######
sqldf = pysqldf.SQLDF(globals())
test = sqldf.execute("SELECT  * from orders LIMIT 1;")
# Raises OperationalError

# Error from pysqldf

Traceback (most recent call last):

  File "<ipython-input-12-30b645117dc4>", line 1, in <module>
    test = sqldf.execute("SELECT  * from orders LIMIT 1;")

  File "C:\Users\p.stepniewski\AppData\Local\Continuum\anaconda3\lib\site-packages\pysqldf\sqldf.py", line 76, in execute
    self._del_table(tables)

  File "C:\Users\p.stepniewski\AppData\Local\Continuum\anaconda3\lib\site-packages\pysqldf\sqldf.py", line 117, in _del_table
    self.conn.execute("drop table " + tablename)

OperationalError: no such table: orders
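
The last frame of the traceback is a `drop table` statement, and the message is the one SQLite gives when the named table is not present on the connection. The sketch below reproduces that message using only the standard-library `sqlite3` module. It does not explain why pysqldf failed to register `orders`; it only illustrates that the table was absent from the database when the cleanup step ran. The helper name `drop_missing_table` is hypothetical.

```python
import sqlite3


def drop_missing_table(tablename="orders"):
    """Try to drop a table that was never created and return the error text."""
    conn = sqlite3.connect(":memory:")  # empty in-memory database, no tables
    try:
        # Same statement shape as the last frame of the traceback
        conn.execute("drop table " + tablename)
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return None  # only reached if the drop unexpectedly succeeded


print(drop_missing_table())  # no such table: orders
```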

Am I missing something? I would prefer answers that use pandasql/pysqldf rather than "learn the pandas query syntax".

sqldf in R handled complex queries on tables of up to 10 million rows on an i7 laptop with 12 GB of RAM.

Thanks!

Best answer

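OK, I just found the solution.

Completely removed the Anaconda installation.

Cleaned up the related folders.

Installed Python 3.6 with pip from scratch.

Then pip-installed pandas and pandasql.

Ran my script. It (pandasql) executed in under a second.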
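
If reinstalling is not an option, one way to check whether the slowness comes from the environment rather than from the data is to time the same query against an in-memory SQLite database directly. This is a minimal sketch, not what pandasql does internally line for line: it assumes only `pandas` and the standard-library `sqlite3` module, the helper `query_via_sqlite` is hypothetical, and the synthetic frame merely stands in for `data/orders.csv` with the same shape (193k rows, 19 columns).

```python
import sqlite3
import time

import pandas as pd


def query_via_sqlite(df, table_name, query):
    """Copy df into an in-memory SQLite table, run query, return (result, seconds)."""
    conn = sqlite3.connect(":memory:")
    try:
        start = time.perf_counter()
        df.to_sql(table_name, conn, index=False)   # write the frame as a table
        result = pd.read_sql_query(query, conn)    # run the SQL, get a DataFrame
        elapsed = time.perf_counter() - start
    finally:
        conn.close()
    return result, elapsed


# Synthetic stand-in for the real file (hypothetical data, same shape only)
orders = pd.DataFrame({"orders_id": range(193_000)})
for i in range(18):
    orders[f"col_{i}"] = 0.0

result, elapsed = query_via_sqlite(
    orders, "orders", "SELECT orders_id FROM orders LIMIT 1;"
)
print(result)
print(f"elapsed: {elapsed:.2f} s")
```

If this runs quickly while the pandasql call on the same frame still takes minutes, that points at the installed packages rather than at the size of the table.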

Regarding python-3.x - PandaSQL is very slow, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51590671/