python - Alternative ways to reduce execution time when loading a PostgreSQL table with 500,000 rows into pandas?

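I have a PostgreSQL database with 70 tables, and I want to access one specific table named "hub_psm_log_inter". I want to load it into pandas and perform some operations on it. The table has shape (500000, 23) and may grow in the future. Executing psql.read_sql_query takes about 3 minutes, and I would like to reduce that time. The only rows that matter to me are those where cust_hub_id = 358 and status_switch = 1. The resulting df_on has only 10,000 rows.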

import numpy as np
import pandas as pd

import psycopg2 as pg
import pandas.io.sql as psql

conn = pg.connect(
    database = '',
    user = '',
    password = '',
    host = '',
    port = ''
)

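# Loads the entire table (about 500,000 rows) into memory; this is the slow step
df2 = psql.read_sql_query("SELECT * FROM hub_psm_log_inter", conn)

# Filter in pandas down to the rows of interest
df4 = df2[df2.cust_hub_id == 358].copy()  # .copy() avoids SettingWithCopyWarning
df4['status_switch'] = pd.to_numeric(df4['status_switch'], errors='coerce')
df_on = df4[df4.status_switch == 1]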

Best answer

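Use a WHERE clause in the SQL query, so that the database filters the rows and only the matching ones are transferred to pandas: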

SELECT * FROM hub_psm_log_inter WHERE cust_hub_id = 358 AND status_switch = 1

Judging by your code, status_switch may be stored as a string in the table, so you may need to quote the value, i.e.:

SELECT * FROM hub_psm_log_inter WHERE cust_hub_id = 358 AND status_switch = '1'
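
As a rough sketch of how the filtered query could be wired into pandas, the helper below passes the two values as query parameters instead of pasting them into the SQL string. The function name load_filtered and its placeholder argument are hypothetical, not part of pandas or psycopg2. The demo runs against an in-memory SQLite table with a made-up subset of columns, only so that it works without a live PostgreSQL server; with the psycopg2 connection from the question, the default "%s" placeholder would apply.

```python
import sqlite3

import pandas as pd


def load_filtered(conn, hub_id, status, placeholder="%s"):
    """Return only the rows matching hub_id and status, filtered by the database.

    `placeholder` is the driver's parameter marker: "%s" for psycopg2,
    "?" for sqlite3 (used below only as a stand-in for a live server).
    """
    query = (
        "SELECT * FROM hub_psm_log_inter "
        f"WHERE cust_hub_id = {placeholder} AND status_switch = {placeholder}"
    )
    # Values travel separately from the SQL text, so the driver handles quoting.
    return pd.read_sql_query(query, conn, params=(hub_id, status))


# Stand-in data (assumption): a tiny table with three hypothetical columns.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE hub_psm_log_inter "
    "(id INTEGER, cust_hub_id INTEGER, status_switch TEXT)"
)
demo_conn.executemany(
    "INSERT INTO hub_psm_log_inter VALUES (?, ?, ?)",
    [(1, 358, "1"), (2, 358, "0"), (3, 12, "1"), (4, 358, "1")],
)

# status is passed as the string "1" because the column is assumed to hold text.
df_on = load_filtered(demo_conn, 358, "1", placeholder="?")
print(df_on)
```

Against the real table, the call would presumably look like load_filtered(conn, 358, "1"), or load_filtered(conn, 358, 1) if status_switch turns out to be numeric.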

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/44965150/