I do a lot of SQL-to-pandas work and have run into the following challenge.

I have a DataFrame that looks like this:

UserID, AccountNo, AccountName
123,    12345,     'Some name'
...


For each account number, I want to add a column called TotalRevenue whose value comes from a MySQL database, so I was thinking of something like this:

for accountno in df['AccountNo']:
    df1 = pd.read_sql(('select sum(VBRK_NETWR) as sum from sapdata2016.orders where VBAK_BSARK="ZEDI" and VBRK_KUNAG = %s;') % accountno, conn)


I need the DataFrame to be extended to:

UserID, AccountNo, AccountName, TotalRevenue
123,    12345,     'Some name', df1
...


The code I have so far (which does not work) raises a getitem error:

sets3 = []
i=0
for accountno in df5['kna1_kunnr']:
    df1 = pd.read_sql(('select sum(VBRK_NETWR) as sum from sapdata2016.orders where VBAK_BSARK="ZEDI" and VBRK_KUNAG = %s;') % accountno, conn)
    df2 = pd.DataFrame([(df5['userid'][i], df5['kna1_kunnr'][i], accountno, df5['kna1_name1'][i], df1['sum'][0])], columns=['User ID', 'AccountNo', 'tjeck', 'AccountName', 'Revenue'])
    sets3.append(df2)
    i += 1

df6 = pd.concat(sets3)


This idea/code isn't very pretty, and I'm wondering whether there is a better, more elegant way to do this?

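Best answer

Consider exporting the pandas data to MySQL as a temporary table, then running a single SQL query that joins your pandas data with the TotalRevenue aggregate query. Then read the result set back into a pandas DataFrame. This approach avoids any looping and issues one query instead of one query per account.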

from sqlalchemy import create_engine
...

# SQL ALCHEMY CONNECTION (PREFERRED OVER RAW CONNECTION)
engine = create_engine('mysql://user:pwd@localhost/database')
# engine = create_engine("mysql+pymysql://user:pwd@hostname:port/database") # load pymysql

df1.to_sql("mypandastemptable", con=engine, if_exists='replace')

# LEFT JOIN keeps every account, even those with no matching orders
sql = """SELECT t.UserID, t.AccountNo, t.AccountName, agg.TotalRevenue
         FROM mypandastemptable t
         LEFT JOIN
            (SELECT VBRK_KUNAG as AccountNo,
                    SUM(VBRK_NETWR) as TotalRevenue
             FROM sapdata2016.orders
             WHERE VBAK_BSARK='ZEDI'
             GROUP BY VBRK_KUNAG) agg
         ON t.AccountNo = agg.AccountNo
"""

newdf = pd.read_sql(sql, con=engine)
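
To try the temp-table approach without a MySQL server, here is a minimal sketch that uses an in-memory SQLite database as a stand-in. The table name orders, the sample rows, and the extra accounts are made up for illustration; only the shape of the query follows the answer above.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the MySQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical order rows mimicking the columns of sapdata2016.orders.
orders = pd.DataFrame({
    "VBRK_KUNAG": [12345, 12345, 67890, 12345],
    "VBAK_BSARK": ["ZEDI", "ZEDI", "ZEDI", "OTHER"],
    "VBRK_NETWR": [100.0, 50.0, 20.0, 999.0],
})
orders.to_sql("orders", conn, index=False)

# The pandas-side accounts; account 55555 has no orders at all.
accounts = pd.DataFrame({
    "UserID": [123, 456, 789],
    "AccountNo": [12345, 67890, 55555],
    "AccountName": ["Some name", "Other name", "No orders"],
})
# Export the DataFrame as a helper table, replacing any earlier copy.
accounts.to_sql("mypandastemptable", conn, index=False, if_exists="replace")

# One query: aggregate revenue per account, then join it onto the accounts.
sql = """
SELECT t.UserID, t.AccountNo, t.AccountName, agg.TotalRevenue
FROM mypandastemptable t
LEFT JOIN
    (SELECT VBRK_KUNAG AS AccountNo,
            SUM(VBRK_NETWR) AS TotalRevenue
     FROM orders
     WHERE VBAK_BSARK = 'ZEDI'
     GROUP BY VBRK_KUNAG) agg
ON t.AccountNo = agg.AccountNo
ORDER BY t.UserID
"""
newdf = pd.read_sql(sql, conn)
print(newdf)
```

Because of the LEFT JOIN, the account without orders stays in the result with a missing TotalRevenue instead of being dropped.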


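Of course, the converse works as well: read the grouped aggregate query result set into its own DataFrame and merge it with the existing DataFrame in pandas: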

sql = """SELECT VBRK_KUNAG as AccountNo,
                SUM(VBRK_NETWR) as TotalRevenue
         FROM sapdata2016.orders
         WHERE VBAK_BSARK='ZEDI'
         GROUP BY VBRK_KUNAG
"""

df2 = pd.read_sql(sql, con=engine)

newdf = df1.merge(df2, on='AccountNo', how='left')
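
The merge variant can be sketched the same way. Again SQLite replaces MySQL and the data is invented. The filter value is passed through params rather than formatted into the string; SQLite uses ? as its placeholder, while MySQL drivers such as PyMySQL typically use %s. Filling missing revenue with 0 is an optional choice of this sketch, not part of the answer above.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "VBRK_KUNAG": [12345, 12345, 67890],
    "VBAK_BSARK": ["ZEDI", "ZEDI", "OTHER"],
    "VBRK_NETWR": [100.0, 50.0, 20.0],
}).to_sql("orders", conn, index=False)

# The existing DataFrame (hypothetical sample rows).
df1 = pd.DataFrame({
    "UserID": [123, 456],
    "AccountNo": [12345, 67890],
    "AccountName": ["Some name", "Other name"],
})

# Aggregate once in SQL; the order type is bound as a query parameter.
sql = """
SELECT VBRK_KUNAG AS AccountNo,
       SUM(VBRK_NETWR) AS TotalRevenue
FROM orders
WHERE VBAK_BSARK = ?
GROUP BY VBRK_KUNAG
"""
df2 = pd.read_sql(sql, conn, params=("ZEDI",))

# Left merge keeps every row of df1; accounts without ZEDI orders get NaN.
newdf = df1.merge(df2, on="AccountNo", how="left")
# Optional: treat "no matching orders" as zero revenue.
newdf["TotalRevenue"] = newdf["TotalRevenue"].fillna(0)
print(newdf)
```

The merge key must have the same dtype on both sides (integers here); if the database returns account numbers as strings, cast one side before merging.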

Regarding "python - Updating a pandas DataFrame with SQL values", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45983329/
