本文介绍了用pandas DataFrame替换mysql数据库表中的行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Python版本-2.7.6

Pandas版本-0.17.1

MySQLdb版本-1.2.5

在我的数据库(PRODUCT)中,有一个表(XML_FEED).表XML_FEED巨大(上百万条记录)我有一个pandas.DataFrame()(PROCESSED_DF).数据框具有数千行.

现在我需要运行

REPLACE INTO TABLE PRODUCT.XML_FEED
(COL1, COL2, COL3, COL4, COL5),
VALUES (PROCESSED_DF.values)

问题:-

有没有办法在熊猫中运行REPLACE INTO TABLE?我已经检查了pandas.DataFrame.to_sql(),但这不是我所需要的.我不喜欢用pandas读取XML_FEED表,因为它非常大.

解决方案

直到此版本(0.17.1),我都无法在熊猫中找到任何直接的方法.我报告了相同的功能要求.我在项目中通过使用MySQLdb然后使用DataFrame.to_sql(if_exists='append')

执行一些查询来完成此操作

假设

1)product_id是我在表PRODUCT

中的主键

2)feed_id是我在表XML_FEED中的主键.

简单版本

import MySQLdb
import sqlalchemy
import pandas

con = MySQLdb.connect('localhost','root','my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password@localhost/database_name'
engine = sqlalchemy.create_engine(con_str) #because I am using mysql
df = pandas.read_sql('SELECT * from PRODUCT', con=engine)
df_product_id = df['product_id']
product_id_str = (str(list(df_product_id.values))).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
cur = con.cursor()
cur.execute(delete_str)
con.commit()
df.to_sql('XML_FEED', if_exists='append', con=engine)# you can use flavor='mysql' if you do not want to create sqlalchemy engine but it is depreciated

请注意:-REPLACE [INTO]语法允许我们将INSERT行插入表中,除了如果发生UNIQUE KEY(包括PRIMARY KEY)冲突,旧行在新的INSERT之前被删除,因此没有冲突. >

Python Version - 2.7.6

Pandas Version - 0.17.1

MySQLdb Version - 1.2.5

In my database ( PRODUCT ) , I have a table ( XML_FEED ). The table XML_FEED is huge ( Millions of record )I have a pandas.DataFrame() ( PROCESSED_DF ). The dataframe has thousands of rows.

Now I need to run this

REPLACE INTO TABLE PRODUCT.XML_FEED
(COL1, COL2, COL3, COL4, COL5),
VALUES (PROCESSED_DF.values)

Question:-

Is there a way to run REPLACE INTO TABLE in pandas? I already checked pandas.DataFrame.to_sql() but that is not what I need. I do not prefer to read XML_FEED table in pandas because it very huge.

解决方案

Till this version (0.17.1) I am unable find any direct way to do this in pandas. I reported a feature request for the same.I did this in my project with executing some queries using MySQLdb and then using DataFrame.to_sql(if_exists='append')

Suppose

1) product_id is my primary key in table PRODUCT

2) feed_id is my primary key in table XML_FEED.

SIMPLE VERSION

import MySQLdb
import sqlalchemy
import pandas

con = MySQLdb.connect('localhost','root','my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password@localhost/database_name'
engine = sqlalchemy.create_engine(con_str) #because I am using mysql
df = pandas.read_sql('SELECT * from PRODUCT', con=engine)
df_product_id = df['product_id']
product_id_str = (str(list(df_product_id.values))).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
cur = con.cursor()
cur.execute(delete_str)
con.commit()
df.to_sql('XML_FEED', if_exists='append', con=engine)# you can use flavor='mysql' if you do not want to create sqlalchemy engine but it is depreciated

Please note:-The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.

这篇关于用pandas DataFrame替换mysql数据库表中的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-11 15:03