本文介绍了将单个 pandas 数据框导出到多个SQL表(自动归一化)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个像这样的DataFrame,但是有数百万行和大约15列:

I have a DataFrame like this, but with millions of rows and about 15 columns:

       id    name  col1   col2  total
0 8252552 CHARLIE DESC1 VALUE1   5.99
1 8252552 CHARLIE DESC1 VALUE2  20.00
2 5699881    JOHN DESC1 VALUE1  39.00
2 5699881    JOHN DESC2 VALUE3  -3.99

DataFrame需要导出到多个表中的SQL数据库。我目前正在使用SQLite3来测试功能。这些表将是:

The DataFrame needs to be exported to a SQL database, in several tables. I'm currently using SQLite3, to test the functionality. The tables would be:


  • main( id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,people_id INTEGER,col1_id INTEGER,col2_id INTEGER,总计)实型

  • people( id INTEGER NOT NULL PRIMARY KEY UNIQUE,name TEXT UNIQUE

  • col1( id整数非空主键自动增值唯一,名称为TEXT UNIQUE

  • col2( id整数非空主键自动输入的唯一性,名称为文本唯一性

  • main (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, people_id INTEGER, col1_id INTEGER, col2_id INTEGER, total REAL)
  • people (id INTEGER NOT NULL PRIMARY KEY UNIQUE, name TEXT UNIQUE)
  • col1 (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, name TEXT UNIQUE)
  • col2 (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, name TEXT UNIQUE)

主表应类似于对此:

  people_id col1_id col2_id  total
0   8252552       1       1   5.99
1   8252552       1       2  20.00
2   5699881       1       1  39.00
3   5699881       2       3  -3.99

其他表格,例如 people,像这样:

Other tables, like "people", like this:

     id    name
8252552 CHARLIE
5699881    JOHN

问题是,我找不到如何实现该usi在熊猫中使用 to_sql 方法的 schema 属性。使用Python,我会执行以下操作:

Thing is, I can't find how to achieve that using the schema attribute of the to_sql method in pandas. Using Python, I'd do something like this:

conn = sqlite3.connect("main.db")
cur = conn.cursor()
for row in dataframe:
    id = row["ID"]
    name = row["Name"]
    col1 = row["col1"]
    col2 = row["col2"]
    total = row["total"]
    cur.execute("INSERT OR IGNORE INTO people (id, name) VALUES (?, ?)", (id, name))
    people_id = cur.fetchone()[0]
    cur.execute("INSERT OR IGNORE INTO col1 (col1) VALUES (?)", (col1, ))
    col1_id = cur.fetchone()[0]
    cur.execute("INSERT OR IGNORE INTO col1 (col2) VALUES (?)", (col2, ))
    col2_id = cur.fetchone()[0]
    cur.execute("INSERT OR REPLACE INTO main (people_id, col1_id, col2_id, total) VALUES (?, ?, ?, ?)", (people_id, col1_id, col2_id, total ))
conn.commit()

将相应的值添加到表(人员,col1和col2),创建一个包含期望值和外键的行,并将该行添加到主表中。但是,有很多列和行,这可能会变得很慢。另外,我对这是最佳做法处理数据库时(我对数据库开发还不是很熟悉)

That would automatically add the corresponding values to the tables (people, col1 and col2), create a row with the desire values and foreign keys, and add that row to the primary table. However, there are a lot of columns and rows, and this might get very slow. Plus, I don't feel very confident that this is a "best practice" when dealing with databases (I'm fairly new to database development)

我的问题是:是否可以将pandas DataFrame导出到多个SQL表,并设置规范化规则,如上面的例子?有什么方法可以得到相同的结果,并且性能得到改善?

My question is: Is there a way to export a pandas DataFrame to multiple SQL Tables, setting the normalization rules, as in the above example? Is there any way to get the same result, with improved performance?

推荐答案

您可以先将Pandas数据帧拆分为几个子对象吗?数据帧根据数据库表,然后在每个子数据帧上应用 to_sql()方法?

Could you first split your Pandas data frame into several sub data frames according to the database tables, then apply the to_sql() method on each sub data frames?

这篇关于将单个 pandas 数据框导出到多个SQL表(自动归一化)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-28 16:59