Problem Description
I have a DataFrame like this, but with millions of rows and about 15 columns:
id name col1 col2 total
0 8252552 CHARLIE DESC1 VALUE1 5.99
1 8252552 CHARLIE DESC1 VALUE2 20.00
2 5699881 JOHN DESC1 VALUE1 39.00
3 5699881 JOHN DESC2 VALUE3 -3.99
The DataFrame needs to be exported to a SQL database, in several tables. I'm currently using SQLite3 to test the functionality. The tables would be (see the creation sketch after the definitions):
- main (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, people_id INTEGER, col1_id INTEGER, col2_id INTEGER, total REAL
) - people (
id INTEGER NOT NULL PRIMARY KEY UNIQUE, name TEXT UNIQUE
) - col1 (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, name TEXT UNIQUE
) - col2 (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, name TEXT UNIQUE
)
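For reference, the definitions above can be created from Python before any data is loaded. This is a minimal sketch assuming SQLite3 and a database file named main.db, as in the question's own code further down:

import sqlite3

conn = sqlite3.connect("main.db")
# create the target tables exactly as defined above
conn.executescript("""
CREATE TABLE IF NOT EXISTS main (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    people_id INTEGER, col1_id INTEGER, col2_id INTEGER, total REAL
);
CREATE TABLE IF NOT EXISTS people (
    id INTEGER NOT NULL PRIMARY KEY UNIQUE, name TEXT UNIQUE
);
CREATE TABLE IF NOT EXISTS col1 (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, name TEXT UNIQUE
);
CREATE TABLE IF NOT EXISTS col2 (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, name TEXT UNIQUE
);
""")
conn.commit()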
The main table should look something like this:
people_id col1_id col2_id total
0 8252552 1 1 5.99
1 8252552 1 2 20.00
2 5699881 1 1 39.00
3 5699881 2 3 -3.99
The other tables, like "people", would look like this:
id name
8252552 CHARLIE
5699881 JOHN
Thing is, I can't find how to achieve that using the schema attribute of the to_sql method in pandas. Using Python, I'd do something like this:
import sqlite3

conn = sqlite3.connect("main.db")
cur = conn.cursor()

for _, row in dataframe.iterrows():
    id = row["id"]
    name = row["name"]
    col1 = row["col1"]
    col2 = row["col2"]
    total = row["total"]
    # the person's id comes straight from the source data
    cur.execute("INSERT OR IGNORE INTO people (id, name) VALUES (?, ?)", (id, name))
    people_id = id
    # insert the col1 value, then look up its generated id
    cur.execute("INSERT OR IGNORE INTO col1 (name) VALUES (?)", (col1,))
    cur.execute("SELECT id FROM col1 WHERE name = ?", (col1,))
    col1_id = cur.fetchone()[0]
    # same for col2
    cur.execute("INSERT OR IGNORE INTO col2 (name) VALUES (?)", (col2,))
    cur.execute("SELECT id FROM col2 WHERE name = ?", (col2,))
    col2_id = cur.fetchone()[0]
    # add the row with the foreign keys to the main table
    cur.execute("INSERT OR REPLACE INTO main (people_id, col1_id, col2_id, total) VALUES (?, ?, ?, ?)",
                (people_id, col1_id, col2_id, total))

conn.commit()
That would automatically add the corresponding values to the lookup tables (people, col1 and col2), create a row with the desired values and foreign keys, and add that row to the main table. However, there are a lot of columns and rows, and this might get very slow. Plus, I don't feel very confident that this is a "best practice" when dealing with databases (I'm fairly new to database development).
My question is: Is there a way to export a pandas DataFrame to multiple SQL Tables, setting the normalization rules, as in the above example? Is there any way to get the same result, with improved performance?
Recommended Answer
Could you first split your Pandas data frame into several sub data frames according to the database tables, then apply the to_sql() method on each sub data frame?
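As an illustration of that suggestion, here is a minimal sketch. It assumes the source frame is called df with the columns shown in the question, that the tables already exist as defined above, and that mapping each value to its lookup-table id with a plain dict is acceptable; those details are assumptions, not part of the original answer.

import sqlite3
import pandas as pd

conn = sqlite3.connect("main.db")

# lookup sub-frames: one row per distinct value
people = df[["id", "name"]].drop_duplicates()
col1 = pd.DataFrame({"name": df["col1"].unique()})
col1.index += 1                       # ids start at 1, mirroring AUTOINCREMENT
col2 = pd.DataFrame({"name": df["col2"].unique()})
col2.index += 1

# fact sub-frame: replace raw values with foreign keys
col1_ids = {v: i for i, v in col1["name"].items()}
col2_ids = {v: i for i, v in col2["name"].items()}
main = pd.DataFrame({
    "people_id": df["id"],
    "col1_id": df["col1"].map(col1_ids),
    "col2_id": df["col2"].map(col2_ids),
    "total": df["total"],
})

# write each sub-frame to its own table
people.to_sql("people", conn, if_exists="append", index=False)
col1.to_sql("col1", conn, if_exists="append", index=True, index_label="id")
col2.to_sql("col2", conn, if_exists="append", index=True, index_label="id")
main.to_sql("main", conn, if_exists="append", index=False)

Each to_sql() call inserts a whole sub-frame in one go, which avoids the row-by-row Python loop from the question and should scale noticeably better on millions of rows.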