如何在PostreSQL中批量插入新行

本文介绍了如何在PostreSQL中批量插入新行的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我的产品列表（300万件）没有ID - 只有标题。但是我不知道DB中已经存在哪些标题。新产品（约290万件）必须加入DB。之后，我必须知道每个产品的ID（新的和现有的）。

I have list of products (3 million items) without IDs - only titles. But I don't know which titles already exist in DB. New products (about 2.9 million items) must be added into DB. After that I must know ID for each products (new and existing).

在PostgreSQL中有最快的方法吗？我可以根据需要更改数据库（添加默认值，添加列等）。

Is there the fastest way to do it in PostgreSQL? I can change DB as needed (add default values, add columns etc.).

导入数据

将所有内容设置为临时暂存表，并只在目标表中插入新的标题。

Import data

COPY everything to a temporary staging table and insert only new titles into your target table.

CREATE TEMP TABLE tmp(title text);

COPY tmp FROM 'path/to/file.csv';
ANALYZE tmp;

INSERT INTO tbl
SELECT DISTINCT tmp.title
FROM   tmp
LEFT   JOIN tbl USING (title)
WHERE  tbl.title IS NULL;

应该使用列 tbl_id tbl 。

LEFT JOIN / IS NULL 构造取消已经存在的标题的资格。 NOT EXISTS 将是另一种可能性。

The LEFT JOIN / IS NULL construct disqualifies already existing titles. NOT EXISTS would be another possibility.

DISTINCT 在临时表中的传入数据中复制 tmp 。

DISTINCT prevents duplicates in the incoming data in the temporary table tmp.

有助于确保查询计划员选择一个明智的计划和临时表不通过autovacuum进行分析。

ANALYZE is useful to make sure the query planner picks a sensible plan, and temporary tables are not analyzed by autovacuum.

由于您有300万个项目，因此可能需要为 temp_buffer （）

Since you have 3 million items, it might pay to raise the setting for temp_buffer (for this session only):

SET temp_buffers = 1000MB;

或者你可以负担得起，足够让RAM中的临时表保持更快。注意：必须先在会话中完成，然后才能创建任何临时对象。

Or however much you can afford and is enough to hold the temp table in RAM, which is much faster. Note: must be done first in the session - before any temp objects are created.

要查看导入数据的所有ID：

To see all IDs for the imported data:

SELECT tbl.tbl_id, tbl.title
FROM   tbl
JOIN   tmp USING (title)

在同一个会话中！临时表在会话结束时自动删除。

In the same session! A temporary table is dropped automatically at the end of the session.

这篇关于如何在PostreSQL中批量插入新行的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！

analyze