问题描述
例如,我有一个存储过程,用于从csv文件导入数据并将读取的数据写入SQL表.我有一个定义如下的表:
For instance I've got a stored procedure to import data from csv files and write the read data into a SQL table.I've got a table defined as below:
CREATE TABLE person (id int, name text, age int, married boolean);
首先,我检查记录是否已经存在,如果存在,我会更新,如果不存在,请插入.每个记录字段可能具有不同的类型,因此SQL命令的结果将分配给一个标量变量列表:
First I check if the record exist already, if exists I do update, if doesn't - insert.Each field of record may have a different type, so the result of a SQL command are assigned to a list of scalar variables:
SELECT name, age, married INTO v_name, v_age, v_married [..]
让我们假设每一列都被声明为可选(允许NULL).那么检查哪个变量(v_name,v_age,v_married)不是NULL并且可以处理的最佳方法是什么?
Let's assume every column is declared as optional (NULL allowed). What's the best way then to check which variable (v_name, v_age, v_married) is not NULL and can be processed?
我找到了许多解决方案:
I've found many solutions:
- 如果没有找到
- 何时NO_DATA_FOUND然后
- 如果v_age不为空,则[...]
或者我现在正在使用的动态解决方案,使用上面提到的最后一种方法,当我必须检查多列(col)时:
or dynamic solution I'm using now using the last way I've mentioned above, when I have to check multiple columns (col):
list_of_columns := ARRAY['name','age','married'];
FOREACH x IN ARRAY list_of_columns LOOP
EXECUTE 'SELECT ' || x
|| ' FROM person
WHERE id = ' || quote_literal(v_id)
INTO y;
IF x = 'name' AND (y != v_name OR y IS NULL) THEN
UPDATE person
SET name = v_name
WHERE id = v_id;
ELSIF x = 'age' AND (y != v_age OR y IS NULL) THEN
UPDATE person
SET age = v_age
WHERE id = v_id;
ELSIF x = 'married' AND (y != v_married OR y IS NULL) THEN
UPDATE person
SET married= v_married
WHERE id = v_id;
END IF;
END LOOP;
我正在寻求考虑最佳实践和性能的最佳解决方案.感谢您的帮助!
I'm looking for the best solutions having regard to the best practice and performance.Any help is appreciated!
推荐答案
我认为,您可以彻底沿以下方面改进整个过程:
I think, you can radically improve the whole procedure along these lines:
BEGIN;
CREATE TEMP TABLE tmp_p ON COMMIT DROP AS
SELECT * FROM person LIMIT 0;
COPY tmp_p FROM '/absolute/path/to/file' FORMAT csv;
UPDATE person p
SET name = t.name
,age = t.age
,married = t.person
FROM tmp_p t
WHERE p.id = t.id
AND (p.name IS DISTINCT FROM t.name OR
p.age IS DISTINCT FROM t.age OR
p.married IS DISTINCT FROM t.married);
INSERT INTO person p(id, name, age, married, ...)
SELECT id, name, age, married, ...
FROM tmp_p t
WHERE NOT EXISTS (SELECT 1 FROM person x WHERE x.id = t.id);
COMMIT; -- drops temp table because of ON COMMIT DROP
解释
-
COPY
将您的CSV文件复制到具有匹配布局的临时表中.我用CREATE TABLE AS ... LIMIT 0
复制了目标表的布局,您可能需要调整...Explain
COPY
your CSV file to a temporary table with matching layout. I copied the layout of the target table withCREATE TABLE AS ... LIMIT 0
, you may need to adapt ...UPDATE
现有行.避免使用WHERE
子句的最后3行进行空更新(什么都不会改变).
(可能有用,但是实际上在您的代码中没有.)UPDATE
existing rows. Avoid empty updates (nothing would change) with the last 3 lines in theWHERE
clause.
(May be useful, but that wasn't actually in your code.)INSERT
不存在的行.为此,请使用NOT EXISTS
半联接.INSERT
non-existing rows. Use aNOT EXISTS
semi-join for that.所有事务都在同一笔交易中,因此在发生问题的过程中,您不会半生不熟.临时表在事务结束时被删除,因为我是用这种方式创建的.
All in one transaction so you don't end up with a half-baked result in the case of a problem along the way. The temp table is dropped at the end of the transaction because I created it that way.
这篇关于检查plpgsql中是否存在变量的最佳实践?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!