问题描述
我确定这是常见的地方,但Google并没有帮助。我试图在PostgreSQL 9.1中编写一个简单的存储过程,它将从父 cpt
表中删除重复的条目。父表 cpt
由一个子表实验室
引用,定义为:
CREATE TABLE lab(
recid serial NOT NULL,
cpt_recid integer,
........
CONSTRAINT cs_cpt FOREIGN KEY(cpt_recid)
参考文献cpt(recid)MATCH SIMPLE
更新无删除限制操作,
...
);
我遇到的最大问题是如何获取失败的记录,以便我可以使用它在 EXCEPTION
子句中将孩子行从 lab
移动到一个可接受的键,然后循环并删除不必要的来自 cpt
表的记录。
这是(非常错误的)代码:
CREATE OR更换功能h_RemoveDuplicateCPT()
返回空值AS
$ BODY $
BEGIN
LOOP
BEGIN
DELETE FROM cpt
WHERE recid IN(
SELECT recid
FROM(
SELECT recid,
row_number()over(partition BY cdesc ORDER BY recid)AS rnum
FROM cpt)t
WHERE t.rnum> 1)
RETURNING recid;
IF count = 0 THEN
返回;
END IF;
EXCEPTION WHEN foreign_key_violation THEN
RAISE通知'固定unique_violation';
RAISE通知'recid是%',recid;
END;
END LOOP;
END;
$ BODY $
LANGUAGE plpgsql VOLATILE;
您可以使用单一SQL语句与。
无需功能(但当然可能),无循环,无异常处理:
pre>
WITH plan AS(
SELECT recid,cdesc,min(recid)OVER(PARTITION BY cdesc)AS master_recid
FROM cpt
)
,upd_lab AS(
UPDATE lab l
SET cpt_recid = p.master_recid - 链接到master recid ...
FROM plan p
WHERE l.cpt_recid = p.recid
AND p.recid<> p.master_recid - ...只有当没有链接到master
)
DELETE FROM cpt c
使用计划p
WHERE c.recid = p.recid
AND p.recid<> p.master_recid - ...只有如果不是master
RETURNING c.recid; - 任意返回全部删除(dupe)ID
strong> 更快更干净。循环比较昂贵,异常处理比较贵。
更重要的是, lab
中的引用被重定向到 cpt
自动,这不在您的原始代码。所以你可以一次删除所有的副本。
如果你喜欢,你可以将它包装在一个plpgsql或SQL函数中。
说明
-
在第一个CTE
计划
中,主队每组重复。在您的情况下,最小recid
每cdesc
。 -
在第二个CTE
upd_lab
将所有引用dupe的行重定向到cpt
中的主列。 -
最后,删除重复,这不会引起异常,因为相应的行几乎在同一时间链接到剩余的主列。 p>
ON DELETE RESTRICT
所有CTE和语句的主要查询在相同的基础表快照上运行,实际上 并发 。他们没有看到彼此对基础表的影响:
- 引发异常,因为:
但是,上述语句是单一命令,而:
粗体强调我的。您只需要注意写入同一个表的并发事务,但这是一个普遍的考虑因素,而不是此任务的特定。
一个例外适用于
UNIQUE
和PRIMARY KEY
约束,但这并不涉及这个案例:
I'm sure this is common place, but Google is not helping. I am trying to write a simple stored procedure in PostgreSQL 9.1 that will remove duplicate entries from a parent
cpt
table. The parent tablecpt
is referenced by a child tablelab
defined as:CREATE TABLE lab ( recid serial NOT NULL, cpt_recid integer, ........ CONSTRAINT cs_cpt FOREIGN KEY (cpt_recid) REFERENCES cpt (recid) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE RESTRICT, ... );
The biggest problem I'm having is how to obtain the record which failed so that I can use it in the
EXCEPTION
clause to move the children rows fromlab
to one acceptable key, then loop back through and delete the unnecessary records from thecpt
table.Here is the (very wrong) code:
CREATE OR REPLACE FUNCTION h_RemoveDuplicateCPT() RETURNS void AS $BODY$ BEGIN LOOP BEGIN DELETE FROM cpt WHERE recid IN ( SELECT recid FROM ( SELECT recid, row_number() over (partition BY cdesc ORDER BY recid) AS rnum FROM cpt) t WHERE t.rnum > 1) RETURNING recid; IF count = 0 THEN RETURN; END IF; EXCEPTION WHEN foreign_key_violation THEN RAISE NOTICE 'fixing unique_violation'; RAISE NOTICE 'recid is %' , recid; END; END LOOP; END; $BODY$ LANGUAGE plpgsql VOLATILE;
解决方案You can do this much more efficiently with a single SQL statement with data-modifying CTEs.
No function required (but possible, of course), no looping, no exception handling:WITH plan AS ( SELECT recid, cdesc, min(recid) OVER (PARTITION BY cdesc) AS master_recid FROM cpt ) , upd_lab AS ( UPDATE lab l SET cpt_recid = p.master_recid -- link to master recid ... FROM plan p WHERE l.cpt_recid = p.recid AND p.recid <> p.master_recid -- ... only if not linked to master ) DELETE FROM cpt c USING plan p WHERE c.recid = p.recid AND p.recid <> p.master_recid -- ... only if not master RETURNING c.recid; -- optionaly return all deleted (dupe) IDs
This should be much faster and cleaner. Looping is comparatively expensive, exception handling is comparatively even more expensive.
Much more importantly, references inlab
are redirected to the respective master row incpt
automatically, which wasn't in your original code, yet. So you can delete all dupes at once.You can wrap this in a plpgsql or SQL function if you like.
Explanation
In the first CTE
plan
, identify the master-row per group of dupes. In your case the row with the minimumrecid
percdesc
.In the second CTE
upd_lab
redirect all rows referencing a dupe to the master row incpt
.Finally, delete dupes, which is not going to raise exceptions because depending rows are being linked to the remaining master-row virtually at the same time.
ON DELETE RESTRICT
All CTEs and the main query of a statement operate on the same snapshot of underlying tables, virtually concurrently. They don't see each others' effects on underlying tables:
One might expect a FK constraint with
ON DELETE RESTRICT
to raise exceptions because, per documentation:However, the above statement is a single command and, per documentation:
Bold emphasis mine. You only need to be aware of concurrent transactions writing to the same tables, but that's a general consideration, not specific to this task.
An exception applies for
UNIQUE
andPRIMARY KEY
constraint, but that does not concern this case:这篇关于如何使用外键依赖关系删除重复的行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!