问题描述
我在SQL Server中有一个SP,它每分钟运行数百次,并且需要针对数据库检查传入的流量.目前,它会执行以下操作
I have an SP in SQL Server which runs hundreds of times a minute, and needs to check incoming traffic against a database. At the moment it does the following
INSERT INTO table
SELECT @value1,@value2 WHERE NOT EXISTS
(SELECT * FROM table WHERE value1 = @value1 AND value2 = @value2);
但是,我也可以选择
IF NOT EXISTS(SELECT * FROM table WHERE value1 = @value1 AND value2 = @value2)
INSERT INTO table (value1,value2) VALUES (@value1,@value2);
哪个会更快?我觉得它们之间并没有太大的区别,但是从历史上看我不是很擅长TSQL ... =/
Which would be faster? I get the feeling there's not much difference between them but I'm historically not very good at TSQL... =/
更新:糟糕...意在声明EXISTS使用多个值来查找记录是否存在,因此唯一约束将不起作用.编辑了样本以反映这一点……
UPDATE: Whoops... meant to state that the EXISTS uses more than 1 value to find if a record exists, so a unique constraint won't work. Edited the sample to reflect that...
推荐答案
在对此问题及其答案添加了无数评论之后,我将继续回答自己的问题.
After adding a gazillion comments on this question and its answers, I will have my own go on answering it.
我不希望在原始问题中提议的两个提议之间在性能上有任何重大差异.正如Ray所指出的那样,一方面,第二种方法可能使您不必为插入做一些准备工作,但另一方面,与第一种解决方案一样,RDBMS通常在批处理语句方面表现最佳.
I would not expect any major difference in performance between the two proposed proposed in the original question. On one hand, as pointed out by Ray, the second approach might save you from doing some preparations for the insert, but on the other hand, an RDBMS usually performs best with batch statements, as in the first solution.
KM和DVK建议添加UNIQUE
约束,这将使唯一性测试隐式,但是将要求您在INSERT
语句周围添加某种错误处理.我很难确定为什么要增加任何其他性能,假设您已经有一个涵盖两列的索引. 如果您没有这样的索引,请添加它,然后重新考虑您对更高性能的需求.
KM and DVK suggest adding a UNIQUE
constraint, which will make the uniqueness test implicit, but will require you to add some kind of error handling around your INSERT
statement. I have a hard time spotting why this should add any additional performance, assuming that you already have an index covering the two columns. If you do not have such index, add it, and reconsider your need for more performance.
无论是显式还是隐式执行唯一性检查,对于AFAIK都无关紧要.如果通过在DBMS的内部"进行检查而获得了任何收益,那么当存在重复项时,与提高和处理错误相关的开销可能会吞噬掉这一收益.
Whether the uniqueness check is performed explicit or implicit should not matter AFAIK. If anything is gained by having the check done "inside" the stomach of the DBMS, that gain might just be eaten up by overhead associated with raising and handling errors when duplicates exists.
最重要的是:假设已经有一个索引,如果您仍然渴望提高性能,我的建议是对建议的三种解决方案进行实证测试.编写一个模拟预期输入数据的小程序,然后将这三个解决方案中的每一个都吹走数十亿行,其中包括合理数量的重复项.为此,请确保发布您的结果:-)
The bottom line: Assuming an index is already in place, if you still find yourself lusting for performance, my recommendation is that you perform empirical tests on the three suggested solutions. Cook up a small program that simulates the expected input data, and blow each of the three solutions away with a few billion rows, including a plausible amount of duplicates. do this, be sure to post your results :-)
这篇关于在INSERT之前或之后,EXISTS哪个更快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!