Huge performance difference in SQL (1 hour vs. 1 minute): can you explain why?



Problem description

The following two queries take 70 minutes and 1 minute, respectively, on a standard machine for 1 million records. What could be the reason?

Query [01:10:00]

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
    CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
        THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
        ELSE sys.fn_cdc_increment_lsn(0x00) END
    , sys.fn_cdc_get_max_lsn()
    , 'all with mask')
WHERE __$operation <> 1

Modified Query [00:01:10]

DECLARE @MinLSN binary(10)
DECLARE @MaxLSN binary(10)

-- Evaluate the LSN bounds once, before the main query runs
SELECT @MaxLSN = sys.fn_cdc_get_max_lsn()
SELECT @MinLSN = CASE WHEN sys.fn_cdc_increment_lsn(0x00) < sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
        THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
        ELSE sys.fn_cdc_increment_lsn(0x00) END

-- Pass the precomputed variables instead of the expressions
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
        @MinLSN, @MaxLSN, 'all with mask')
WHERE __$operation <> 1


[Edit]

I tried to recreate the scenario with a similar function to see whether the parameters are evaluated for each row.

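-- @a is float: a bare decimal defaults to decimal(18,0) and would round RAND() to 0 or 1
CREATE FUNCTION Fn_Test(@a float)
RETURNS TABLE
AS
RETURN
(
    -- Echo the parameter and the current time next to every source row
    SELECT @a AS Parameter, GETDATE() AS Dt, PartitionTest.*
    FROM PartitionTest
);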

SELECT * FROM Fn_Test(RAND(DATEPART(s,GETDATE())))

But I get the same value in the Parameter column for all of the million records, which were processed in 38 seconds.

Solution

Even deterministic scalar functions are evaluated at least once per row. Only when the same deterministic scalar function appears more than once on the same "row" with the same parameters do I believe it is evaluated just once: for example, in CASE WHEN fn_X(a, b, c) > 0 THEN fn_X(a, b, c) ELSE 0 END or something similar.
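One way to check the per-row claim on your own server is to count how often a scalar UDF body actually runs, using sys.dm_exec_function_stats (SQL Server 2016 or later, VIEW SERVER STATE permission required). This is a minimal sketch: dbo.fn_Probe is a hypothetical stand-in, not one of the CDC system functions, so it does not measure those directly. On SQL Server 2019 and later, scalar UDF inlining may rewrite a function this simple away; adding WITH INLINE = OFF to its definition there keeps it as a real UDF call.

```sql
-- Hypothetical probe function: it just returns its argument.
CREATE FUNCTION dbo.fn_Probe (@x int)
RETURNS int
AS
BEGIN
    RETURN @x;
END;
GO

-- Call the probe once in the SELECT list of a query over many rows.
SELECT dbo.fn_Probe(1) AS p
FROM sys.all_objects;
GO

-- Number of rows the query above returned, for comparison.
SELECT COUNT(*) AS row_count
FROM sys.all_objects;

-- How many times the function body actually ran (cached-plan statistics).
SELECT OBJECT_NAME(fs.object_id, fs.database_id) AS function_name,
       fs.execution_count
FROM sys.dm_exec_function_stats AS fs
WHERE fs.database_id = DB_ID()
  AND fs.object_id = OBJECT_ID('dbo.fn_Probe');
```

If execution_count tracks row_count rather than staying at 1, the function is being evaluated per row.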

I think your RAND problem is because you keep reseeding it:
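A minimal T-SQL sketch of the seeding behaviour behind the repro. RAND(seed) restarts the generator from that seed, and GETDATE() is a runtime constant, so DATEPART(s, GETDATE()) hands every row of one query the same seed. Unseeded RAND() is likewise evaluated once per query. The RAND(CHECKSUM(NEWID())) line is a common workaround, not something from the original answer: NEWID() is evaluated per row, so it supplies a different seed for each row.

```sql
-- The same seed always produces the same value.
SELECT RAND(42) AS first_call, RAND(42) AS second_call;

-- Seed derived from GETDATE(): identical for every row of this query,
-- so every row gets the same RAND value (as in the Fn_Test repro).
SELECT TOP (5) RAND(DATEPART(s, GETDATE())) AS r
FROM sys.all_objects;

-- Unseeded RAND() is also evaluated once per query, not once per row.
SELECT TOP (5) RAND() AS r
FROM sys.all_objects;

-- Seeding from NEWID(), which is evaluated per row, gives per-row values.
SELECT TOP (5) RAND(CHECKSUM(NEWID())) AS r
FROM sys.all_objects;
```

Because every row receives the same value either way, the Fn_Test experiment cannot tell per-row evaluation of the parameter apart from a single evaluation.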

I have taken to caching scalar function results as you have done, even going so far as to precalculate tables of scalar function results and join to them. Sooner or later something has to be done to make scalar functions perform well. Right now, the best option is CLR functions; apparently these far outperform T-SQL UDFs. Unfortunately, I cannot use them in my current environment.
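The "precalculate and join" approach can be sketched roughly as follows. All names here (dbo.fn_Expensive, #Source, #FnCache) are hypothetical; the idea is to call the expensive function once per distinct input, store the results, and join to them instead of calling the function on every row. This is an illustration under those assumptions, not the answerer's actual code.

```sql
-- Hypothetical expensive scalar function standing in for a real one.
CREATE FUNCTION dbo.fn_Expensive (@x int)
RETURNS int
AS
BEGIN
    RETURN @x * @x;  -- placeholder for costly logic
END;
GO

-- Hypothetical source data: many rows, only 10 distinct inputs (0..9).
CREATE TABLE #Source (Id int IDENTITY PRIMARY KEY, InputValue int NULL);
INSERT INTO #Source (InputValue)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 10
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- 1. Call the function once per distinct input and cache the results.
SELECT d.InputValue, dbo.fn_Expensive(d.InputValue) AS Result
INTO #FnCache
FROM (SELECT DISTINCT InputValue FROM #Source) AS d;

-- 2. Join to the cache instead of calling the function on every row.
SELECT s.Id, s.InputValue, c.Result
FROM #Source AS s
LEFT JOIN #FnCache AS c
    ON c.InputValue = s.InputValue;  -- NULL inputs never match, so their Result is NULL
```

The LEFT JOIN keeps every source row. A NULL input comes back with a NULL Result even if the function itself would return something for NULL.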


07-31 12:31