我有7个大表,可以随时存储1亿到100万行。我叫他们LargeTable1LargeTable2LargeTable3LargeTable4LargeTable7。这些表大多是静态的:既没有更新,也没有新的插入。它们每两周或每月只改变一次,当它们被截断并在每个寄存器中插入新的一批寄存器时。
所有这些表都有三个共同的字段:HeadquarterCountryFileHeadquarterCountry是格式为“000”的数字,但在其中两个表中,由于某些其他系统需要,它们被解析为int
我还有一个小得多的表,叫做Headquarters,上面有每个总部的信息。这个表的条目很少。实际上,最多1000个。
现在,我需要创建一个存储过程,它返回所有出现在大型表中但在Headquarters表中不存在或已被删除的总部(该表是逻辑删除的:它有一个DeletionDate字段来检查这个)。
这是我尝试过的查询:

CREATE PROCEDURE deletedHeadquarters
AS
BEGIN
    DECLARE @headquartersFiles TABLE
    (
        hq int,
        countryFile varchar(MAX)
    );

    SET NOCOUNT ON

    INSERT INTO @headquartersFiles
    SELECT headquarter, CONCAT(country, ' (', file, ')')
    FROM
    (
        SELECT DISTINCT CONVERT(int, headquarter) as headquarter,
                        CONVERT(int, country) as country,
                        file
        FROM            LargeTable1
        UNION
        SELECT DISTINCT headquarter,
                        country,
                        file
        FROM            LargeTable2
        UNION
        SELECT DISTINCT headquarter,
                        country,
                        file
        FROM            LargeTable3
        UNION
        SELECT DISTINCT headquarter,
                        country,
                        file
        FROM            LargeTable4
        UNION
        SELECT DISTINCT headquarter,
                        country,
                        file
        FROM            LargeTable5
        UNION
        SELECT DISTINCT headquarter,
                        country,
                        file
        FROM            LargeTable6
        UNION
        SELECT DISTINCT headquarter,
                        country,
                        file
        FROM            LargeTable7
    ) TC

    SELECT  RIGHT('000' + CAST(st.headquarter AS VARCHAR(3)), 3) as headquarter,
            MAX(s.deletionDate) as deletionDate,
            STUFF
            (
                (SELECT DISTINCT ', ' + st2.countryFile
                FROM @headquartersFiles st2
                WHERE st2.headquarter = st.headquarter
                FOR XML PATH('')),
                1,
                1,
                ''
            ) countryFile
    FROM    @headquartersFiles as st
    LEFT JOIN headquarters s ON CONVERT(int, s.headquarter) = st.headquarter
    WHERE   s.headquarter IS NULL
       OR   s.deletionDate IS NOT NULL
    GROUP BY st.headquarter

END

此SP的性能不适合我们的应用程序。当前大约需要50秒才能完成,每个表的总行数如下(只是让您了解大小):
大表1:1516666行
大表2:645740行
大型3:1950121行
大型表4:779336行
大型5:1100999行
大表6:16499行
大型7:24454行
我能做些什么来提高性能?我试着做了以下几点,没有太大的区别:
按批插入到本地表中,不包括我已经插入的那些总部,然后为那些重复的总部更新countryfile字段
为该联合查询创建视图
为总部字段的largetables创建索引
我还考虑过在LargeTables更改之后将这些丢失的总部插入一个永久表中,但是Headquarters表可以更频繁地更改,我不需要更改它的模块来保持这些东西的整洁和更新。但如果这是最好的选择,我会选择的。
谢谢

最佳答案

每一步都要过滤。但首先,修改headquarters表,使其具有适合您需要的类型。……以及索引:

alter table headquarters add headquarter_int as (cast(headquarter as int));
create index idx_headquarters_int on headquarters(headquarters_int);

SELECT DISTINCT headquarter, country, file
FROM LargeTable5 lt5
WHERE NOT EXISTS (SELECT 1
                  FROM headquarters s
                  WHERE s.headquarter_int = lt5.headquarter and s.deletiondate is not null
                 );

然后,需要LargeTable5(headquarter, country, file)上的索引。
这应该需要不到5秒的时间。如果是,那么构造完整的查询,确保相关子查询中的类型匹配,并且在完整表上有正确的索引。使用union删除表之间的重复项。

关于sql - 大表的UNION性能问题,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/38150933/

10-10 02:07