考虑将更改应用于汇总表的问题。现有行必须更新,而新行必须插入。我的方法如下:

  • 将所有更改插入到临时表中(一次100K)
  • 将临时表合并到主表中(最终达到数亿行)

  • SQL(带有SORT MERGE提示)如下所示(没什么花哨的):

    merge /*+ USE_MERGE(t s) */
    into F_SCREEN_INSTANCE t
    using F_SCREEN_INSTANCE_BUF s
        on (s.DAY_ID = t.DAY_ID and s.PARTIAL_ID = t.PARTIAL_ID)
    when matched then update set
        t.ACTIVE_TIME_SUM = t.ACTIVE_TIME_SUM + s.ACTIVE_TIME_SUM,
        t.IDLE_TIME_SUM = t.IDLE_TIME_SUM + s.IDLE_TIME_SUM
    when not matched then insert values (
        s.DAY_ID, s.PARTIAL_ID, s.ID, s.AGENT_USER_ID, s.COMPUTER_ID, s.RAW_APPLICATION_ID, s.APP_USER_ID, s.APPLICATION_ID, s.USER_ID, s.RAW_MODULE_ID, s.MODULE_ID, s.START_TIME, s.RAW_SCREEN_NAME, s.SCREEN_ID, s.SCREEN_TYPE, s.ACTIVE_TIME_SUM, s.IDLE_TIME_SUM)
    
    F_SCREEN_INSTANCE表具有(DAY_ID, PARTIAL_ID)作为主键,并且也是IOT(索引组织表)。这使其成为合并联接的理想候选者:行通过查找键进行物理排序。

    到现在为止还挺好。我已经开始进行基准测试,初期看起来不错,一次合并需要10秒钟。但是大约一个小时后,合并占用了大约4分钟的时间,同时占用了大量的tempdb(每次合并4GB)。下面的查询计划显示,即使在理想情况下已经对表进行了排序,但F_SCREEN_INSTANCE在合并之前已重新排序。当然,随着表的增长,将需要更多的tempdb,并且整个方法将分崩离析。

    好,那为什么要重新排序 table 呢?这变成了合并联接实现的限制:the second table is always sorted



    O ... K,那么我可以将主表设置为第一,然后将缓冲区设置为第二吗?不,那也不可能。无论我如何在USE_MERGE提示中列出表,源表始终是第一位。

    最后,这是我的问题:我错过了什么吗?是否可以使这种SORT MERGE方法起作用?

    以下是一些更多详细信息,可解决您可能会问的问题:
  • 什么Oracle版本? 12c。
  • 您是否尝试过HASH JOIN?是的,这很糟糕,符合预期。需要扫描主表以构建哈希表。随着F_SCREEN_INSTANCE的增长,它无法缩放。
  • 您是否尝试过LOOP JOIN?是的,这也很糟糕。考虑到缓冲区表的大小,对F_SCREEN_INSTANCE的100K查找花费了不合理的时间。合并很快就花了3分钟。
  • 总而言之,MERGE JOIN在概念上是最佳的访问策略,但是Oracle实现似乎由于对目标表进行重新排序而受到严重破坏。
  • 最佳答案

    排序合并外部联接将始终将外部联接表放在第二位,而不管提示如何。添加额外的内部联接可以控制联接顺序,然后可以使用ROWID再次联接到大表。希望两个好的连接比一个坏的连接要好。

    假设

    该答案假定排序合并联接是最快的联接,并且该手册是正确的,因为始终对第二个数据集进行排序。如果没有大量有关数据的信息,将很难检验这些假设。

    示例架构

    这是一些类似的表,带有虚假的统计信息,以使优化器认为它们有500M行和100K行。

    create table F_SCREEN_INSTANCE(DAY_ID number, PARTIAL_ID number, ID number, AGENT_USER_ID number,COMPUTER_ID number, RAW_APPLICATION_ID number, APP_USER_ID number, APPLICATION_ID number, USER_ID number, RAW_MODULE_ID number,MODULE_ID number, START_TIME date, RAW_SCREEN_NAME varchar2(100), SCREEN_ID number, SCREEN_TYPE number, ACTIVE_TIME_SUM number, IDLE_TIME_SUM number,
        constraint f_screen_instance_pk primary key (day_id, partial_id)
    ) organization index;
    
    create table F_SCREEN_INSTANCE_BUF(DAY_ID number, PARTIAL_ID number, ID number, AGENT_USER_ID number,COMPUTER_ID number, RAW_APPLICATION_ID number, APP_USER_ID number,APPLICATION_ID number, USER_ID number, RAW_MODULE_ID number, MODULE_ID number, START_TIME date, RAW_SCREEN_NAME varchar2(100), SCREEN_ID number, SCREEN_TYPE number, ACTIVE_TIME_SUM number, IDLE_TIME_SUM number,
        constraint f_screen_instance_buf_pk primary key (day_id, partial_id)
    );
    
    begin
        dbms_stats.set_table_stats(user, 'F_SCREEN_INSTANCE', numrows => 500000000);
        dbms_stats.set_table_stats(user, 'F_SCREEN_INSTANCE_BUF', numrows => 100000);
    end;
    /
    

    问题

    当使用内部联接时,可以使用LEADING提示来实现所需的联接和联接顺序。较小的表F_SCREEN_INSTANCE_BUF是第二个表。
    explain plan for
    select /*+ use_merge(t s) leading(t s) */ *
    from f_screen_instance_buf s
    join f_screen_instance t
        on (s.DAY_ID = t.DAY_ID and s.PARTIAL_ID = t.PARTIAL_ID);
    
    select * from table(dbms_xplan.display(format => '-predicate'));
    
    Plan hash value: 563239985
    
    -----------------------------------------------------------------------------------------------------
    | Id  | Operation           | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |                       |   100K|    19M|       |  6898  (66)| 00:00:01 |
    |   1 |  MERGE JOIN         |                       |   100K|    19M|       |  6898  (66)| 00:00:01 |
    |   2 |   INDEX FULL SCAN   | F_SCREEN_INSTANCE_PK  |   500M|    46G|       |  4504 (100)| 00:00:01 |
    |   3 |   SORT JOIN         |                       |   100K|  9765K|    26M|  2393   (1)| 00:00:01 |
    |   4 |    TABLE ACCESS FULL| F_SCREEN_INSTANCE_BUF |   100K|  9765K|       |    34   (6)| 00:00:01 |
    -----------------------------------------------------------------------------------------------------
    

    更改为左联接时,LEADING提示不起作用。
    explain plan for
    select /*+ use_merge(t s) leading(t s) */ *
    from f_screen_instance_buf s
    left join f_screen_instance t
        on (s.DAY_ID = t.DAY_ID and s.PARTIAL_ID = t.PARTIAL_ID);
    
    select * from table(dbms_xplan.display(format => '-predicate'));
    
    Plan hash value: 1472690071
    
    -----------------------------------------------------------------------------------------------------------------
    | Id  | Operation                    | Name                     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT             |                          |   100K|    19M|       |    16M  (1)| 00:10:34 |
    |   1 |  MERGE JOIN OUTER            |                          |   100K|    19M|       |    16M  (1)| 00:10:34 |
    |   2 |   TABLE ACCESS BY INDEX ROWID| F_SCREEN_INSTANCE_BUF    |   100K|  9765K|       |   826   (0)| 00:00:01 |
    |   3 |    INDEX FULL SCAN           | F_SCREEN_INSTANCE_BUF_PK |   100K|       |       |    26   (0)| 00:00:01 |
    |   4 |   SORT JOIN                  |                          |   500M|    46G|   131G|    16M  (1)| 00:10:34 |
    |   5 |    INDEX FAST FULL SCAN      | F_SCREEN_INSTANCE_PK     |   500M|    46G|       |  2703 (100)| 00:00:01 |
    -----------------------------------------------------------------------------------------------------------------
    

    据我所知,此限制没有记录在案。我尝试使用+outlineDBMS_XPLAN设置查看完整的提示集,然后对其进行了更改。但是我所做的一切都无法更改LEFT JOIN版本的加入顺序。也许其他人可以使它起作用。
    select * from table(dbms_xplan.display(format => '-predicate +outline'));
    
    ...
    Outline Data
    -------------
    
      /*+
          BEGIN_OUTLINE_DATA
          USE_MERGE(@"SEL$0E991E55" "T"@"SEL$1")
          LEADING(@"SEL$0E991E55" "S"@"SEL$1" "T"@"SEL$1")
          INDEX_FFS(@"SEL$0E991E55" "T"@"SEL$1" ("F_SCREEN_INSTANCE"."DAY_ID" "F_SCREEN_INSTANCE"."PARTIAL_ID"))
          INDEX(@"SEL$0E991E55" "S"@"SEL$1" ("F_SCREEN_INSTANCE_BUF"."DAY_ID"
                  "F_SCREEN_INSTANCE_BUF"."PARTIAL_ID"))
          OUTLINE(@"SEL$9EC647DD")
          OUTLINE(@"SEL$2")
          MERGE(@"SEL$9EC647DD")
          OUTLINE_LEAF(@"SEL$0E991E55")
          ALL_ROWS
          DB_VERSION('12.1.0.1')
          OPTIMIZER_FEATURES_ENABLE('12.1.0.1')
          IGNORE_OPTIM_EMBEDDED_HINTS
          END_OUTLINE_DATA
      */
    

    可能的解决方案
    --#3: Join the large table to the smaller result set.  This uses the largest table twice,
    --but the plan can use the ROWID for a very quick join.
    explain plan for
    merge into F_SCREEN_INSTANCE t
    using
    (
        --#2: Now get the missing rows with an outer join.  Since the _BUF table is
        --small I assume it does not make a big difference exactly how it it joind
        --to the 100K result set.
        --The hints NO_MERGE and NO_PUSH_PRED are required to keep the INNER_JOIN
        --inline view intact.
        select /*+ no_merge(inner_join) no_push_pred(inner_join) */ inner_join.*
        from f_screen_instance_buf s
        left join
        (
            --#1: Get 100K rows efficiently with an inner join.
            --Note that the ROWID is retrieved here.
            select /*+ use_merge(t s) leading(t s) */ s.*, s.rowid s_rowid
            from f_screen_instance_buf s
            join f_screen_instance t
                on (s.DAY_ID = t.DAY_ID and s.PARTIAL_ID = t.PARTIAL_ID)
        ) inner_join
            on (s.DAY_ID = inner_join.DAY_ID and s.PARTIAL_ID = inner_join.PARTIAL_ID)
    ) s
        on (s.s_rowid = t.rowid)
    when matched then update set
        t.ACTIVE_TIME_SUM = t.ACTIVE_TIME_SUM + s.ACTIVE_TIME_SUM,
        t.IDLE_TIME_SUM = t.IDLE_TIME_SUM + s.IDLE_TIME_SUM
    when not matched then insert values (
        s.DAY_ID, s.PARTIAL_ID, s.ID, s.AGENT_USER_ID, s.COMPUTER_ID, s.RAW_APPLICATION_ID, s.APP_USER_ID, s.APPLICATION_ID, s.USER_ID, s.RAW_MODULE_ID, s.MODULE_ID, s.START_TIME, s.RAW_SCREEN_NAME, s.SCREEN_ID, s.SCREEN_TYPE, s.ACTIVE_TIME_SUM, s.IDLE_TIME_SUM);
    

    它不是很漂亮,但至少它在排序合并联接中首先使用大表生成了一个计划。
    select * from table(dbms_xplan.display);
    
    Plan hash value: 1086560566
    
    -------------------------------------------------------------------------------------------------------------
    | Id  | Operation                | Name                     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------------------------------
    |   0 | MERGE STATEMENT          |                          |   500G|   173T|       |  5355K (43)| 00:03:30 |
    |   1 |  MERGE                   | F_SCREEN_INSTANCE        |       |       |       |            |          |
    |   2 |   VIEW                   |                          |       |       |       |            |          |
    |*  3 |    HASH JOIN OUTER       |                          |   500G|   179T|    29M|  5355K (43)| 00:03:30 |
    |*  4 |     HASH JOIN OUTER      |                          |   100K|    28M|  3712K|  8663  (53)| 00:00:01 |
    |   5 |      INDEX FAST FULL SCAN| F_SCREEN_INSTANCE_BUF_PK |   100K|  2539K|       |     9   (0)| 00:00:01 |
    |   6 |      VIEW                |                          |   100K|    25M|       |  6898  (66)| 00:00:01 |
    |   7 |       MERGE JOIN         |                          |   100K|    12M|       |  6898  (66)| 00:00:01 |
    |   8 |        INDEX FULL SCAN   | F_SCREEN_INSTANCE_PK     |   500M|    12G|       |  4504 (100)| 00:00:01 |
    |*  9 |        SORT JOIN         |                          |   100K|  9765K|    26M|  2393   (1)| 00:00:01 |
    |  10 |         TABLE ACCESS FULL| F_SCREEN_INSTANCE_BUF    |   100K|  9765K|       |    34   (6)| 00:00:01 |
    |  11 |     INDEX FAST FULL SCAN | F_SCREEN_INSTANCE_PK     |   500M|    46G|       |  2703 (100)| 00:00:01 |
    -------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       3 - access("INNER_JOIN"."S_ROWID"=("T".ROWID(+)))
       4 - access("S"."PARTIAL_ID"="INNER_JOIN"."PARTIAL_ID"(+) AND
                  "S"."DAY_ID"="INNER_JOIN"."DAY_ID"(+))
       9 - access("S"."DAY_ID"="T"."DAY_ID" AND "S"."PARTIAL_ID"="T"."PARTIAL_ID")
           filter("S"."PARTIAL_ID"="T"."PARTIAL_ID" AND "S"."DAY_ID"="T"."DAY_ID")
    

    关于performance - 优化MERGE语句的SORT MERGE连接,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/22107938/

    10-11 22:38
    查看更多