本文介绍了Oracle根据to_date(literal)的格式使用或忽略索引列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用索引列作为过滤器,将其置于'两个文字值之间'。 (该列位于索引的第二个位置,实际上使执行速度变慢;我稍后会处理它)。

I'm using an indexed column used as a filter by putting it 'between' two literal values. (The column is in the second position of the index and actually makes execution slower; I will deal with that later).

令我困惑的是Oracle(11.2.0.3.0)使用或忽略所述索引,具体取决于提供给to_date的值的格式和格式字符串:

What's confusing me is that Oracle (11.2.0.3.0) uses or ignores said index depending on the format of the value and format strings supplied to to_date:

这会忽略索引:

SQL> SELECT *
  2  FROM gprs_history_import  gh
  3  WHERE start_call_date_time BETWEEN
  4      to_date('20140610 000000','yyyymmdd hh24miss') AND
  5      to_date('20140610 235959','yyyymmdd hh24miss')
  6  /

Execution Plan
----------------------------------------------------------
Plan hash value: 990804809

--------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                     |   350 |   219K|   242K  (1)| 00:56:42 |       |       |
|   1 |  PARTITION RANGE SINGLE|                     |   350 |   219K|   242K  (1)| 00:56:42 |    74 |    74 |
|   2 |   PARTITION LIST ALL   |                     |   350 |   219K|   242K  (1)| 00:56:42 |     1 |     3 |
|*  3 |    TABLE ACCESS FULL   | GPRS_HISTORY_IMPORT |   350 |   219K|   242K  (1)| 00:56:42 |   220 |   222 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

这个确实使用了索引(注意第4行日期部分之后的空格):

This one does use the index (notice the space after the date part in line 4):

SQL> SELECT *
  2  FROM gprs_history_import  gh
  3  WHERE start_call_date_time BETWEEN
  4      to_date('20140610 ','yyyymmdd ') AND
  5      to_date('20140610 235959','yyyymmdd hh24miss')
  6  /

Execution Plan
----------------------------------------------------------
Plan hash value: 464458373

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                          |   350 |   219K|  2795K  (1)| 10:52:15 |       |       |
|*  1 |  FILTER                              |                          |       |       |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR           |                          |   350 |   219K|  2795K  (1)| 10:52:15 |   KEY |    74 |
|   3 |    PARTITION LIST ALL                |                          |   350 |   219K|  2795K  (1)| 10:52:15 |     1 |     3 |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID| GPRS_HISTORY_IMPORT      |   350 |   219K|  2795K  (1)| 10:52:15 |   KEY |   222 |
|*  5 |      INDEX SKIP SCAN                 | GPRS_HISTORY_IMPORT_IDX1 |     1 |       |  2795K  (1)| 10:52:15 |   KEY |   222 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_DATE('20140610 ','yyyymmdd ')<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("START_CALL_DATE_TIME">=TO_DATE('20140610 ','yyyymmdd ') AND "START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
       filter("START_CALL_DATE_TIME">=TO_DATE('20140610 ','yyyymmdd ') AND "START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

((1)中的过滤器似乎有点傻,好像Oracle 没有理解表达式一样)

(The filter in (1) seems a bit silly, as if Oracle didn't understand the expression)

同样,这个没有(我删除了尾随的sp ace):

Again, this one doesn't (I removed the trailing space):

SQL> SELECT *
  2  FROM gprs_history_import  gh
  3  WHERE start_call_date_time BETWEEN
  4      to_date('20140610','yyyymmdd') AND
  5      to_date('20140610 235959','yyyymmdd hh24miss')
  6  /

Execution Plan
----------------------------------------------------------
Plan hash value: 990804809

--------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                     |   350 |   219K|   242K  (1)| 00:56:42 |       |       |
|   1 |  PARTITION RANGE SINGLE|                     |   350 |   219K|   242K  (1)| 00:56:42 |    74 |    74 |
|   2 |   PARTITION LIST ALL   |                     |   350 |   219K|   242K  (1)| 00:56:42 |     1 |     3 |
|*  3 |    TABLE ACCESS FULL   | GPRS_HISTORY_IMPORT |   350 |   219K|   242K  (1)| 00:56:42 |   220 |   222 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

在空格周围加上引号会阻止索引的使用。

Putting quotes around the space precludes the index from ever being used.

给出了什么?

推荐答案

优化器或解析器中的错误会导致某些日期格式将静态分区修剪降级为动态分区修剪。分区修剪更改会导致不同的基数和成本,从而导致计划的许多其他部分发生重大变化。

Bugs in the optimizer or the parser cause some date formats to downgrade static partition pruning to dynamic partition pruning. Partition pruning changes lead to different cardinality and costs, which then lead to significant changes to many other parts of the plan.

此答案仅部分解释了问题,并包含一些猜测。希望它至少可以解决问题的根源和不存在的问题。如果你真的需要一个完整的解释并希望向Oracle提交服务请求,这至少是一个很好的起点。

This answer only partially explains the problem and contains some speculation. Hopefully it will at least shed a little light on what the problem is and is not. It is at least a good starting point if you really need a complete explanation and want to submit a service request to Oracle.

术语和一些背景阅读

静态分区修剪是优化程序在编译时确定将使用哪个分区的时间。统计数据是按分区划分的,从而可以获得更好的基数估算,从而实现更好的计划。例如,考虑一个按状态分区的表,其中CANCELED
的分区很小,而ACTIVE的分区很大。知道使用哪个分区可以完全改变最佳计划的连接顺序和访问方法。使用静态分区修剪时,
Pstart Pstop 将是数值。

Static partition pruning is when the optimizer determines at compile time which partition will be used. Statistics are per-partition, leading to better cardinality estimates, leading to better plans. For example, think of a table partitioned by status where the partition for CANCELLED is tiny and the partition for ACTIVE is large. Knowing which partition is used can completely change the join order and access methods of the optimal plan. Pstart and Pstop will be numeric values when static partition pruning is used.

动态分区修剪是优化器在运行时之前无法确定分区的时间。仅从所需的分区中检索数据,但构建
执行计划时并不知道使用了哪个分区。一些分区统计估计将是所有可用分区的简单平均值。
在上面按状态划分的表的示例中,小分区和大分区的平均值也不能准确表示。 Pstart
Pstop 将包含单词 KEY 使用动态分区修剪时。

Dynamic partition pruning is when the optimizer cannot determine the partition until runtime. Data is only retrieved from the required partitions but theexecution plan is built without special knowledge of which partition is used. Some partition statistic estimates will be a simple average of all available partitions. In the above example of a table partitioned by status, the average of a tiny partition and a large partition does not accurately represent either. Either Pstart or Pstop will include the word KEY when dynamic partition pruning is used.

Oracle®数据库VLDB和分区指南包含有关
值得一读。例如,手册中的一个相关引用:

The Oracle® Database VLDB and Partitioning Guide includes a section about Datatype Conversions that is worth reading. For example, one relevant quote from the manual:

示例模式和数据

这个简单的测试用例演示了这个问题。它还排除了常见的性能问题,例如缺少统计信息。

This simple test case demonstrates the problem. It also rules out common performance issues, such as missing statistics.

首先,创建一个包含2个分区的样本表,一个大分区,一个小分区。

First, create a sample table with 2 partitions, one large and one small.

create table gprs_history_import(id number, start_call_date_time date)
partition by range (start_call_date_time)
(
    partition p_large values less than (date '2014-06-01'),
    partition p_small values less than (date '2014-07-01')
);

insert into gprs_history_import
select level, date '2014-05-01'
from dual connect by level <= 1000;

insert into gprs_history_import
select level, date '2014-06-01'
from dual connect by level <= 10;

begin
    dbms_stats.gather_table_stats(user, 'GPRS_HISTORY_IMPORT');
end;
/

select count(*) from gprs_history_import partition (p_large); -- 1000
select count(*) from gprs_history_import partition (p_small); --   10

静态到动态会导致基数估算不佳

静态基数估计值是完美的1000.第二个日期格式的额外空格从 Pstop 更改为> KEY 。该计划从静态更改为动态分区修剪。动态估计是不准确的505,平均值为1000和10

The static cardinality estimate is a perfect 1000. The extra space in second date format changes Pstop from 1 to KEY. The plan changes from static to dynamic partition pruning. The dynamic estimate is an inaccurate 505, the average of 1000 and 10

为简单起见,此示例仅显示不良的基数估计值。没有必要显示查询运行缓慢,因为错误的行估计不可避免地导致执行计划错误,原因很多。

For simplicity this example only shows a bad cardinality estimate. It is not necessary to show a query running slow, since bad row estimates inevitably lead to bad execution plans for many reasons.

explain plan for select /* static partition pruning */ *
from gprs_history_import
where start_call_date_time < to_date('20140601 000000','yyyymmdd hh24miss');

select * from table(dbms_xplan.display);

Plan hash value: 452971246

--------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                     |  1000 | 12000 |    16   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|                     |  1000 | 12000 |    16   (0)| 00:00:01 |     1 |     1 |
|   2 |   TABLE ACCESS FULL    | GPRS_HISTORY_IMPORT |  1000 | 12000 |    16   (0)| 00:00:01 |     1 |     1 |
--------------------------------------------------------------------------------------------------------------

explain plan for select /* dybnamic partition pruning */ *
from gprs_history_import
where start_call_date_time < to_date('20140601 ','yyyymmdd ');

select * from table(dbms_xplan.display);


Plan hash value: 2464174375

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                     |   505 |  6060 |    29   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|                     |   505 |  6060 |    29   (0)| 00:00:01 |     1 |   KEY |
|*  2 |   TABLE ACCESS FULL      | GPRS_HISTORY_IMPORT |   505 |  6060 |    29   (0)| 00:00:01 |     1 |   KEY |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("START_CALL_DATE_TIME"<TO_DATE('20140601 ','yyyymmdd '))

日期格式解析问题

现在有人猜测为什么查询从静态分区修剪到动态分区修剪。

Now for some speculation as to why the query moves from static to dynamic partition pruning.

当优化器可以使用静态和动态分区时,并不总是很明显。通常,文字允许静态修剪,变量需要动态修剪。

It is not always obvious when the optimizer can use static and dynamic partitioning. In general, literals allow static pruning and variables require dynamic pruning.

--#1: Obviously static: It uses an unambiguous ANSI date literal.
select * from gprs_history_import where start_call_date_time = date '2000-11-01';

--#2: Obviously dyanmic: It uses a bind variable.
select * from gprs_history_import where start_call_date_time = :date;

--#3: Probably dynamic: The optimizer cannot always infer the literal value.
select * from gprs_history_import where start_call_date_time =
    (select date '2000-11-01' from dual);

--#4: Probably static: FEB is not always valid, but Oracle can figure that out.
select * from gprs_history_import where start_call_date_time =
    to_date('01-FEB-2000', 'DD-MON-YYYY');

当你考虑案例#4的所有表现和国际化问题时,很明显它是多么困难解析日期。 to_date('01-FEB-2000','DD-MON-YYYY')的值取决于多个NLS参数,例如 NLS_DATE_LANGUAGE 。该查询对英语有效,但不对德语有效。如果 NLS_CALENDAR 未设置为 GREGORIAN ,即使是全数字日期格式也可能是错误的。 to_date 字符串不是绑定值,但它也不是字面值。

When you consider all the performance and internationalization issues around case #4 it becomes clear how difficult it is to parse dates. The value of to_date('01-FEB-2000', 'DD-MON-YYYY') depends on several NLS parameters, such as NLS_DATE_LANGUAGE. The query is valid for English but not German. And if NLS_CALENDAR is not set to GREGORIAN than even an all-numeric date format can be wrong. The to_date string is not a bind value but it's not clearly a literal either.

真实日期文字之间的区别如果计算硬分析,则格式化字符串会更明显。即使语言被更改,查询#1也不会强制进行硬解析,但查询#4会。这可以通过运行每个变体,更改语言,然后运行从v $ sesstat自然连接v $ statname中选择值来证明,其中name ='parse count(hard)'和sid = userenv( 'SID');

The difference between true date literals and formatted strings is more apparent if hard parses are counted. Query #1 will not force a hard-parse even if the language is changed, but query #4 will. This can be demonstrated by running several variations of each, changing the language, and then running select value from v$sesstat natural join v$statname where name = 'parse count (hard)' and sid = userenv('SID');.

Oracle必须在某个地方有一个变量来表示这不是绑定变量,但可能导致基于不同的计划在NLS设置上。该变量总是会导致动态分区修剪,但必定会出现一些偶尔会破坏它的错误。

Oracle must have a variable somewhere to denote "this is not a bind variable but could lead to different plans based on NLS settings". That variable does not always lead to dynamic partition pruning but there must be some bugs somewhere that occasionally break it.

这篇关于Oracle根据to_date(literal)的格式使用或忽略索引列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-31 10:31