本文介绍了Oracle中临时数据的性能注意事项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在评估各种选项,以便针对Oracle中的单个临时数据集运行一堆高性能查询.在T-SQL中,我可能会使用内存中的临时表,但是Oracle没有与此功能完全相同的功能.

I'm evaluating various options to run a bunch of high-performing queries against a single temporary data set in Oracle. In T-SQL, I'd probably use in-memory temporary tables, but Oracle doesn't have an exact equivalent of this feature.

我目前正在查看以下选项:

I'm currently seeing these options:

CREATE GLOBAL TEMPORARY TABLE test_temp_t (
  n NUMBER(10),
  s VARCHAR2(10)
) ON COMMIT DELETE ROWS; -- Other configurations are possible, too

DECLARE
  t test_t;
  n NUMBER(10);
BEGIN

  -- Replace this with the actual temporary data set generation
  INSERT INTO test_temp_t
  SELECT MOD(level, 10), '' || MOD(level, 12)
  FROM dual
  CONNECT BY level < 1000000;

  -- Replace this example query with more interesting statistics
  SELECT COUNT(DISTINCT t.n)
  INTO n
  FROM test_temp_t t;

  DBMS_OUTPUT.PUT_LINE(n);
END;

计划:

----------------------------------------------------
| Id  | Operation            | A-Rows |   A-Time   |
----------------------------------------------------
|   0 | SELECT STATEMENT     |      1 |00:00:00.27 |
|   1 |  SORT AGGREGATE      |      1 |00:00:00.27 |
|   2 |   VIEW               |     10 |00:00:00.27 |
|   3 |    HASH GROUP BY     |     10 |00:00:00.27 |
|   4 |     TABLE ACCESS FULL|    999K|00:00:00.11 |
----------------------------------------------------

2.取消PL/SQL表类型变量的设置

CREATE TYPE test_o AS OBJECT (n NUMBER(10), s VARCHAR2(10));
CREATE TYPE test_t AS TABLE OF test_o;

DECLARE
  t test_t;
  n NUMBER(10);
BEGIN

  -- Replace this with the actual temporary data set generation
  SELECT test_o(MOD(level, 10), '' || MOD(level, 12))
  BULK COLLECT INTO t
  FROM dual
  CONNECT BY level < 1000000;

  -- Replace this example query with more interesting statistics
  SELECT COUNT(DISTINCT n)
  INTO n
  FROM TABLE(t) t;

  DBMS_OUTPUT.PUT_LINE(n);
END;

计划:

------------------------------------------------------------------
| Id  | Operation                          | A-Rows |   A-Time   |
------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |      1 |00:00:00.68 |
|   1 |  SORT GROUP BY                     |      1 |00:00:00.68 |
|   2 |   COLLECTION ITERATOR PICKLER FETCH|    999K|00:00:00.22 |
------------------------------------------------------------------

3.物化视图

在此用例中,我将其排除在外,因为所涉及的临时数据集非常复杂,并且对更新实例化视图的意义将太大.

3. Materialised views

I'm ruling them out for this use-case, because the temporary data set in question is rather complex and the implications on updating the materialised view would be too significant.

以上是我正在尝试执行的操作的示例.实际的数据集包括:

The above are examples of what I'm trying to do. The real data sets involve:

  • 从大约15个联接表中对临时数据进行了归一化.
  • 它的产生速度约为2-20倍/秒.
  • 每个临时数据集的实际行数大约为10-200(不是上面示例中的那么大).
  • 系统的每个用户都有自己的临时数据集(总共1M用户,10,000个并发用户).
  • 一旦建立了数据集,就应该对它进行大约10-50次分析查询.
  • 这些分析必须在线运行,即不能将其推迟到批处理作业中.

根据我的直觉,临时表查询应该" 较慢,因为它(可能)涉及I/O和磁盘访问,而PL/SQL集合查询只是内存中的解决方案.但是在我的琐碎基准测试中,情况并非如此,因为临时表查询比PL/SQL集合查询高出3倍.为什么会这样呢?是否有PL/SQL<-> SQL上下文切换发生?

From my intuition, the temp table query "should" be slower because it (probably) involves I/O and disk access, whereas the PL/SQL collection query is a mere in-memory solution. But in my trivial benchmark, this is not the case as the temp table query beats the PL/SQL collection query by factor 3x. Why is this the case? Is there some PL/SQL <-> SQL context switch happening?

在定义明确的临时数据集上,我是否还有其他选择可以进行快速(但仍广泛)的内存中"数据分析?有没有比较各种选择的重要的公开基准?

Do I have other options for fast (yet extensive) "in-memory" data analysis on a well-defined temporary data set? Are there any significant publicly available benchmarks comparing the various options?

推荐答案

由于缓存和异步I/O,临时表实际上与内存表相同,并且临时表解决方案不需要任何开销即可在SQL和PL/SQL.

Temporary tables are effectively the same as in-memory tables thanks to caching and asynchronous I/O, and the temporary table solution does not require any overhead for converting between SQL and PL/SQL.

确认结果

将这两个版本与RunStats进行比较,临时表版本看起来要差得多.对于Run1中的临时表版本,所有这些垃圾,而对于Run2中的PL/SQL版本,只剩下一点额外的内存.起初,似乎PL/SQL应该是明显的赢家.

Comparing the two versions with RunStats, the temporary table version looks much worse. All that junk for the temporary table version in Run1, and only a little extra memory for the PL/SQL version in Run2. At first it seems like PL/SQL should be the clear winner.

Type  Name                              Run1 (temp) Run2 (PLSQL)         Diff
----- -------------------------------- ------------ ------------ ------------
...
STAT  physical read bytes                    81,920            0      -81,920
STAT  physical read total bytes              81,920            0      -81,920
LATCH cache buffers chains                  104,663          462     -104,201
STAT  session uga memory                    445,488      681,016      235,528
STAT  KTFB alloc space (block)            2,097,152            0   -2,097,152
STAT  undo change vector size             2,350,188            0   -2,350,188
STAT  redo size                           2,804,516            0   -2,804,516
STAT  temp space allocated (bytes)       12,582,912            0  -12,582,912
STAT  table scan rows gotten             15,499,845            0  -15,499,845
STAT  session pga memory                    196,608   19,857,408   19,660,800
STAT  logical read bytes from cache     299,958,272            0 -299,958,272

但是在一天结束时,挂钟时间才是重要的.临时表的加载和查询步骤都快得多.

But at the end of the day only the wall clock time matters. Both the loading and the querying steps run much faster with temporary tables.

可以通过将BULK COLLECT替换为cast(collect(test_o(MOD(a, 10), '' || MOD(a, 12))) as test_t) INTO t来改进PL/SQL版本.但这仍然比临时表版本慢得多.

The PL/SQL version can be improved by replacing the BULK COLLECT with cast(collect(test_o(MOD(a, 10), '' || MOD(a, 12))) as test_t) INTO t. But it's still significantly slower than the temporary table version.

优化阅读次数

从小型临时表中读取仅使用内存中的缓冲区高速缓存.仅多次运行查询部分,并观察consistent gets from cache(内存)如何增加而physical reads cache(磁盘)保持不变.

Reading from the small temporary table only uses the buffer cache, which is in memory. Run only the query part many times, and watch how the consistent gets from cache (memory) increase while the physical reads cache (disk) stay the same.

select name, value
from v$sysstat
where name in ('db block gets from cache', 'consistent gets from cache',
'physical reads cache');

优化写入

理想情况下,将没有物理I/O,尤其是因为临时表为ON COMMIT DELETE ROWS.听起来,Oracle的下一版本可能会引入这样的机制.但是在这种情况下,它并不重要,磁盘I/O似乎并不会减慢速度.

Ideally there would be no physical I/O, especially since the temporary table is ON COMMIT DELETE ROWS. And it sounds like the next version of Oracle may introduce such a mechanism. But it doesn't matter much in this case, the disk I/O does not seem to slow things down.

多次运行加载步骤,然后运行select * from v$active_session_history order by sample_time desc;.大多数I/O是BACKGROUND,这意味着没有任何等待.我假设临时表内部逻辑只是常规DML机制的副本.通常,新表数据 如果已提交,则可能需要写入磁盘. Oracle可能会开始处理它,例如,通过将数据从日志缓冲区移到磁盘上,但是直到出现真正的COMMIT时,我们才着急.

Run the load step multiple times, and then run select * from v$active_session_history order by sample_time desc;. Most of the I/O is BACKGROUND, which means nothing is waiting on it. I assume the temporary table internal logic is just a copy of regular DML mechanisms. In general, new table data may need to be written to disk, if it's committed. Oracle may start working on it, for example by moving data from the log buffer to disk, but there is no rush until there is an actual COMMIT.

PL/SQL时间在哪里?

我不知道.在SQL和PL/SQL引擎之间是否存在多个上下文切换或单个转换?据我所知,没有可用的指标显示在SQL和PL/SQL之间切换所花费的时间.

I have no clue. Are there multiple context switches, or a single conversion between the SQL and PL/SQL engines? As far as I know none of the available metrics show the time spent on switching between SQL and PL/SQL.

我们可能永远无法确切知道为什么PL/SQL代码比较慢.我不用担心太多.普遍的答案是,无论如何,绝大多数数据库工作都必须使用SQL进行.如果甲骨文花了更多的时间优化数据库核心SQL(而不是附加语言PL/SQL),那将很有意义.

We may never know exactly why PL/SQL code is slower. I don't worry about it too much. The general answer is, the vast majority of database work has to be done in SQL anyway. It would make a lot of sense if Oracle spent more time optimizing the core of their database, SQL, than the add-on language, PL/SQL.

附加说明

对于性能测试,将connect by逻辑删除到一个单独的步骤中可能会有所帮助.该SQL是加载数据的绝妙技巧,但它可能非常缓慢且占用大量资源.用这种技巧一次加载一个样本表,然后从该表中插入更为现实.

For performance testing it can be helpful to remove the connect by logic into a separate step. That SQL is a great trick for loading data, but it can be very slow and resource intensive. It's more realistic to load a sample table once with that trick, and then insert from that table.

我尝试使用新的Oracle 12c功能(临时撤消)和新的18c功能(私有临时表).两者都没有比常规临时表提高性能.

I tried using the new Oracle 12c feature, temporary undo, and the new 18c feature, private temporary tables. Neither one improved performance over regular temporary tables.

我不会打赌,但是我可以看到一种结果,随着数据变大,结果将完全改变.日志缓冲区和缓冲区高速缓存只能变大.最终,该后台I/O可能加起来并淹没了某些进程,从而将BACKGROUND等待变成了FOREGROUND等待.另一方面,PL/SQL解决方案只有这么多的PGA内存,然后崩溃.

I wouldn't bet on it, but I can see a way that the results would completely change as the data gets larger. The log buffer and the buffer cache can only get so large. And eventually that background I/O could add up and overwhelm some processes, turning the BACKGROUND wait into a FOREGROUND wait. On the other hand, there's only so much PGA memory for the PL/SQL solution, and then things crash.

最后,这部分证实了我对内存数据库"的怀疑.缓存并不是什么新鲜事物,数据库已经做了数十年了.

Finally, this partially confirms my skepticism of "in-memory databases". Caching is nothing new, databases have been doing it for decades.

这篇关于Oracle中临时数据的性能注意事项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-26 08:44