Oracle 统计信息不准 (谓词越界) 造成的性能问题

什么是谓词越界? 谓词越界其实就是 SQL 语句的查询条件超出了数据库统计信息所记录的范围. 谓词越界会导致 Oracle 优化器错误的选择 SQL 语句的执行计划, 导致性能问题.

这里举一个简单的例子说明谓词越界导致优化器选择了错误的执行计划.

create table t1 (col1 number);
create index idx_t1 on t1(col1);
begin
  for i in 1..10000 loop
  insert into t1 values (i);
  end loop;
  commit;
end;
/

这里创建了 t1 表, 并在 col1 列上创建了索引, 并向表里写入了 10000 条数据. 提供过对 t1 表收集统计信息, 可以得到目前表 t1 的谓词情况.

SQL> exec dbms_stats.gather_table_stats('SALP','T1');
SQL> select low_value,high_value from dba_tab_col_statistics where table_name='T1' and owner='SALP';
LOW_VALUE  HIGH_VALUE
---------- ----------

C102 C302

SQL> var x number;
SQL> exec dbms_stats.convert_raw_value('C102',:x);

PL/SQL procedure successfully completed.

SQL> select :x from dual;
        :X
----------
         1
SQL> exec dbms_stats.convert_raw_value('C302',:x);

PL/SQL procedure successfully completed.

SQL> select :x from dual;
        :X
----------
     10000

上面用到了一个系统包, 把统计信息表里的上下限裸数据转换成可读的数值.

在谓词范围内的条件查询的执行计划为

explain plan for select * from t1 where col1 between 1 and 10000;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1387720244
-------------------------------------------------------------------------------
| Id  | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        | 10000 | 40000 |     7   (0)| 00:00:01 |
|*  1 |  INDEX FAST FULL SCAN| IDX_T1 | 10000 | 40000 |     7   (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("COL1">=1 AND "COL1"<=10000)

13 rows selected.

因为这里的条件包含了 t1 表内的所有数据, 所以采用多块读且不需要回表的执行计划是最优的(table access full/index fast full scan), 这里实际使用的是 index fast full scan.

接下来继续向 t1 表写入数据

begin
  for i in 10001..10000000 loop
  insert into t1 values (i);
  end loop;
  commit;
end;
/

在不重新收集统计信息的情况下, 检查表的统计信息

select low_value,high_value from dba_tab_col_statistics where table_name='T1' and owner='SALP';
LOW_VALUE  HIGH_VALUE
---------- ----------

C102 C302

现在来进行一次谓词越界的查询, 使用谓词条件 col1 between 10001 and 10000000. 按道理来说, 这种选择表里 99.9% 数据的语句应该使用多块读且不回表的执行计划(table access full/index fast full scan). 我们来实际试验一下.

SQL> set timing on;
SQL> Select count(*) from t1 where col1 between 10001 and 10000000;
  COUNT(*)
----------
   9990000
Elapsed: 00:00:11.17
SQL> select * from table(dbms_xplan.display_cursor(null,null,'advanced'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

SQL_ID 86kr1tnhns36d, child number 0

-------------------------------------
Select count(*) from t1 where col1 between 10001 and 10000000
Plan hash value: 1970818898
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE   |        |     1 |     4 |            |          |
|*  2 |   INDEX RANGE SCAN| IDX_T1 |     1 |     4 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

可以看到这条语句执行了 11s 才出结果, 且执行计划选择的是单块读的 index range scan, 而不是我们期望的多块读不回表的两种执行计划之一且返回的 Rows 和 Bytes 出现了严重预估错误.

我们重新为 t1 表收集一次统计信息, 再次执行同样的语句并检查执行计划.

SQL> exec dbms_stats.gather_table_stats('SALP','T1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:07.92
SQL> select count(*) from t1 where col1 between 10001 and 10000000;
  COUNT(*)
----------
   9990000
Elapsed: 00:00:00.31
SQL> select * from table(dbms_xplan.display_cursor(null,null,'advanced'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

SQL_ID g47843nv7gsdq, child number 0

-------------------------------------
select count(*) from t1 where col1 between 10001 and 10000000
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  4434 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  9990K|    57M|  4434   (2)| 00:00:01 |
---------------------------------------------------------------------------

这次走出了我们希望的执行计划(table access full), 预估的 Rows 和 Bytes 也都正常了, 且语句花了 310ms 就运行完了.

谓词越界一般会发生在什么场景下?

1 临时表

这里指的是业务上的临时表而不是 Oracle 数据库本身的 temporary table. 在某些系统中会根据业务条件创建前台表和后台表, 数据先进入前台表, 处理完毕后, 存入后台表, 并用 delete 语句清理前台表的数据, 前台表起到一个临时表的作用. 我们知道, Oracle 自动收集统计信息的默认时间窗口是工作日晚上的 22 点到凌晨 2 点, 或者周末的早上 6 点到第二天凌晨 2 点. 在自动收集统计信息窗口内, 数据库前台表基本上处于无数据, 或者数据量很小的情况, 那么产生的统计信息就会和白天实际处理业务数据时有偏差, 就有可能发生谓词越界的情况.

2 巨大表

Oracle 触发自动收集某个表的统计信息的条件是表中修改的数据量超过该表数据总量的 10%, 假设一个表每天新增 1w 条数据, 一年后这个表变成了 365w 条数据, 那么这意味着这个表需要再过一个多月才会触发一次自动收集统计信息的作业. 那么在这个表上的谓词查询, 尤其是时间, 序列等自增条件上的查询, 就可能发生谓词越界的情况, 影响优化器正确选择执行计划.

来源: http://www.bubuko.com/infodetail-2607442.html

与本文相关文章

暂无,快来抢沙发吧！