« OLTP Database Machine with Sun FlashFire Technology | Blog首页 | SUN + Oracle推出Exadata 2 终止与HP的合作 »
CBO的魔术 - 一个错误的索引选择会带来的后果
作者:eygle | 【转载请注出处】|【云和恩墨 领先的zData数据库一体机 | zCloud PaaS云管平台 | SQM SQL审核平台 | ZDBM 数据库备份一体机】
链接:https://www.eygle.com/archives/2009/09/cbo_choose_wrong_index.html
在对20亿记录大表的查询中,发现Oracle的执行计划选择并不稳定,当然这是CBO的正常行为,然而当选择不同时,结果是巨大的。链接:https://www.eygle.com/archives/2009/09/cbo_choose_wrong_index.html
在以下查询中,使用指定的索引,查询快速得出结果,但是这依赖于Hints的强制指定:
SQL> select /*+ index(smsmg IDX_smsmg_DEST_MDN) */ count(*)而如果不加Hints,查询是一时无法得出结果的:
2 from smsmg where msg_to_dest_mdn='861318888888' and service_id='54';
COUNT(*)
----------
1
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1659057974
--------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18 | 98 (0)| 00:00:02 | | |
| 1 | SORT AGGREGATE | | 1 | 18 | | | | |
|* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| smsmg | 1 | 18 | 98 (0)| 00:00:02 | ROWID | ROWID |
|* 3 | INDEX RANGE SCAN | IDX_smsmg_msg_to_des_mdn | 106 | | 4 (0)| 00:00:01 | | |
--------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("SERVICE_ID"='54')
3 - access("msg_to_dest_mdn"='861318888888')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
82 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select count(*) from smsmg where msg_to_dest_mdn='861318888888' and service_id='54';其执行计划显示,这一缺省的执行方式导致了错误的索引选择:
select count(*) from smsmg where msg_to_dest_mdn='861318888888' and service_id='54'
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation
Elapsed: 00:04:27.88
SQL> set autotrace trace explain说CBO是Oracle最为博大精深的技术一点也不为过,只是这技术越复杂越深奥出错的机会就越多了。
SQL> select count(*) from smsmg where msg_to_dest_mdn='861318888888' and service_id='54';
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1152948967
----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18 | 5 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 18 | | | | |
|* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| smsmg | 1 | 18 | 5 (0)| 00:00:01 | ROWID | ROWID |
|* 3 | INDEX RANGE SCAN | IDX_smsmg_SERVICE_ID | 1 | | 4 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("msg_to_dest_mdn"='861318888888')
3 - access("SERVICE_ID"='54')
这个表的数据量大约是500G:
SQL> select segment_name,partition_name,bytes/1024/1024/1024 GB,blocks from dba_segments每个分区的记录数量大约是1~2亿条:
2 where owner='SMSMSG' and segment_name='SMSSENDMSG';
SEGMENT_NAME PARTITION_NAME GB BLOCKS
------------------------------ --------------- ---------- ----------
SMSSENDMSG M01 30.625 4014080
SMSSENDMSG M02 29.875 3915776
SMSSENDMSG M03 43.25 5668864
SMSSENDMSG M04 38 4980736
SMSSENDMSG M05 43.1875 5660672
SMSSENDMSG M06 50.6875 6643712
SMSSENDMSG M08 55.4375 7266304
SMSSENDMSG M09 32.125 4210688
SMSSENDMSG M10 23.9375 3137536
SMSSENDMSG M11 25.6875 3366912
SMSSENDMSG M12 31.9375 4186112
SEGMENT_NAME PARTITION_NAME GB BLOCKS
------------------------------ --------------- ---------- ----------
SMSSENDMSG M13 .000061035 8
SMSSENDMSG M07 58.5 7667712
13 rows selected.
SQL> select sum(bytes)/1024/1024/1024 GB from dba_segments where owner='SMSMSG' and segment_name='SMSSENDMSG';
GB
----------
463.250061
SQL> select table_owner,table_name,partition_name,num_rows from dba_tab_partitions
2 where table_owner='SMS9885' and table_name='SMS_TO_ISMG';
TABLE_OWNER TABLE_NAME PARTITION_NAME NUM_ROWS
------------------------------ ------------------------------ --------------- ----------
SMSMSG SMSSENDMSG M01 135605804
SMSMSG SMSSENDMSG M02 134599287
SMSMSG SMSSENDMSG M03 187959758
SMSMSG SMSSENDMSG M04 169663942
SMSMSG SMSSENDMSG M05 187435468
SMSMSG SMSSENDMSG M06 222079762
SMSMSG SMSSENDMSG M07 256482704
SMSMSG SMSSENDMSG M08 229089535
SMSMSG SMSSENDMSG M09 122453724
SMSMSG SMSSENDMSG M10 104093080
SMSMSG SMSSENDMSG M11 116095184
TABLE_OWNER TABLE_NAME PARTITION_NAME NUM_ROWS
------------------------------ ------------------------------ --------------- ----------
SMSMSG SMSSENDMSG M12 143216009
SMSMSG SMSSENDMSG M13 0
13 rows selected.
前面使用的都是单键值索引,其索引选择性参考前几天的记录。
-The End-
历史上的今天...
>> 2019-09-16文章:
>> 2018-09-16文章:
>> 2010-09-16文章:
>> 2008-09-16文章:
>> 2007-09-16文章:
>> 2005-09-16文章:
>> 2004-09-16文章:
By eygle on 2009-09-16 08:19 | Comments (22) | Case | SQL.PLSQL | 2401 |
在表的字段上上有多个索引,oracle会按照创建的先后顺序还是名称选择呢?还是有它自己的算法。。
个人认为Oracle的CBO是基于对SQL或者数据集未知的情况下做的最优匹配。。。前段时间看Cost-Base Fundamentals 十分郁闷的说
如果表有多个索引,且几个索引都可以,计算的COST也是一样的话,会按索引的名称为序来选择索引。
eygle的这个例子,我认为是service_id列上的统计数据有问题导致索引选择错误。
扯蛋,20亿记录最好使用RBO,收集统计数据不现实,如是OLTP影响更大,不信你试试
如是9208版本以下,分区表统计数据这块是有bug的
盖大牛,这表的索引是怎么建的?msg_to_dest_mdn有单独索引,service_id也有单独索引,或者有msg_to_dest_mdn和service_id的复合索引,如果是复合索引,见索引的顺序又是如何?这CBO选错执行计划,是建的索引不对,还是没有定时收集统计信息?看这个错误执行计划,在列service_id上应该有单独索引。
偶今天遇到10g的一百多万的数据 date有个索引
select * from t where data>=trunc(sysdate)
and date=begin_date
and date<date;
就可以用到索引
很明显是索引字段的统计信息由问题,评估成1行了
是不是最大值最小值越界了,还是直方图的信息太老了
多谢各位帮忙参谋!
多谢棉花。
这个索引service_id的直方图不能算老:
Column (#3): SERVICE_ID(VARCHAR2)
AvgLen: 4.00 NDV: 48 Nulls: 0 Density: 2.5440e-10
Histogram: Freq #Bkts: 48 UncompBkts: 5291 EndPtVals: 48
大约是20亿记录左右的信息。
我的意思是你的直方图里可能没有记录service_id=56的信息,基于频率的查查就知道,或者是这个值的数据从上次收集统计信息到这次发生了巨大改变,但是统计信息里没有反应出来
service='54',前面看错了
service_id落在 low_value 和 high_value 之间的。
索引的信息是这样的:
Index Stats::
Index: IDX_SMS_TO_ISMG_DEST_MDN Col#: 9
LVLS: 3 #LB: 12186077 #DK: 19507227 LB/K: 1.00 DB/K: 93.00 CLUF: 1816174507.00
Index: IDX_SMS_TO_ISMG_SERVICE_ID Col#: 3
LVLS: 3 #LB: 8184696 #DK: 48 LB/K: 170514.00 DB/K: 2354832.00 CLUF: 113031962.00
service_id的cluf要小很多,这可能是个影响因素。
每个service_id的总量肯定是很大的,DK 48,其采样数据反映的大约是19亿多时的信息,不算有太大变化。
晚上再看。
Column (#9): DEST_MDN(VARCHAR2)
AvgLen: 14.00 NDV: 19507227 Nulls: 0 Density: 5.1263e-08
Column (#3): SERVICE_ID(VARCHAR2)
AvgLen: 4.00 NDV: 48 Nulls: 0 Density: 2.5440e-10
Histogram: Freq #Bkts: 48 UncompBkts: 5291 EndPtVals: 48
Access Path: index (AllEqRange)
Index: IDX_SMS_TO_ISMG_DEST_MDN
resc_io: 98.00 resc_cpu: 757780
ix_sel: 5.1263e-08 ix_sel_with_filters: 5.1263e-08
Cost: 98.09 Resp: 98.09 Degree: 1
Access Path: index (AllEqRange)
Index: IDX_SMS_TO_ISMG_SERVICE_ID
resc_io: 5.00 resc_cpu: 36143
ix_sel: 2.5562e-10 ix_sel_with_filters: 2.5562e-10
Cost: 5.00 Resp: 5.00 Degree: 1
这个字段唯一值那么少,clustering_factor小很多比较正常,clustering_factor的影响在这里应该没多大,选择性评估的太小了
问题在于Histogram: Freq。Frequency 类型的histogram有bug,会造成density计算偏小。比如service_id的Density: 2.5440e-10。这个计算方法是1/(2*num_rows)。我经常遇到这个bug。如果去掉histogram应该就可以了。oracle对于distinct value个数少于255的一般会用Frequency histogram。
这有篇文章说这个问题的。http://jonathanlewis.wordpress.com/2009/04/23/histogram-change/
Freq的Density计算是有些问题。
删除histogram应该是有用,但是不能删。
谢谢ghostly的信息,我会在合适的时候试一下。
有点高深……
我还没有这么高~~
请教eygle,我也碰到一个类似的问题,强制走索引查询快,不加hints 太慢,执行计划为:SELECT STATEMENT, GOAL = ALL_ROWS Cost=2962 Cardinality=1 Bytes=165
SORT GROUP BY Cost=2962 Cardinality=1 Bytes=165
TABLE ACCESS BY INDEX ROWID Object owner=PUB Object name=AB01 Cost=2 Cardinality=1 Bytes=37
NESTED LOOPS Cost=2961 Cardinality=1 Bytes=165
NESTED LOOPS ANTI Cost=2959 Cardinality=1 Bytes=128
NESTED LOOPS Cost=2957 Cardinality=1 Bytes=112
MERGE JOIN CARTESIAN Cost=2944 Cardinality=1 Bytes=53
INDEX RANGE SCAN Object owner=SIMIS Object name=PK_AB15 Cost=2240 Cardinality=213040 Bytes=7030320
BUFFER SORT Cost=1.84467440737095E19 Cardinality=1 Bytes=20
SORT UNIQUE Cost=2 Cardinality=1 Bytes=20
TABLE ACCESS BY INDEX ROWID Object owner=SIMIS Object name=AB02 Cost=2 Cardinality=1 Bytes=20
INDEX RANGE SCAN Object owner=SIMIS Object name=PK_AB02 Cost=1 Cardinality=1
TABLE ACCESS BY INDEX ROWID Object owner=SIMIS Object name=AC13 Cost=13 Cardinality=1 Bytes=59
INDEX RANGE SCAN Object owner=SIMIS Object name=PK_AC13 Cost=12 Cardinality=1
TABLE ACCESS BY INDEX ROWID Object owner=SIMIS Object name=AC10 Cost=2 Cardinality=50287 Bytes=804592
INDEX RANGE SCAN Object owner=SIMIS Object name=IDX_AC10_BAB221 Cost=1 Cardinality=2
INDEX RANGE SCAN Object owner=PUB Object name=PK_AB01 Cost=1 Cardinality=1
值得注意的是: buffer sort 特别大,请教eygle有可能是什么原因引起的
请教eygle,我也碰到一个类似的问题,强制走索引查询快,不加hints 太慢,执行计划为:SELECT STATEMENT, GOAL = ALL_ROWS Cost=2962 Cardinality=1 Bytes=165
SORT GROUP BY Cost=2962 Cardinality=1 Bytes=165
TABLE ACCESS BY INDEX ROWID Object owner=PUB Object name=AB01 Cost=2 Cardinality=1 Bytes=37
NESTED LOOPS Cost=2961 Cardinality=1 Bytes=165
NESTED LOOPS ANTI Cost=2959 Cardinality=1 Bytes=128
NESTED LOOPS Cost=2957 Cardinality=1 Bytes=112
MERGE JOIN CARTESIAN Cost=2944 Cardinality=1 Bytes=53
INDEX RANGE SCAN Object owner=SIMIS Object name=PK_AB15 Cost=2240 Cardinality=213040 Bytes=7030320
BUFFER SORT Cost=1.84467440737095E19 Cardinality=1 Bytes=20
SORT UNIQUE Cost=2 Cardinality=1 Bytes=20
TABLE ACCESS BY INDEX ROWID Object owner=SIMIS Object name=AB02 Cost=2 Cardinality=1 Bytes=20
INDEX RANGE SCAN Object owner=SIMIS Object name=PK_AB02 Cost=1 Cardinality=1
TABLE ACCESS BY INDEX ROWID Object owner=SIMIS Object name=AC13 Cost=13 Cardinality=1 Bytes=59
INDEX RANGE SCAN Object owner=SIMIS Object name=PK_AC13 Cost=12 Cardinality=1
TABLE ACCESS BY INDEX ROWID Object owner=SIMIS Object name=AC10 Cost=2 Cardinality=50287 Bytes=804592
INDEX RANGE SCAN Object owner=SIMIS Object name=IDX_AC10_BAB221 Cost=1 Cardinality=2
INDEX RANGE SCAN Object owner=PUB Object name=PK_AB01 Cost=1 Cardinality=1
值得注意的是: buffer sort 特别大,请教eygle有可能是什么原因引起的
你这个是不是统计数据有问题,重新分析一下表试试。
MERGE JOIN CARTESIAN出现,下面的BUFFER SORT Cost=1.84467440737095E19 是因为预测 Cardinality=1
你做个10053事件的跟踪文件发给我看看。
最初我也以为这是统计数据有问题,使用dbms_stats分析过之后,执行计划没有任何改变,后台执行直接hang住,v$sql_plan连执行计划都产生不出来。
表ab02数据量不大,只有840条数据。
跟踪文件明天上午发过去,谢谢eygle。
10053trace文件已经发过去,帮忙看看,谢谢eygle!