免费注册 查看新帖 |

Chinaunix

  平台 论坛 博客 文库
最近访问板块 发新帖
查看: 1617 | 回复: 1
打印 上一主题 下一主题

[Veritas NBU] (原创)优化器如何计算LIKE的cardinality ? [复制链接]

论坛徽章:
0
跳转到指定楼层
1 [收藏(0)] [报告]
发表于 2012-03-16 14:36 |只看该作者 |倒序浏览
(原创)优化器如何计算LIKE的cardinality ?












Optimizer how to calculate the cardinality with predicate LIKE? http://**/viewthread.php?tid=156120&extra=page%3D1 作者:文杰 Valen Wong 最近为客户进行多次DW调优,其中又处理了一例由于cardinality错误导致的执行计划异常,一个本该20秒的SQL,却跑了1个半小时。这次比较特殊,并不是skew data导致的。错误的估算是来至于一个LIKE 字句。我google了很久,没有发现任何关于描述LIKE的选择率计算的文章,所以我进行一些研究,基本了解了optimizer对LIKE的cardinality的计算方式。 错误的cardinality估算来如下SQL 片段: SELECT * FROM COST_ELEMENT_DIM T WHERE T.COST_ELEMENT_CODE LIKE '0%' dw@test10g(WANGWENJIE)> set linesize 999 dw@test10g(WANGWENJIE)> set autotrace traceonly dw@test10g(WANGWENJIE)> SELECT * FROM COST_ELEMENT_DIM T WHERE T.COST_ELEMENT_CODE LIKE '0%'; 1095 rows selected. -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 34 | 2016 | 5 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| COST_ELEMENT_DIM | 34| 2016 | 5 (0)| 00:00:01 | Optimizer的结果是34 实际呢? dw@test10g(WANGWENJIE)> SELECT COUNT(*) FROM COST_ELEMENT_DIM T WHERE T.COST_ELEMENT_CODE LIKE '0%'; COUNT(*) ---------- 1095 30倍,这个GAP 够大了。导致这个REPORT采用了错误的NEST LOOP JOIN. 那么optimizer是如何得出34这个结果的呢? 首先,得谈谈字符类型的范围查询。对于 ColumnA > ‘ADDFFSD’ and ColumnA <=’ZPDFDF’ 这种字符类型范围条件,Oracle 的cardinality计算,是通过一个复杂的转换过程实现。关键的地方就是,把字符转换成数字类型,然后进行数学计算。其基本思路还是传统的范围计算公式: 总记录数 *(高值减去低值)/ (最高值减去最低值) 。具体算法,在COST BASE ORACLE 一书第6章,作者Janathan. 我摘录如下: Oracle appears to behave as follows: • Extract a maximum of 32 bytes from the column; this representation of the column value is how the low_value, high_value and end_point_actual values are stored. • Extract the first 15 bytes from the 32, padding with zeros at the right if necessary. • Convert the 15 bytes hex number to decimal and round to 15 significant figures. Let’s take a worked example—the string 'Aardvark'—to see if we can end up with the value stored in the histogram: 'Aardvark', when dumped in hex from the char(10) column, contains the following list of byte values (note the 20,20 space padding that is imposed by the char(10) definition): '41,61,72,64,76,61,72,6B,20,20'. • Since this is less than 15 bytes, we append a few zeros to produce the number 0x416172647661726B20200000000000. (If the column had been declared as char(40) the value would already have been padded with spaces (0x20) up to 40 characters, so we would stop at the fifteenth byte, and the value we would use would look like 0x416172647661726B20202020202020.) • Converting this rather large hex number to decimal we get 339,475,752,638,459,043,065,991,628,037,554,176. • And if we throw away everything after the first 15 digits then, as required, we get 339,475,752,638,459,000,000,000,000,000,000,000. LIKE ‘0%’ 其实也是一个范围表达式,optimizer也是通过范围进行确定。那么 0 的下一位ASCII 码是1,所以它的范围就是 >=0 和 < 1. 在COST BASE ORACLE 一书,给出一个计算字符cardinality的公式,用于验证我想法: select round( 1114 * ( 2/1114 + (cbo_char_value('1') - cbo_char_value('0')) / (cbo_char_value('PLAN KG N/A') - cbo_char_value('0009400800')) ),2 ) cardinality from dual ; --说明 1114是总记录数 1114 也是distinct 的记录数 0009400800是该表的最高值,PLAN KG N/A是最低值,通过查询dba_tab_column view可以获得。 cbo_char_value是一个函数,它显示oracle用于计算cardinality过程中把字符转换成数值类型的算法。脚本见本文末尾。作者是Jonathan。 dw@test10g(WANGWENJIE)> select 2 round( 3 1114 * ( 4 2/1114 + 5 (cbo_char_value('1') - cbo_char_value('0')) / 6 (cbo_char_value('PLAN KG N/A') - cbo_char_value('0009400800')) 7 ),2 8 ) cardinality 9 from 10 dual 11 ; CARDINALITY ----------- 36.69 结果36.69,已经非常的接近了。 同时我继续测试了更复杂的情况 LIKE ‘0XXA%’, 那么它的范围是 >=OXXA and <0XXB ,结果同样证明了我算法和实际结果非常的接近。存在的微小误差,我不想花时间去继续证明。通过这个试验,已经搞清楚这种CASE。对以后工作的TUNING 有很大的帮助。 另外,如果LIKE 的值超出最高和最低值的范围,那么cardinality会永远返回1. 另外我测试HISTOGRAM, 对帮助optimizer 纠正like的cardinality有效(并不是sliver bullet)。 dw@test10g(WANGWENJIE)> begin 2 -- Call the procedure 3 sys.dbms_stats.gather_table_stats(ownname => 'DW', 4 tabname => 'COST_ELEMENT_DIM', 5 estimate_percent => 100, 6 method_opt => 'for columns COST_ELEMENT_CODE size auto', 7 granularity => 'GLOBAL', 8 degree => 8, 9 cascade => DBMS_STATS.AUTO_CASCADE, 10 force => TRUE); 11 end; 12 / SELECT * FROM COST_ELEMENT_DIM T WHERE T.COST_ELEMENT_CODE LIKE '0%' Execution Plan ---------------------------------------------------------- Plan hash value: 3082164707 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1092 | 78624 | 5 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| COST_ELEMENT_DIM | 1092 | 78624 | 5 (0)| 00:00:01 | Cardinality结果是1092,已经基本正确。 ORACLE怎么使用frequency histogram在计算cardinality在这里不作说明,我有另外的文章详细说明这个过程。 总的来说,对字符型变量,做任何比较,范围的filter,都是有很大风险的,尤其是在数据仓库环境中。 cbo_char_value函数 create or replace function cbo_char_value ( i_instring in varchar2, i_charsize in number default 0 ) return number as m_size number; m_vc varchar2(15); m_n number := 0; begin if i_charsize = 0 then m_size := length(i_instring) ; else m_size := i_charsize ; end if; m_vc := rpad(rpad(i_instring,m_size,' '),15,chr(0)); for i in 1..15 loop dbms_output.put(ascii(substr(m_vc,i,1))); dbms_output.put(chr(9)); dbms_output.put_Line( to_char( power(256,15-i) * ascii(substr(m_vc,i,1)), '999,999,999,999,999,999,999,999,999,999,999,999' ) ); m_n := m_n + power(256,15-i) * ascii(substr(m_vc,i,1)); end loop; return m_n; end;

论坛徽章:
0
2 [报告]
发表于 2012-03-16 14:36 |只看该作者
谢谢分享
您需要登录后才可以回帖 登录 | 注册

本版积分规则 发表回复

  

北京盛拓优讯信息技术有限公司. 版权所有 京ICP备16024965号-6 北京市公安局海淀分局网监中心备案编号:11010802020122 niuxiaotong@pcpop.com 17352615567
未成年举报专区
中国互联网协会会员  联系我们:huangweiwei@itpub.net
感谢所有关心和支持过ChinaUnix的朋友们 转载本站内容请注明原作者名及出处

清除 Cookies - ChinaUnix - Archiver - WAP - TOP