论坛徽章:: 0

电梯直达

1楼 [收藏(0)] [报告]

发表于 2011-12-20 09:48 |只看该作者 |倒序浏览

部分文章发布在itpub上，网民aqcjsy1

分布式优化

在现实系统中，由于业务的扩张或者为了保持已有系统的投资，往往一个应用通常会涉及到多个数据库系统的访问，比如说CRM系统即需要访问营销系统中的一小部分数据，也需要访问客户回访平台的部分数据，而这些数据库系统往往由不同的部门或者公司开发或维护。由于各种不同的原因如信息机密以及部门、公司等出于自身利益的考虑，往往这些数据库难以进行较好的整合，因此新开发的应用就不得不进行分布式的处理，通常这些系统之间又需要能够访问彼此实时的数据，而且对性能又有一定的要求，比如说客户要求系统的响应时间不得超过3秒，此时如果事务在异地涉及的数据访问量超过百万，直接通过dblink进行访问查询在很多情况下会出现响应时间很慢的情况。

总的来说，只要一个系统访问的数据量不大或者对系统性能要求不是非常严格，使用driving_site提示足以满足要求，但是如果系统对响应时间的要求比较高，driving_site将无法满足要求，在这种情况下，我们可能需要重写拆分语句，以便最小化数据的传输。

考虑下列下列生产中的一个例子，cb_client有360万记录，cb_task大约150多万记录，customer有450万记录，operatorbranch几千条，其中cb_client和cb_task在另一个数据库中，通过dblink连接访问。

with t as(

select a.account,

a.branchno,

a.activeid,

b.exectime,

(case

when a.taskstatus = '500' and a.callstatus not in ('5515003', '5515004') then

1

when a.taskstatus = '600' and a.callstatus = '5512004' then

1

when a.taskstatus = '500' and a.callstatus = '5515003' then

1.3

when a.taskstatus = '500' and a.callstatus = '5515004' then

1.4

else

4

end) status

from cb_client a, cb_task b

where a.id = b.id

and a.foreigntype = 'htcrm'

and instr('81,82,83,85,86,88,89,90,91,93,94,,96,97,99', a.activeid) > 0

and to_char(b.exectime, 'yyyymmdd') >= 20100101),

p as(

select account, branchno, exectime, status

from (select t.account,

t.branchno,

t.exectime,

t.status,

row_number() over(partition by t.account, t.branchno order by t.status, t.exectime) rn

from t

/*where instr(v_c_tasktype, t.activeid) > 0*/)

where rn < 2)

select y.l_branch_no,

y.l_total,

y.l_tel_done,

--y.l_email_done,

y.l_face_done,

y.l_auto_done,

y.l_intime_done,

y.l_intime_undo,

to_char(round(y.l_intime_done / y.l_total, 4) * 100, 990.99) || '%' en_intime_done,

to_char(round(y.l_intime_undo / y.l_total, 4) * 100, 990.99) || '%' en_intime_undo

from (select x.l_branch_no,

count(1) as l_total,

sum(case when x.is_intime = 1 and x.status = 1 then 1 else 0 end ) as l_tel_done,

--sum(case when x.is_intime = 1 and x.status = 1.2 then 1 else 0 end ) as l_email_done,

sum(case when x.is_intime = 1 and x.status = 1.3 then 1 else 0 end ) as l_face_done,

sum(case when x.is_intime = 1 and x.status = 1.4 then 1 else 0 end ) as l_auto_done,

sum(case when x.is_intime = 1 then 1 else 0 end ) as l_intime_done,

sum(case when x.is_intime = 0 then 1 else 0 end ) as l_intime_undo

from (select c.vc_customer_no,

c.l_branch_no,

(case

when p.status>=1 and p.status <=1.4 and p.exectime is not null and

(to_char(p.exectime, 'yyyymmdd') >= 20100101 or 20100101 = 0) and

(to_char(p.exectime, 'yyyymmdd') <= 20110101 or 20110101 = 0) then

1

else

0

end) is_intime,

p.status

from customer c, p

where c.vc_customer_no = p.account(+)

and c.l_branch_no = p.branchno(+)

and c.c_status = '1'

and (c.l_reg_date >= 19800101 or 19800101 = 0)

and (c.l_reg_date <= 20091231 or 20091231 = 0)) x

group by x.l_branch_no) y

where exists (select 1

from operatorbranch op

where op.vc_operator_no = 8888

and op.l_branch_no = y.l_branch_no)

order by y.l_branch_no;

在具有32G内存，6*4核的系统中这个语句需要执行将近90秒，而该查询有比较频繁，客户要求最好越快越好，一定要在30秒内响应。其原始执行计划如下：

因为这个语句的涉及了两个系统的访问，其中每个系统所要访问的数据量相差不是很大。所以在那边做都不合适，最好结局是各自在一边处理。因此，在本例中，分布式常用的hint “driving_site”解决不了我们的实际问题，我们必须使用进行拆分sql语句，将前面的两个with子句拆分成临时表，并使用driving_site提示让查询引擎在远程系统完成结果集的处理。这样网络的传输就可以最小化。

另外，由于每个结果集的大小仍然是100w和400w左右，因此我们在主查询中对customer使用了parallel_index并行扫描可覆盖索引。

Create table t as

with t as(

select /*+ driving_site(a)*/a.account,

a.branchno,

a.activeid,

b.exectime,

(case

when a.taskstatus = '500' and a.callstatus not in ('5515003', '5515004') then

1

when a.taskstatus = '600' and a.callstatus = '5512004' then

1

when a.taskstatus = '500' and a.callstatus = '5515003' then

1.3

when a.taskstatus = '500' and a.callstatus = '5515004' then

1.4

else

4

end) status

from cb_client a, cb_task b

where a.id = b.id

and a.foreigntype = 'htcrm'

and a.activeid in ('81’,’82’,’83’,’85’,’86’,’88’,’89’,’90’,’91’,’93’,’94’,’96’,’97’,’99’) > 0

and to_char(b.exectime, 'yyyymmdd') >= 20100101),

p as(

select account, branchno, exectime, status

from (select t.account,

t.branchno,

t.exectime,

t.status,

row_number() over(partition by t.account, t.branchno order by t.status, t.exectime) rn

from t

/*where instr(v_c_tasktype, t.activeid) > 0*/)

where rn = 1);

select y.l_branch_no,

y.l_total,

y.l_tel_done,

--y.l_email_done,

y.l_face_done,

y.l_auto_done,

y.l_intime_done,

y.l_intime_undo,

to_char(round(y.l_intime_done / y.l_total, 4) * 100, 990.99) || '%' en_intime_done,

to_char(round(y.l_intime_undo / y.l_total, 4) * 100, 990.99) || '%' en_intime_undo

from (select x.l_branch_no,

count(1) as l_total,

sum(case when x.is_intime = 1 and x.status = 1 then 1 else 0 end ) as l_tel_done,

--sum(case when x.is_intime = 1 and x.status = 1.2 then 1 else 0 end ) as l_email_done,

sum(case when x.is_intime = 1 and x.status = 1.3 then 1 else 0 end ) as l_face_done,

sum(case when x.is_intime = 1 and x.status = 1.4 then 1 else 0 end ) as l_auto_done,

sum(case when x.is_intime = 1 then 1 else 0 end ) as l_intime_done,

sum(case when x.is_intime = 0 then 1 else 0 end ) as l_intime_undo

from (select /*+ parallel_index(c,4) */c.vc_customer_no,

c.l_branch_no,

(case

when p.status>=1 and p.status <=1.4 and p.exectime is not null and

(to_char(p.exectime, 'yyyymmdd') >= 20100101 or 20100101 = 0) and

(to_char(p.exectime, 'yyyymmdd') <= 20110101 or 20110101 = 0) then

1

else

0

end) is_intime,

p.status

from customer c, p

where c.vc_customer_no = p.account(+)

and c.l_branch_no = p.branchno(+)

and c.c_status = '1'

and (c.l_reg_date >= 19800101 or 19800101 = 0)

and (c.l_reg_date <= 20091231 or 20091231 = 0)) x

group by x.l_branch_no) y

where exists (select /*+ hash_sj */1

from operatorbranch op

where op.vc_operator_no = 8888

and op.l_branch_no = y.l_branch_no)

order by y.l_branch_no;

最后优化完成后，响应时间为子查询约8s，主查询没有超过2s。两者前后的执行计划除了主查询使用并行以外，并没有发生其他的变化。

最后，在涉及并行和分布式的场合中，将parallel_execution_message_size参数调到4k有一定的性能收益，但不是很大。

笔记本蓝屏，shit。。

注：对于业务不是很复杂的情况，如果只是简单的查询远程的数据，最好还是使用GG或流等机制实现两库同步能够获得比分布式好的多的性能。

文库|博客

返回列表

Chinaunix › 论坛 › 数据库技术 › Oracle › 调优思想之-分布式优化

调优思想之-分布式优化 [复制链接]

分布式优化

浏览过的版块