分布式优化
在现实系统中,由于业务的扩张或者为了保持已有系统的投资,往往一个应用通常会涉及到多个数据库系统的访问,比如说CRM系统即需要访问营销系统中的一小部分数据,也需要访问客户回访平台的部分数据,而这些数据库系统往往由不同的部门或者公司开发或维护。由于各种不同的原因如信息机密以及部门、公司等出于自身利益的考虑,往往这些数据库难以进行较好的整合,因此新开发的应用就不得不进行分布式的处理,通常这些系统之间又需要能够访问彼此实时的数据,而且对性能又有一定的要求,比如说客户要求系统的响应时间不得超过3秒,此时如果事务在异地涉及的数据访问量超过百万,直接通过dblink进行访问查询在很多情况下会出现响应时间很慢的情况。
总的来说,只要一个系统访问的数据量不大或者对系统性能要求不是非常严格,使用driving_site提示足以满足要求,但是如果系统对响应时间的要求比较高,driving_site将无法满足要求,在这种情况下,我们可能需要重写拆分语句,以便最小化数据的传输。
考虑下列下列生产中的一个例子,cb_client有360万记录,cb_task大约150多万记录,customer有450万记录,operatorbranch几千条,其中cb_client和cb_task在另一个数据库中,通过dblink连接访问。
with t as(
select a.account,
a.branchno,
a.activeid,
b.exectime,
(case
when a.taskstatus = '500' and
a.callstatus not in ('5515003', '5515004') then
1
when a.taskstatus = '600' and
a.callstatus = '5512004' then
1
when a.taskstatus = '500' and
a.callstatus = '5515003' then
1.3
when a.taskstatus = '500' and
a.callstatus = '5515004' then
1.4
else
4
end) status
from cb_client a, cb_task b
where a.id = b.id
and a.foreigntype = 'htcrm'
and instr('81,82,83,85,86,88,89,90,91,93,94,,96,97,99', a.activeid) >
0
and to_char(b.exectime, 'yyyymmdd') >= 20100101),
p as(
select account, branchno, exectime, status
from (select t.account,
t.branchno,
t.exectime,
t.status,
row_number()
over(partition by t.account, t.branchno order by t.status, t.exectime) rn
from t
/*where instr(v_c_tasktype,
t.activeid) > 0*/)
where rn < 2)
select y.l_branch_no,
y.l_total,
y.l_tel_done,
--y.l_email_done,
y.l_face_done,
y.l_auto_done,
y.l_intime_done,
y.l_intime_undo,
to_char(round(y.l_intime_done /
y.l_total, 4) * 100, 990.99) || '%' en_intime_done,
to_char(round(y.l_intime_undo /
y.l_total, 4) * 100, 990.99) || '%' en_intime_undo
from (select x.l_branch_no,
count(1) as l_total,
sum(case when
x.is_intime = 1 and x.status = 1 then 1 else 0 end ) as l_tel_done,
--sum(case when
x.is_intime = 1 and x.status = 1.2 then 1 else 0 end ) as l_email_done,
sum(case when
x.is_intime = 1 and x.status = 1.3 then 1 else 0 end ) as l_face_done,
sum(case when
x.is_intime = 1 and x.status = 1.4 then 1 else 0 end ) as l_auto_done,
sum(case when
x.is_intime = 1 then 1 else 0 end ) as l_intime_done,
sum(case when
x.is_intime = 0 then 1 else 0 end ) as l_intime_undo
from (select
c.vc_customer_no,
c.l_branch_no,
(case
when
p.status>=1 and p.status <=1.4 and p.exectime is not null and
(to_char(p.exectime, 'yyyymmdd') >= 20100101 or 20100101 = 0) and
(to_char(p.exectime, 'yyyymmdd') <= 20110101 or 20110101 = 0) then
1
else
0
end) is_intime,
p.status
from customer c, p
where c.vc_customer_no
= p.account(+)
and c.l_branch_no =
p.branchno(+)
and c.c_status = '1'
and (c.l_reg_date
>= 19800101 or 19800101 = 0)
and (c.l_reg_date
<= 20091231 or 20091231 = 0)) x
group by x.l_branch_no) y
where exists (select 1
from operatorbranch op
where op.vc_operator_no = 8888
and op.l_branch_no =
y.l_branch_no)
order by y.l_branch_no;
在具有32G内存,6*4核的系统中这个语句需要执行将近90秒,而该查询有比较频繁,客户要求最好越快越好,一定要在30秒内响应。其原始执行计划如下:

因为这个语句的涉及了两个系统的访问,其中每个系统所要访问的数据量相差不是很大。所以在那边做都不合适,最好结局是各自在一边处理。因此,在本例中,分布式常用的hint “driving_site”解决不了我们的实际问题,我们必须使用进行拆分sql语句,将前面的两个with子句拆分成临时表,并使用driving_site提示让查询引擎在远程系统完成结果集的处理。这样网络的传输就可以最小化。
另外,由于每个结果集的大小仍然是100w和400w左右,因此我们在主查询中对customer使用了parallel_index并行扫描可覆盖索引。
Create table t as
with t as(
select /*+ driving_site(a)*/a.account,
a.branchno,
a.activeid,
b.exectime,
(case
when a.taskstatus = '500' and
a.callstatus not in ('5515003', '5515004') then
1
when a.taskstatus = '600' and
a.callstatus = '5512004' then
1
when a.taskstatus = '500' and
a.callstatus = '5515003' then
1.3
when a.taskstatus = '500' and
a.callstatus = '5515004' then
1.4
else
4
end) status
from cb_client a, cb_task b
where a.id = b.id
and a.foreigntype = 'htcrm'
and a.activeid in ('81’,’82’,’83’,’85’,’86’,’88’,’89’,’90’,’91’,’93’,’94’,’96’,’97’,’99’) > 0
and to_char(b.exectime, 'yyyymmdd') >= 20100101),
p as(
select account, branchno, exectime, status
from (select t.account,
t.branchno,
t.exectime,
t.status,
row_number()
over(partition by t.account, t.branchno order by t.status, t.exectime) rn
from t
/*where instr(v_c_tasktype,
t.activeid) > 0*/)
where rn = 1);
select y.l_branch_no,
y.l_total,
y.l_tel_done,
--y.l_email_done,
y.l_face_done,
y.l_auto_done,
y.l_intime_done,
y.l_intime_undo,
to_char(round(y.l_intime_done /
y.l_total, 4) * 100, 990.99) || '%' en_intime_done,
to_char(round(y.l_intime_undo /
y.l_total, 4) * 100, 990.99) || '%' en_intime_undo
from (select x.l_branch_no,
count(1) as l_total,
sum(case when
x.is_intime = 1 and x.status = 1 then 1 else 0 end ) as l_tel_done,
--sum(case when
x.is_intime = 1 and x.status = 1.2 then 1 else 0 end ) as l_email_done,
sum(case when
x.is_intime = 1 and x.status = 1.3 then 1 else 0 end ) as l_face_done,
sum(case when
x.is_intime = 1 and x.status = 1.4 then 1 else 0 end ) as l_auto_done,
sum(case when
x.is_intime = 1 then 1 else 0 end ) as l_intime_done,
sum(case when
x.is_intime = 0 then 1 else 0 end ) as l_intime_undo
from (select /*+
parallel_index(c,4) */c.vc_customer_no,
c.l_branch_no,
(case
when
p.status>=1 and p.status <=1.4 and p.exectime is not null and
(to_char(p.exectime, 'yyyymmdd') >= 20100101 or 20100101 = 0) and
(to_char(p.exectime, 'yyyymmdd') <= 20110101 or 20110101 = 0) then
1
else
0
end) is_intime,
p.status
from customer c, p
where c.vc_customer_no
= p.account(+)
and c.l_branch_no =
p.branchno(+)
and c.c_status = '1'
and (c.l_reg_date
>= 19800101 or 19800101 = 0)
and (c.l_reg_date
<= 20091231 or 20091231 = 0)) x
group by x.l_branch_no) y
where exists (select /*+ hash_sj */1
from operatorbranch op
where op.vc_operator_no = 8888
and op.l_branch_no =
y.l_branch_no)
order by y.l_branch_no;
最后优化完成后,响应时间为子查询约8s,主查询没有超过2s。两者前后的执行计划除了主查询使用并行以外,并没有发生其他的变化。
最后,在涉及并行和分布式的场合中,将parallel_execution_message_size参数调到4k有一定的性能收益,但不是很大。
笔记本蓝屏,shit。。
注:对于业务不是很复杂的情况,如果只是简单的查询远程的数据,最好还是使用GG或流等机制实现两库同步能够获得比分布式好的多的性能。