Chinaunix

标题: 关于大数据量的处理方法 [打印本页]

作者: cloud_zgw 时间: 2008-03-24 10:14
标题: 关于大数据量的处理方法
SYBASE版本12.5ASE
请教一下啊.我现在要处理一批比较大的数据,由于数据来源很多.目前是放在一个临时表内.
但是在后一步处理需要用一个实表与临时表做关联判断.发现很慢很慢,处理一次要4\5个小时.不知道有什么好的处理方法.

目前临时表我已加了索引.但是效果还是很慢.我把表结构放上来.

有处理\优化心得的朋友帮帮忙啊

create table #a_user_cost_tq
   (user_id char(14) not null,
   cal_date datetime not null,
   serial_num int default 0 not null,
   e_cost decimal(14,2) null,
   e_quan int null,
   book_id char(7) null,
   deal_flag_id char(1) null,
   dept_id char(6) null,
   invc_id char(2) null,
   invc_count int default 0 null,
   print_flag char(1) null,
   print_count int default 0 null,
   if_late_fee decimal(14,2) default 0 null,
   charge_enddate char(3) default '115' null,
   pay_type_id char(2) null,
   user_name varchar(60) null,
   user_addr varchar(60) null,
   coll_cost decimal(14,2) null,
   coll_date datetime null,
   coll_staff char(10) null,
   coll_dept char(6) null,
   old_user_id char(14) null,
   lock_bank char(2) null,
   lock_time datetime null,
   bank_pay_type char(1) null,
   plan_cost decimal(12,2) null,
   tpchrg_late_fee decimal(14,2) default 0 null,
   odd_cost decimal(12,2) null,
   tran_date datetime null,
   transfer_num char(

null,
   bank_account char(30) null,
   late_fee decimal(12,2) default 0 null,
   user_type_id char(2) null,
   bank_serial char(12) null,
   charge_type_id char(1) default 'A' null,
   tqys int null,
   tqcs int null,
   score int null, primary key (user_id,cal_date,serial_num))

实表与这个临时表结构基本一致
select    @cal_date_1,a.user_id,"",'0700','',5,'0',a.dept_id
from collect_cost a  (INDEX PK_collect_cost)
where
a.cal_date>=dateadd(mm,-6,@cal_date_1)
and a.dept_id =@dept_id //and a.user_id <'010004748'
and NOT EXISTS (SELECT 'X'
                  from #a_user_cost_tq b
                  WHERE b.cal_date >= dateadd(mm, -6, @cal_date_1)
                  and b.dept_id = @dept_id
                  AND b.user_id = a.user_id
AND a.user_id >= CHAR(0)
and a.serial_num=0

作者: mcolinc 时间: 2008-03-24 18:13
扩充tempdb试试看

作者: maolinzhou 时间: 2008-03-28 13:40
把你需要对比的主表数据也放到临时表中
把你以下部分共存的删掉不比你用not exists快呀,not exists 用不上索引
SELECT 'X'
                  from #a_user_cost_tq b
                  WHERE b.cal_date >= dateadd(mm, -6, @cal_date_1)
                  and b.dept_id = @dept_id
                  AND b.user_id = a.user_id
AND a.user_id >= CHAR(0)
and a.serial_num=0

作者: netchecking 时间: 2008-06-15 15:54
大数据量处理办法其实就是一个核心问题：数据存储算法。
只要针对系统需求，设计良好的算法，任何大数据量都不是问题。可以到我们的研究中心看看。http://www.distributed-cluster.com

作者: zhaopingzi 时间: 2008-06-15 17:05
给tempdb绑定高速缓冲,试试

作者: chuxu 时间: 2008-06-16 09:20
SELECT 'X'
                  from #a_user_cost_tq b
                  WHERE b.cal_date >= dateadd(mm, -6, @cal_date_1)
                  and b.dept_id = @dept_id
                  AND b.user_id = a.user_id

1、看你的这个语句中只用到了#a_user_cost_tq 的几个字段，是否这个表没必要建这么多列？
2、用show plan看一下查询计划吧

欢迎光临 Chinaunix (http://bbs.chinaunix.net/)